Text Analytics Summit marketing panel: Membership firmed up
We’ve now solidified the membership of the Text Analytics Summit marketing panel. It is:
- Curt Monash, President, Monash Information Services
- Dave Kellogg, CEO, Mark Logic Corporation
- Michelle De Haaff, VP Marketing, Attensity Corporation
- Michel Lemay, VP Marketing, nstein Technologies
- Mary Crissey, SAS Analytics Marketing Manager, SAS Institute
Michelle, Michel, and Mary are all obvious choices, responsible for marketing at leading text mining vendors. In addition, Mary has excelled on the same panel in the past, Michel sent me e-mail with some brilliant thoughts on the panel subject, and Attensity has one of the most interesting strategies in the text analytics market.
As for Dave, he’s simply one of the most astute marketing theorists working in software today. And he runs a very interesting text technology company. And he used to be the most senior marketing guy in all of business intelligence, when he was SVP at Business Objects. In his copious free time, he writes a really cool blog.
Categories: Attensity, Mark Logic, nStein, SAS, Text Analytics Summit, Text mining | 3 Comments |
What’s going on at ClearForest?
I tried to invite Jay Henderson to speak on the Text Analytics Summit marketing panel, but got no answer to my e-mail. The company phone directory didn’t work so well for him either. I sent e-mail to a general PR company e-mail address, and that didn’t get returned. And Ravi tells me he has had similar difficulties reaching them.
Categories: ClearForest/Reuters, Text mining | 5 Comments |
SAP’s “search” strategy isn’t about search
I caught up with Dennis Moore today to talk about SAP’s search strategy. And the biggest thing I learned was – it’s not about the search. Rather, it’s about a general interface, of which search and natural language just happen to be major parts.
Dennis didn’t actually give me a lot of details, at least not ones he’s eager to see published at this time. That said, SAP has long had a bare-bones search engine, TREX. (TREX was also adapted to create the columnar relational data manager BI Accelerator.) But we didn’t talk about TREX enhancements at all, and I’m guessing there haven’t really been many. Rather, SAP’s focus seems to be on:
A. Finding business objects.
B. Helping users do things with them.
Categories: BI integration, Enterprise search, Language recognition, Natural language processing (NLP), SAP, Search engines | 2 Comments |
Has Google hit 10 petabytes yet?
I’ve been musing about how big Google’s core database might be. Figuring that out is not a trivial problem, unless they’ve published the answer somewhere that I’m not aware of. But here’s a big clue, from an announcement about their n-gram data:
We processed 1,024,908,267,229 words of running text
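To get a feel for what that word count implies, here is a back-of-the-envelope sketch. The bytes-per-word figure is purely an assumption made for illustration, not a Google number, and the result covers only the raw running text behind the n-gram data. It says nothing about indexes, replicas, or anything else Google stores.

```python
# Rough sizing from the published word count. Everything except the
# word count itself is an illustrative assumption.

WORDS_PROCESSED = 1_024_908_267_229  # from the n-gram announcement quoted above
ASSUMED_BYTES_PER_WORD = 6           # ASSUMPTION: about 5 letters plus a space

TB = 10**12  # decimal terabyte
PB = 10**15  # decimal petabyte


def raw_text_bytes(words: int, bytes_per_word: float) -> float:
    """Estimate the size of plain running text for a given word count."""
    return words * bytes_per_word


def multiplier_to_reach(target_bytes: float, base_bytes: float) -> float:
    """How many times larger the target is than the base estimate."""
    return target_bytes / base_bytes


raw = raw_text_bytes(WORDS_PROCESSED, ASSUMED_BYTES_PER_WORD)
gap = multiplier_to_reach(10 * PB, raw)

print(f"Raw running text: about {raw / TB:.1f} TB")
print(f"10 PB is roughly {gap:,.0f}x that figure")
```

Changing `ASSUMED_BYTES_PER_WORD` moves the estimate proportionally, so the clue is only as good as that guess.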
Categories: Google, Search engines | Leave a Comment |
InQuira’s and Mercado’s approaches to structured search
InQuira and Mercado both have broadened their marketing pitches beyond their traditional specialties of structured search for e-commerce. Even so, it’s well worth talking about those search technologies, which offer features and precision that you just don’t get from generic search engines. There’s a lot going on in these rather cool products.
In broad outline, Mercado and InQuira each combine three basic search approaches:
- Generic text indexing.
- Augmentation via an ontology.
- A rules engine that helps the site owner determine which results and responses are shown under various circumstances.
Of the two, InQuira seems to have the more sophisticated ontology. Indeed, the not-wholly-absurd claim is that InQuira does natural-language processing (NLP). Both vendors incorporate user information in deciding which search results to show, in ways that may be harbingers of what generic search engines like Google and Yahoo will do down the road.
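The three-part combination outlined above can be illustrated with a toy sketch. This is not how InQuira or Mercado actually implement anything. The catalog, the ontology entries, the rule format, and the scoring weights are all hypothetical, chosen only to show how text indexing, ontology expansion, and site-owner rules could fit together.

```python
from collections import defaultdict

# Hypothetical toy catalog: product id -> description.
CATALOG = {
    1: "red running shoes",
    2: "blue trail sneakers",
    3: "leather office shoes",
    4: "running socks",
}

# Hypothetical ontology: term -> related terms.
ONTOLOGY = {"shoes": {"sneakers"}, "sneakers": {"shoes"}}

# Hypothetical site-owner rules, triggered by a query term.
RULES = [
    {"when_term": "running", "action": "hide", "doc_id": 4},
    {"when_term": "trail", "action": "pin", "doc_id": 2},
]


def build_index(docs):
    """Approach 1: generic text indexing (a plain inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(query, index, ontology, rules):
    terms = query.lower().split()
    scores = defaultdict(float)
    for term in terms:
        # Exact matches from the text index count fully.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1.0
        # Approach 2: ontology-related terms count at a reduced weight.
        for related in ontology.get(term, set()) - set(terms):
            for doc_id in index.get(related, ()):
                scores[doc_id] += 0.5
    # Approach 3: rules let the site owner pin or hide specific results.
    for rule in rules:
        if rule["when_term"] in terms:
            if rule["action"] == "pin":
                scores[rule["doc_id"]] += 100.0
            elif rule["action"] == "hide":
                scores.pop(rule["doc_id"], None)
    # Highest score first; ties broken by id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(CATALOG)
print(search("running shoes", INDEX, ONTOLOGY, RULES))
```

In this sketch, the sneakers item shows up for a "shoes" query only because of the ontology, and the socks item is suppressed for "running" queries only because of the rule.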
Categories: InQuira, Mercado, Natural language processing (NLP), Ontologies, Search engines, Structured search | 2 Comments |
Is DMOZ the cure to Wikipedia’s spam problem?
Joost de Valk makes an interesting suggestion, namely that Wikipedia should drop all external links other than to DMOZ, and rely on DMOZ as the outside link directory. As a division of labor, it makes perfect sense. However, it’s a total non-starter until at least two problems are solved.
Categories: Categorization and filtering, Directories, ODP and DMOZ, Ontologies, Spam and antispam | 5 Comments |
Fact and Fiction: DMOZ and the ODP
- DMOZ is dead. Fiction!
- New site submissions are being processed. Partial fact.
- Pending site submissions were lost in the outage. Partial fact.
- Other non-public ODP data was lost in the outage too. Partial fact.
- New editor applications aren’t being processed yet. Fact.
- ODP editors are corrupt. Fiction!
- The ODP is secretive and deceptive. Largely fiction.
- If a DMOZ category doesn’t have a listed editor, it’s unlikely to get much attention. Part fact, part fiction.
- ODP editors hate search engine optimization. Partial fact.
- ODP editors hate SEOs. Partial fact.
I shall explain.
Categories: Categorization and filtering, Directories, ODP and DMOZ, Search engine optimization (SEO) | 7 Comments |
A hobbit writes from the ODP Entmoot
Before saying anything about the Open Directory Project or the DMOZ directory it produces, I should offer several disclaimers.
1. No editor speaks for the ODP, let alone for Time Warner/AOL/Netscape.
2. No single editor’s opinions or choices control any edits in DMOZ, even if s/he is the sole listed editor of a category. Any of us can be overruled on any editing decision at any time.
3. I’m effectively as new as they come, or at least was at the time DMOZ editing came back online (late December). There have been no new editors since the well-publicized outage, and I had next to no involvement with the project prior to the outage.
4. Notwithstanding point #2, I’m quite opinionated, which I’m sure surprises approximately nobody. And my opinions quite often are different from those of the ODP mainstream.
Categories: Categorization and filtering, ODP and DMOZ | 1 Comment |
Can Hakia hack it?
Hakia purports to be a new search engine that indexes “semantically,” which I presume means on phrases or concepts or something. But I’ve run a few queries side by side on Hakia and Google, and Hakia isn’t doing well. I think it’s not making sufficiently good use of page reputation. Try “web hosting forum” for an example of this, looking at the top two hits in both cases.
When I queried on “Viagra,” Hakia did — as it were — outperform Google. But that’s the only case I, uh, came up with. On less snigger-worthy searches, Google seemed to do as well as or better than Hakia.
Categories: Google, Search engines | Comments Off on Can Hakia hack it? |
What’s interesting about the FAST venture in BI
FAST is annoying me a bit these days. It’s nothing serious, but travel schedule screw-ups, an annoying embargo, and a screw-up in the annoying embargo have all hit at once. So I’ll keep this telegraphic and move on to other subjects.
- They’re doing fast queries without using a lot of RAM.
- They’re doing the usual text search thing of indexing across multiple “databases,” only now it’s applied to, well, databases. (Not that there’s much new about that particular aspect. Actually, there seems to be a bit of kludge in that they export the databases to some kind of simple text files.)
- They’re doing some level of concept identification à la the text mining guys. (They don’t call it “entity extraction” because the results aren’t dumped into a database anywhere, but instead are just used on the fly.) Of course, the text mining/search convergence goes both ways.
- They bought a BI/dashboard tool and are using it both to analyze query logs and also to do normal BI/dashboard kinds of things.
- They have big references for this stuff, at least the single-web-site query aspect. Well, actually, the customer names are confidential. Oh well.
And as another example of how this wasn’t the smoothest PR month for FAST, Steve Arnold somehow got the false idea that they were getting out of true text search altogether.
Categories: BI integration, Enterprise search, FAST, Search engines, Text mining | 3 Comments |