December 23, 2007

Text mining – fact and fiction

Text mining is science-project artificial intelligence. Fiction. Text mining is proven in many practical applications.

To implement text mining, you need computational linguists. Fact. Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” And it’s linguists, or reasonable facsimiles of same, who do the consulting.

To use text mining, you need computational linguists. Fiction. When last I counted, the number of known computational linguists working for end-user organizations, worldwide, was precisely 1, at Procter & Gamble. (Intelligence agencies excepted, of course.) I’d guess it’s higher now, but I probably could still count them all without taking my socks off.

CRM applications are driving the growth of text mining. Fact. Most current growth in text mining seems to come from Voice of the Customer and Voice of the Market/competitive intelligence applications. And a couple of years ago, when SAS and SPSS had a joint boom in text mining, a lot of that was coming from CRM.

Text mining products are useful mainly for large enterprises. More fact than fiction. Text mining makes the most sense when you have too much text for humans to read and summarize.

Text mining doesn’t fit well with relational databases. Fiction. The fastest-growing text mining companies seem to be Attensity and Clarabridge, who consistently extract textual information into relational databases.

Text mining imposes structure on unstructured* data. More fact than fiction. Most text mining applications involve examining free-text documents and creating entries in relational or XML databases. Most people would call that a transition from unstructured to structured form.

*I still don’t like the “structured/unstructured” distinction, but with repetition I’m getting somewhat inured to it.
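To make the "free text in, relational rows out" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample comments, the tiny keyword lexicons, the `mention` table, and the function names `extract_rows` and `load` are all hypothetical. Simple keyword matching stands in for the much more sophisticated linguistic extraction that commercial products perform, and nothing here reflects how any particular vendor works.

```python
import re
import sqlite3

# Hypothetical customer comments; not taken from any real data set.
COMMENTS = [
    "The battery died after two days. Terrible.",
    "Great screen, but the battery is weak.",
    "Shipping was fast and the screen is great.",
]

# Toy lexicons standing in for a real linguistic extraction engine.
TOPICS = ("battery", "screen", "shipping")
POSITIVE = {"great", "fast"}
NEGATIVE = {"terrible", "weak", "died"}


def extract_rows(doc_id, text):
    """Yield one (doc_id, topic, sentiment) row per clause that mentions a topic."""
    # Split on punctuation and on the conjunctions "but" / "and".
    for clause in re.split(r"[.,;!?]|\bbut\b|\band\b", text.lower()):
        words = set(re.findall(r"[a-z]+", clause))
        for topic in TOPICS:
            if topic in words:
                score = len(words & POSITIVE) - len(words & NEGATIVE)
                if score > 0:
                    sentiment = "positive"
                elif score < 0:
                    sentiment = "negative"
                else:
                    sentiment = "neutral"
                yield (doc_id, topic, sentiment)


def load(comments):
    """Extract rows from each comment and store them in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mention (doc_id INTEGER, topic TEXT, sentiment TEXT)"
    )
    for doc_id, text in enumerate(comments, start=1):
        conn.executemany(
            "INSERT INTO mention VALUES (?, ?, ?)", extract_rows(doc_id, text)
        )
    return conn


if __name__ == "__main__":
    conn = load(COMMENTS)
    query = (
        "SELECT topic, sentiment, COUNT(*) FROM mention "
        "GROUP BY topic, sentiment ORDER BY topic, sentiment"
    )
    for row in conn.execute(query):
        print(row)
```

Once the text has been reduced to rows like these, ordinary SQL and BI tools can count, trend, and slice it, which is why the relational approach suits this kind of work.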

Enterprise search is an alternative to text mining. Fact. You can use a high-end search engine to cluster documents and look for trends and insight. It’s not the real McCoy, but in some cases it gives you 80% of the benefit of the real thing.

Text mining is an ingredient, not a product category. Part fact, part fiction. The biggest text mining efforts in the world are probably at Google, Yahoo, Microsoft search, and Dow Jones/Factiva. Antispam vendors also invest a lot in text mining. Two of the top five independent text mining vendors were acquired this year (ClearForest and Inxight). And of the many dozens of small text mining independents, most are focused on specific niches.

Even so, Attensity, Clarabridge, and Temis show that, at least for now, text mining remains a legitimate product category.

The text mining industry is in trouble. Part fact, part fiction. As I recently ranted, even the leading text mining vendors are letting many opportunities pass them by. And like many software sectors, text mining seems poised to be absorbed via large-company acquisition. SAP has already secured a text mining business via BOBJ/Inxight, but at least one vendor each could easily be bought by Oracle, Microsoft (despite the in-house expertise from its search arm), and IBM (despite or even in connection with UIMA).

But in the meantime, a few small text mining vendors are still showing rapid growth.

Previous “fact and fiction” post: Data warehouse appliances.


December 19, 2007

Scout Labs – yet more public-facing sentiment analysis

Scout Labs sounds like even more of what I was thinking of than Summize. It’s a shame that the “traditional” text mining vendors didn’t get there first.

December 18, 2007

The text mining vendors continue to lack constructive vision

I’ve been thinking for a long time that the various text mining companies doing sentiment analysis should try some public-facing (or at least multi-customer) services. Investors might love such a thing. So might marketing managers (actually, Factiva claims to be active there, at least as per their web site). And as a key part of the strategy, text mining companies selling to enterprises might brand such a site and gain massive awareness accordingly. Well, it seems that public-facing sentiment analysis sites are springing up. At least, Summize has. (Hat tip to TechCrunch.) And the text mining vendors are nowhere to be seen.

So what else is new?

December 12, 2007

Attivio tries to do it all

When Andrew McKay was at FAST, I grumped about his search/BI integration story. Now that he’s trying to do the same thing at a startup called Attivio, it sounds more plausible.

Attivio is having a house party and product rollout in the latter part of January, and details are scarce in the meantime. But here are some highlights.

December 11, 2007

Thoughts from an overview of technology marketing

As part of the Monash Advantage program, I published a proprietary Monash Letter about online marketing … and another one … and some further stuff so proprietary I’m not even putting out teasers about it. Now I’ve taken the next step, and written another Letter with a complete overview of software-centric marketing strategy and tactics (lead generation aside). That’s proprietary too, and only available in full if you have access to the secure Monash Advantage website, but here are some semi-random highlights for public consumption.

December 9, 2007

Russian chatbot apparently passes Turing test

Ina Fried reports on a Russian chatbot that sure sounds like it passes the Turing test. To wit (emphasis mine):

A program that can mimic online flirtation and then extract personal information from its unsuspecting conversation partners is making the rounds in Russian chat forums, according to security software firm PC Tools.

The artificial intelligence of CyberLover’s automated chats is good enough that victims have a tough time distinguishing the “bot” from a real potential suitor, PC Tools said. The software can work quickly too, establishing up to 10 relationships in 30 minutes, PC Tools said.

That said, threat reports from PC security companies are notoriously hyped, so I wouldn’t get too excited until there’s stronger confirmation.

December 8, 2007

Windows Live search is rather different from MSN

Until the middle of this year, I got negligible search engine traffic from either MSN or Yahoo, or indeed any other search engine except Google. We’re literally talking a 90-95% share for Google, on each of my three main blogs, most months.

But in November, the Windows Live share was 19% on DBMS2, 29% on Text Technologies, and 41% on the Monash Report. And those aren’t blips; in each case there was steady August-November monthly growth. But on the other hand, early December month-to-date figures are all back down. Weird.

December 7, 2007

QL2 – web text extraction and more

Here are some highlights of the QL2 story, per exec Mike McDermott.


December 2, 2007

Danny Sullivan thinks blended vertical search is a game-changer

Danny Sullivan thinks blended vertical search — which he’s calling Search 3.0 — is a game changer. (In this context, “vertical” search denotes alternate result types such as video, image, map coordinates, or product listings.) In saying that, he’s focused on search marketers, who now have a lot more ways to try to get their messages onto Google searchers’ top result pages. But I presume what he’s really saying is that there will be a feedback effect — if Google tells all web searchers about videos and product listings, then internet marketers will be more motivated to post videos and product listings, and hence there will be more interesting choices of videos and product listings — which Google will naturally wind up featuring more prominently in its search results. And so on.

Given the YouTube explosion, I find it hard to argue with his claim.

December 2, 2007

So what’s the state of speech recognition and dictation software?

Linda asked me about the state of desktop dictation technology. In particular, she asked me whether there was much difference between the latest version and earlier, cheaper ones. My knowledge of the area is out of date, so I thought I’d throw both the specific question and the broader subject of speech recognition out there for general discussion.

Here’s much of what I know or believe about speech recognition:
