Text mining
Analysis of text mining companies, technology, and trends. Related subjects include:
Clarabridge is now all about text mining SaaS
Clarabridge CEO Sid Banerjee called with some product news that is embargoed until the Text Analytics Summit, so I won’t write about it yet. During the call, however, I discovered something interesting: Clarabridge’s hosted/SaaS (Software as a Service) text mining offering has taken over its business. Highlights of the call included:
Categories: Clarabridge, Software as a Service (SaaS), Text mining, Text mining SaaS
Text Analytics Summit and associated Seth Grimes white paper
Ironically, just after a Google indexing problem, I am putting up my first-ever sponsored blog post. It ties in to the forthcoming Text Analytics Summit in Boston, where I will be speaking on June 16. The post itself offers a free white paper by the estimable Seth Grimes.
Categories: Text Analytics Summit, Text mining
Investment text mining job listing
As per this job listing, at least one “major NYC investment bank” plans to do text mining on a proprietary trading desk.
The successful candidate will mine text data from numerous news sources and incorporate the information [into] the proprietary trading systems.
The biggest text analytics company you probably never heard of
I caught up with Expert System S.p.A. last week. It turns out they are doing $10 million in annual revenue from text technology. That alone is (sadly) surprising, but what’s really remarkable is that they did it almost entirely in the Italian market. As you might guess, that figure includes a little bit of everything, from search engines to Italian-language filters for Microsoft Office to text mining. But only $3.5 million of Expert System’s revenue comes from government (and I think that includes civilian agencies), and under 30% is professional services, so on the whole it seems like a real accomplishment. Oh yes: Expert System says it is entirely self-funded.
As of last year, Expert System also has English-language products, and a couple of minor OEM sales in the US (for mobile search and semantic web applications). German- and Arabic-language products are in beta test. The company says its market focus going forward is national security (surely the reason for the Arabic) and competitive intelligence. It envisions selling through partners such as system integrators, although I think that makes more sense for the government market than for civilian companies. In February the company is introducing a market intelligence product focused on sentiment analysis.
Expert System is a bit of a throwback, in that it talks lovingly of the semantic network that informs its products.
Categories: Application areas, Competitive intelligence, Enterprise search, Expert System S.p.A., Ontologies, Search engines, Text mining
Text mining – fact and fiction
Text mining is science-project artificial intelligence. Fiction. Text mining is proven in many practical applications.
To implement text mining, you need computational linguists. Fact. Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” And it’s linguists, or reasonable facsimiles of same, who do the consulting.
To use text mining, you need computational linguists. Fiction. When last I counted, the number of known computational linguists working for end-user organizations, worldwide, was precisely 1, at Procter & Gamble. (Intelligence agencies excepted, of course.) I’d guess it’s higher now, but I probably could still count them all without taking my socks off.
CRM applications are driving the growth of text mining. Fact. Most current growth in text mining seems to come from Voice of the Customer and Voice of the Market/competitive intelligence applications. And a couple of years ago, when SAS and SPSS both enjoyed a boom in text mining, a lot of that business came from CRM.
Text mining products are useful mainly for large enterprises. More fact than fiction. Text mining makes the most sense when you have too much text for humans to read and summarize.
Text mining doesn’t fit well with relational databases. Fiction. The fastest-growing text mining companies seem to be Attensity and Clarabridge, who consistently extract textual information into relational databases.
Text mining imposes structure on unstructured* data. More fact than fiction. Most text mining applications involve examining free-text documents and creating entries in relational or XML databases. Most people would call that a transition from unstructured to structured form.
*I still don’t like the “structured/unstructured” distinction, but with repetition I’m getting somewhat inured to it.
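To make the unstructured-to-structured step concrete, here is a minimal Python sketch, not any vendor’s actual method. The keyword lexicon `TOPIC_TERMS`, the helper names, and the `mentions` table are all hypothetical; real products rely on linguistic analysis rather than a hand-written word list. The point is only the shape of the output: each free-text document becomes rows that ordinary SQL can aggregate.

```python
import re
import sqlite3

# Hypothetical toy lexicon; real text mining uses linguistic analysis,
# not a hand-written keyword list.
TOPIC_TERMS = {
    "price": "pricing",
    "expensive": "pricing",
    "rude": "service",
    "helpful": "service",
    "broken": "product_quality",
}

def extract_mentions(doc_id, text):
    """Turn one free-text document into (doc_id, topic, term) tuples."""
    rows = []
    for token in re.findall(r"[a-z]+", text.lower()):
        topic = TOPIC_TERMS.get(token)
        if topic is not None:
            rows.append((doc_id, topic, token))
    return rows

def load_mentions(conn, docs):
    """Store extracted mentions in a relational table that SQL can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mentions (doc_id INTEGER, topic TEXT, term TEXT)"
    )
    for doc_id, text in docs.items():
        conn.executemany(
            "INSERT INTO mentions VALUES (?, ?, ?)", extract_mentions(doc_id, text)
        )
    conn.commit()

# Hypothetical sample documents.
conn = sqlite3.connect(":memory:")
load_mentions(conn, {
    1: "The agent was rude and the price was too high.",
    2: "Very helpful staff, but the unit arrived broken.",
})
for topic, n in conn.execute(
    "SELECT topic, COUNT(*) FROM mentions GROUP BY topic ORDER BY topic"
):
    print(topic, n)
```

Running it prints one count per topic (pricing 1, product_quality 1, service 2). Once the text has been reduced to rows like these, it can be joined and aggregated like any other relational data.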
Enterprise search is an alternative to text mining. Fact. You can use a high-end search engine to cluster documents and look for trends and insight. It’s not the real McCoy, but in some cases it gives you 80% of the benefit of the real thing.
Text mining is an ingredient, not a product category. Part fact, part fiction. The biggest text mining efforts in the world are probably at Google, Yahoo, Microsoft search, and Dow Jones/Factiva. Antispam vendors also invest a lot in text mining. Two of the top five independent text mining vendors were acquired this year (ClearForest and Inxight). And of the many dozens of small text mining independents, most are focused on specific niches.
Even so, Attensity, Clarabridge, and Temis show that, at least for now, text mining remains a legitimate product category.
The text mining industry is in trouble. Part fact, part fiction. As I recently ranted, even the leading text mining vendors are letting many opportunities pass them by. And like many software sectors, text mining seems poised to be absorbed via large-company acquisition. SAP has already secured a text mining business via BOBJ/Inxight, but Oracle, Microsoft (despite the in-house expertise from its search arm), and IBM (despite, or even in connection with, UIMA) could each easily buy at least one vendor.
But in the meantime, a few small text mining vendors are still showing rapid growth.
Previous “fact and fiction” post: Data warehouse appliances.
Categories: Text mining
Scout Labs – yet more public-facing sentiment analysis
Scout Labs sounds even closer than Summize to what I had in mind. It’s a shame that the “traditional” text mining vendors didn’t get there first.
Categories: Competitive intelligence, Text mining
The text mining vendors continue to lack constructive vision
I’ve been thinking for a long time that the various text mining companies doing sentiment analysis should try some public-facing (or at least multi-customer) services. Investors might love such a thing. So might marketing managers (actually, Factiva claims to be active there, at least according to its web site). And as a key part of the strategy, text mining companies that sell to enterprises could brand such a site and gain massive awareness as a result. Well, it seems that public-facing sentiment analysis sites are springing up. At least, Summize has. (Hat tip to TechCrunch.) And the text mining vendors are nowhere to be seen.
So what else is new?
Categories: Application areas, Factiva/Dow Jones, Investment research and trading, Text mining
QL2 – web text extraction and more
Here are some highlights of the QL2 story, per exec Mike McDermott.
- QL2’s main business is scraping price and other product-offering data from the web for high-speed competitive analysis. For example, of its roughly 250 customers overall, over 90 are airlines. Online retailers are another big chunk of its customer base.
- QL2 also commonly partners with text mining companies in applications such as Voice of the Market or competitive intelligence. E.g., QL2 has been brought into a few deals each by Attensity, Clarabridge, and especially Temis.
- QL2 goes well beyond basic crawling. Notably, the system fills in forms with parameters. And of course it monitors pages for changes.
- QL2’s scripting language is, Mike tells me, very SQL-like. Hence the “QL” in the name.
- QL2 rolls its own filters, rather than using INSO or whoever. (Actually, what are the main file-reading filter choices these days? I’ve lost track.) Indeed, Mike fondly believes QL2 does a better job with PDFs than Adobe does.
- QL2 doesn’t want to be thought of as web-only. Rather, Mike likes my formulation of “text data ETL, web or otherwise.” That said, he freely admits QL2’s strength is in Extract rather than in Transform or Load.
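The two crawling capabilities Mike mentioned, filling in forms with parameters and monitoring pages for changes, can be sketched roughly as follows. This is an illustration in Python, not QL2’s SQL-like scripting language, which I haven’t seen; the form URL, parameter names, and helper functions are hypothetical, and no pages are actually fetched.

```python
import hashlib
import itertools
from urllib.parse import urlencode

# Hypothetical fare-search form and parameter values, for illustration only.
FORM_URL = "https://example.com/fares"
PARAMS = {
    "origin": ["BOS", "SEA"],
    "dest": ["JFK"],
    "cabin": ["coach"],
}

def form_queries(base_url, params):
    """Yield one query URL per combination of form parameter values."""
    keys = sorted(params)
    for values in itertools.product(*(params[k] for k in keys)):
        yield base_url + "?" + urlencode(dict(zip(keys, values)))

def fingerprint(page_text):
    """Hash page content so a later crawl can cheaply detect changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def changed_pages(seen, fetched):
    """Return URLs that are new or whose content differs from the last crawl.

    `seen` maps URL -> last fingerprint and is updated in place.
    `fetched` maps URL -> page text from the current crawl.
    """
    changed = []
    for url, text in fetched.items():
        digest = fingerprint(text)
        if seen.get(url) != digest:
            changed.append(url)
        seen[url] = digest
    return changed

# Simulated crawls (no network access): the SEA fare changes between runs.
urls = list(form_queries(FORM_URL, PARAMS))
seen = {}
print(changed_pages(seen, {urls[0]: "Fare: $199", urls[1]: "Fare: $249"}))
print(changed_pages(seen, {urls[0]: "Fare: $199", urls[1]: "Fare: $229"}))
```

The first simulated crawl reports both URLs as new; the second reports only the SEA query. A production system would also need to ignore volatile page elements such as timestamps or ads, or every page would look changed on every crawl.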
Categories: Application areas, Competitive intelligence, QL2, Text mining
Clarabridge does SaaS, sees Inxight
I just had a quick chat with Sid Banerjee, CEO of text mining vendor Clarabridge. Naturally, I asked the standard “So who are you seeing most in the marketplace?” question. Attensity is, unsurprisingly, #1. What’s new is that Inxight, which until now had not been much of a text mining presence against commercially focused Clarabridge, has begun to show up a bit this quarter via the Business Objects sales force. Sid was of course dismissive of their current technological readiness and integration, but at least BOBJ/Inxight is showing up now.
The most interesting point was text mining SaaS (Software as a Service). When Clarabridge first put out its “We offer SaaS now!” announcement, I yawned. But Sid tells me that about half of Clarabridge’s deals now are actually SaaS. The way the SaaS technology works is pretty simple. The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center. If there’s a desire to join the results of the text analysis with some tabular data from the client’s data warehouse, the needed columns get sent over as well. And then Clarabridge does its thing.
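Here is a minimal sketch of that batch flow, assuming SQLite stands in for both the staging database and the hosted installation. The table names, the crude word-list `score` function, the batch label, and the sample rows are all hypothetical; Clarabridge’s actual analysis and schema are surely far richer. What it shows is the shape of the process: pull one staged batch, store the analysis results as rows, then join them to whatever warehouse columns were sent along.

```python
import sqlite3

# Hypothetical word lists standing in for the vendor's real text analysis.
POSITIVE = {"great", "fast", "friendly"}
NEGATIVE = {"slow", "broken", "rude"}

def score(text):
    """Crude sentiment: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hosted side: one database holding the staged text, the few warehouse
# columns the customer chose to send over, and the analysis results.
hosted = sqlite3.connect(":memory:")
hosted.executescript("""
    CREATE TABLE staged_comments (customer_id INTEGER, batch TEXT, comment TEXT);
    CREATE TABLE warehouse_cols (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE comment_scores (customer_id INTEGER, batch TEXT, sentiment INTEGER);
""")

def run_batch(conn, batch):
    """Analyze one daily or weekly staged batch and store results as rows."""
    rows = conn.execute(
        "SELECT customer_id, comment FROM staged_comments WHERE batch = ?",
        (batch,),
    ).fetchall()
    conn.executemany(
        "INSERT INTO comment_scores VALUES (?, ?, ?)",
        [(cid, batch, score(text)) for cid, text in rows],
    )

# Hypothetical sample data for a single batch.
hosted.executemany("INSERT INTO staged_comments VALUES (?, ?, ?)", [
    (1, "week-01", "great and fast service"),
    (2, "week-01", "slow delivery and rude staff"),
    (3, "week-01", "friendly staff"),
])
hosted.executemany("INSERT INTO warehouse_cols VALUES (?, ?)", [
    (1, "premium"), (2, "standard"), (3, "standard"),
])
run_batch(hosted, "week-01")

# Join text-derived scores with the tabular columns from the client's warehouse.
for segment, avg in hosted.execute("""
    SELECT w.segment, AVG(s.sentiment)
    FROM comment_scores s JOIN warehouse_cols w USING (customer_id)
    GROUP BY w.segment ORDER BY w.segment
"""):
    print(segment, avg)
```

With this sample data it prints `premium 2.0` and `standard -0.5`. Shipping only the needed warehouse columns, rather than the whole warehouse, keeps the hosted side small while still allowing the join.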
Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, IBM and UIMA, Software as a Service (SaaS), Text mining, Text mining SaaS
What TEMIS is seeing in the marketplace
CEO Eric Bregand of Temis recently checked in by email with an update on text mining market activity. Highlights of Eric’s views include:
- Yep, Voice of the Customer is hot in “many markets”; Eric specifically mentioned banking, car, energy, food, and retail. He further sees IBM backing VotC as text’s “killer app.” (Note: Temis has a history of partnering with IBM, most notably via its unusually strong commitment to UIMA.)
- Specifically, THE hot topics in the European market these days are competitive intelligence and sentiment analysis. (Note: I’ve always thought Temis got serious about competitive analysis a little earlier than most other text mining vendors did.)
- Life sciences is an ever-growing focus for Temis.
- I confused him a bit with how I phrased my question about custom publishing and Temis’ Mark Logic partnership. But he did express favorable views of the market, specifically in the area of integrating text mining and native XML database management, and even volunteered that nStein appears to be doing well.