Text mining
Analysis of text mining companies, technology, and trends.
Megaputer on the text mining market
Sergei Ananyan is president of Megaputer, which is not one of the easier companies to get information about. They’re an essentially Russian firm based in Bloomington, Indiana. Their website is, to put it kindly, not up to date. And I wound up speaking with Sergei while he was at his rural vacation house, located somewhere between the Black and Aral Seas.
However, Sergei followed up by email with his views of the marketplace, and I think they’re interesting enough to share below. I really like his focus on analytic business processes, something that generally doesn’t get enough consideration.
Categories: Megaputer, Text mining | 1 Comment |
Introduction to ClearForest
On Friday I had a fascinating talk with Jay Henderson of ClearForest. While I have more research to do before I know what I really think, there already is plenty to post about.
ClearForest is one of the two companies whose name comes up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place.
Categories: ClearForest/Reuters, Text mining | 1 Comment |
Text mining for compliance and legal discovery
One theme that keeps recurring in my talks with text mining and other text analytics/text technology companies is compliance. Ditto legal discovery, which is closely related. Most of the focus seems to be on three kinds of data:
- Vehicle defect evidence. The TREAD Act is of course the big driver here (no pun intended).
- Drug side effect evidence. The FDA is pushing that one.
- Email/correspondence archives. Text search/filtering/clustering/mining/whatever is now a standard part of legal discovery.
Categories: Enterprise search, Search engines, Text mining | 2 Comments |
Autonomy on text mining
I asked Mike Lynch (Autonomy CEO) about text mining. He responded with an example:
A very well-known company “mines” its incoming emails for signs of trouble, not via any linguistics-driven approach, but just by clustering them. If a cluster changes size anomalously over time, it bears close investigation.
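To make the idea concrete, here is a minimal sketch of the "watch the cluster sizes" step only. It assumes the emails have already been clustered and counted per week by some other tool; the function name, the sample counts, and the z-score threshold are all hypothetical choices of mine, not anything Autonomy or its customer described.

```python
from statistics import mean, stdev

def anomalous_clusters(history, threshold=3.0):
    """Flag clusters whose latest count is far from their own history.

    history maps a cluster label to its weekly email counts, oldest first.
    The z-score test is an assumed, simple notion of "anomalous".
    """
    flagged = []
    for label, counts in history.items():
        *past, latest = counts
        if len(past) < 2:
            continue  # not enough history to estimate spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # Perfectly flat history: any change at all is worth a look.
            if latest != mu:
                flagged.append(label)
            continue
        if abs(latest - mu) / sigma > threshold:
            flagged.append(label)
    return flagged

# Hypothetical weekly counts per email cluster.
weekly_counts = {
    "billing": [40, 42, 38, 41, 39, 40],
    "shipping delays": [12, 10, 11, 13, 12, 45],
    "password reset": [20, 20, 20, 20, 20, 20],
}
print(anomalous_clusters(weekly_counts))
```

Note that nothing here looks at the language of the emails, which is the point of the example: the signal is the change in cluster size, and a human investigates from there.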
Categories: About this blog, Autonomy, Search engines, Text mining | 1 Comment |
Towards an enterprise text architecture
My column this month for Computerworld is on enterprise text technology architecture. A sequel is promised for next month.
This month’s column focuses mainly on reciting application needs. Did I leave any important ones out?
Next time I’ll focus more on how to meet those needs. I need to write it in 2 1/2 weeks or so. I plan to talk with a lot of industry players between now and then.
Categories: About this blog, Ontologies, Search engines, Text mining | 4 Comments |
Scoping the text mining market
Another Text Analytics/Mining Summit, another occasion to discuss text mining market numbers. Except that it’s really hard to get any specifics. Before writing this post, I decided to web search on “text mining market” to see if anybody had posted anything about its size or growth. The first and pretty much only relevant hit I could find was my own blog post of a year ago, reproduced below. Oh dear.
Categories: About this blog, Text Analytics Summit, Text mining | 2 Comments |
Relationship analytics — turbocharge for text mining?
While at the Text Analytics Summit, I came increasingly to suspect that two technologies, both of which I’ve put considerable research into recently, are very synergistic with each other:
- Text mining, one of the principal subjects of this blog
- Relationship analytics, which is a new phrase meaning “data management and analysis tools optimized for handling complex relationships.” Here a complex relationship is one that, if represented in a relationship graph, would have a path length of a lot more than 1 or 2.
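One way to read that definition of "complex" is as a shortest-path question on a graph. The sketch below is only an illustration under that assumption: the entity names and edges are made up, and a plain breadth-first search stands in for whatever a real relationship analytics product does.

```python
from collections import deque

def build_graph(edges):
    """Turn (a, b) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def path_length(graph, start, goal):
    """Return the number of hops on a shortest path, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical relationships, e.g. facts pulled out of documents by text mining.
edges = [
    ("Alice", "Acme Corp"),
    ("Acme Corp", "Bob"),
    ("Bob", "Widget Ltd"),
    ("Widget Ltd", "Carol"),
]
graph = build_graph(edges)
print(path_length(graph, "Alice", "Bob"))    # 2 hops: a simple relationship
print(path_length(graph, "Alice", "Carol"))  # 4 hops: "complex" in the sense above
```

The suspected synergy would be that text mining supplies the edges and relationship analytics handles the long paths through them.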
Categories: About this blog, Text Analytics Summit, Text mining | 1 Comment |
The French love their language
One noteworthy aspect of the Text Analytics Summit is the French presence. France is generally inept in the software industry, but the text mining business is a clear exception. Temis is a French company. SPSS’s text mining operation (which was Lexiquest) is part French, part English, and run by a Frenchman. Teragram was founded by French guys. For variety, clustering company Semio was founded by a French semiotics professor, and nStein’s managers are a bunch of Quebecois.
Categories: About this blog, Text Analytics Summit, Text mining | 4 Comments |
Attensity, extractive exhaustion, and the FRN
Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.
Categories: Attensity, BI integration, Comprehensive or exhaustive extraction, Text Analytics Summit, Text mining | 10 Comments |
Procter & Gamble on text mining projects
Terry McFadden of Procter & Gamble made a number of interesting points in his Text Analytics Summit talk, in the area of how to build and “amass” (his word) lexicons. Above all, I’m thrilled that he recognized the necessity of amassing lexicography that can be reused from one app to the next. Beyond that, he offered a number of specific comments and tips.