Text mining

Analysis of text mining companies, technology, and trends.

July 26, 2006

Megaputer on the text mining market

Sergei Ananyan is president of Megaputer, which is not one of the easier companies to get information about. They’re an essentially Russian firm based in Bloomington, Indiana. Their website is, to put it kindly, not up to date. And I wound up speaking with Sergei while he was at his rural vacation house, located somewhere between the Black and Aral Seas.

However, Sergei followed up by email with his views of the marketplace, and I think they’re interesting enough to share below. I really like his focus on analytic business processes, something that generally doesn’t get enough consideration.

Categories: Megaputer, Text mining

1 Comment

July 23, 2006

Introduction to ClearForest

I had a fascinating talk with Jay Henderson of ClearForest on Friday. While I have more research to do before I know what I really think, there already is plenty to post about.

ClearForest is one of the two companies whose names come up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place.

Categories: ClearForest/Reuters, Text mining

1 Comment

July 23, 2006

Text mining for compliance and legal discovery

One theme that keeps recurring in my talks with text mining and other text analytics/text technology companies is compliance. Ditto legal discovery, which is closely related. Most of the focus seems to be on three kinds of data:

Vehicle defect evidence. The TREAD Act is of course the big driver here (no pun intended).
Drug side effect evidence. The FDA is pushing that one.
Email/correspondence archives. Text search/filtering/clustering/mining/whatever is now a standard part of legal discovery.

Categories: Enterprise search, Search engines, Text mining

2 Comments

July 23, 2006

Autonomy on text mining

I asked Mike Lynch (Autonomy CEO) about text mining. He responded with an example:

A very well-known company “mines” its incoming emails for signs of trouble, not via any linguistics-driven approach, but just by clustering them. If a cluster changes size anomalously over time, it bears close investigation.
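
To make the idea concrete, here is a minimal sketch of the "watch the cluster sizes" half of that approach. It assumes the emails have already been clustered by some other tool and that you have a message count per cluster per period. The function name, the sample counts, and the three-standard-deviation threshold are all hypothetical illustrations, not anything Autonomy or its customer described.

```python
from statistics import mean, stdev


def anomalous_clusters(history, threshold=3.0):
    """Return labels of clusters whose latest size looks anomalous.

    history maps a cluster label to a list of message counts per period,
    oldest first. A cluster is flagged when its latest count is more than
    `threshold` standard deviations away from the mean of its earlier counts.
    (The threshold of 3.0 is an assumption, not a recommendation.)
    """
    flagged = []
    for label, counts in history.items():
        *past, latest = counts
        if len(past) < 2:
            continue  # not enough history to estimate a spread
        mu = mean(past)
        sigma = stdev(past)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if latest != mu:
                flagged.append(label)
            continue
        if abs(latest - mu) / sigma > threshold:
            flagged.append(label)
    return flagged


# Hypothetical weekly message counts per email cluster.
weekly_counts = {
    "billing": [40, 42, 39, 41, 40],
    "shipping delays": [12, 11, 13, 12, 55],
    "password resets": [30, 29, 31, 30, 32],
}

print(anomalous_clusters(weekly_counts))
```

Note that nothing here looks inside the messages. The linguistics-free part of the example is exactly that: the signal is the change in cluster size, and a human investigates whatever gets flagged.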

Categories: About this blog, Autonomy, Search engines, Text mining

1 Comment

July 11, 2006

Towards an enterprise text architecture

My column this month for Computerworld is on enterprise text technology architecture. A sequel is promised for next month.

This month’s column focuses mainly on reciting application needs. Did I leave any important ones out?

Next time I’ll focus more on how to meet those needs. I need to write it in 2 1/2 weeks or so. I plan to talk with a lot of industry players between now and then.

Categories: About this blog, Ontologies, Search engines, Text mining

4 Comments

June 26, 2006

Scoping the text mining market

Another Text Analytics/Mining Summit, another occasion to discuss text mining market numbers. Except — it’s really hard to get any specifics. Before writing this post, I decided to web search on text mining market to see if anybody had posted anything about its size or growth. The first and pretty much only relevant hit I could find was my own blog post of a year ago, reproduced below. Oh dear.

Categories: About this blog, Text Analytics Summit, Text mining

2 Comments

June 25, 2006

Relationship analytics — turbocharge for text mining?

While at the Text Analytics Summit, I came increasingly to suspect that two technologies, both of which I’ve put considerable research into recently, are very synergistic with each other:

Text mining, one of the principal subjects of this blog
Relationship analytics, which is a new phrase meaning “data management and analysis tools optimized for handling complex relationships.” Here a complex relationship is one that, if represented in a relationship graph, would have path length a lot more than 1 or 2.
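
As a rough illustration of what "path length" means here, the sketch below measures the shortest path between two entities in a small relationship graph using breadth-first search. The graph, the entity names, and the function name are all made up for the example. It only shows the definition at work and is not a claim about how any relationship analytics product is built.

```python
from collections import deque


def path_length(graph, start, goal):
    """Return the number of hops on the shortest path from start to goal.

    graph maps each node to an iterable of its neighbors.
    Returns None when no path exists.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical undirected relationship graph: people linked via organizations.
relationships = {
    "alice": ["acme"],
    "acme": ["alice", "bob"],
    "bob": ["acme", "globex"],
    "globex": ["bob", "carol"],
    "carol": ["globex"],
}

print(path_length(relationships, "alice", "acme"))   # direct relationship
print(path_length(relationships, "alice", "carol"))  # a longer chain
```

The first lookup is the kind of one-hop relationship an ordinary join handles well. The second, alice to carol by way of two organizations and an intermediary, is the "path length a lot more than 1 or 2" case in the definition above.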

Categories: About this blog, Text Analytics Summit, Text mining

1 Comment

June 24, 2006

The French love their language

One noteworthy aspect of the Text Analytics Summit is the French presence. France is generally inept in the software industry, but the text mining business is a clear exception. Temis is a French company. SPSS’s text mining operation (which was Lexiquest) is part French, part English, and run by a Frenchman. Teragram was founded by French guys. For variety, clustering company Semio was founded by a French semiotics professor, and nStein’s managers are a bunch of Quebecois.

Categories: About this blog, Text Analytics Summit, Text mining

4 Comments

June 24, 2006

Attensity, extractive exhaustion, and the FRN

Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.

Categories: Attensity, BI integration, Comprehensive or exhaustive extraction, Text Analytics Summit, Text mining

10 Comments

June 24, 2006

Procter & Gamble on text mining projects

Terry McFadden of Procter & Gamble made a number of interesting points in his Text Analytics Summit talk, in the area of how to build and “amass” (his word) lexicons. Above all, I’m thrilled that he recognized the necessity of amassing lexicography that can be reused from one app to the next. Beyond that, specific comments and tips included:

Categories: About this blog, ClearForest/Reuters, Companies and products, Ontologies, Text Analytics Summit, Text mining

2 Comments

