June 25, 2006

Relationship analytics — turbocharge for text mining?

While at the Text Analytics Summit, I came increasingly to suspect that two technologies, both of which I’ve put considerable research into recently, are very synergistic with each other: text mining and relationship analytics.

If you attended any of the medical-research sessions at the Text Analytics Summit, you probably saw that the relationships being detected commonly have path lengths of 4, 6, or even more. Protein A causes the expression of Gene B, which causes the production of Protein C, which does something to Pathway D, which blocks the formation of Molecule E, which is crucial to Event F, or something like that. The detailed biology is beyond me, and in any case varies from application to application and discovery to discovery. But the outline of the problem is clear: extract facts from 15 million documents, and see how they fit together into non-trivial causal chains and relationships.
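To make the shape of that problem a little more concrete, here is a minimal sketch, assuming each extracted fact has been reduced to a (subject, relation, object) triple. The entity names are the placeholders from the example above; the relation strings and the helpers `build_graph` and `find_chains` are hypothetical, for illustration only. It does a plain breadth-first search for cycle-free causal chains up to a maximum path length.

```python
from collections import defaultdict, deque

# Hypothetical extracted facts as (subject, relation, object) triples,
# using the placeholder entities from the example in the post.
FACTS = [
    ("Protein A", "causes expression of", "Gene B"),
    ("Gene B", "causes production of", "Protein C"),
    ("Protein C", "acts on", "Pathway D"),
    ("Pathway D", "blocks formation of", "Molecule E"),
    ("Molecule E", "is crucial to", "Event F"),
]


def build_graph(facts):
    """Index triples as a directed adjacency list: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subj, rel, obj in facts:
        graph[subj].append((rel, obj))
    return graph


def find_chains(graph, start, goal, max_len):
    """Breadth-first search for cycle-free chains from start to goal.

    Returns each chain as a list of (subject, relation, object) hops,
    keeping only chains with at most max_len hops.
    """
    chains = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            chains.append(path)
            continue
        if len(path) == max_len:
            continue
        # Entities already on this chain, so we never revisit one.
        on_path = {start} | {obj for _, _, obj in path}
        for rel, nxt in graph.get(node, []):
            if nxt not in on_path:
                queue.append((nxt, path + [(node, rel, nxt)]))
    return chains


if __name__ == "__main__":
    graph = build_graph(FACTS)
    for chain in find_chains(graph, "Protein A", "Event F", max_len=6):
        steps = [f"{rel} {obj}" for _, rel, obj in chain]
        print(" -> ".join([chain[0][0]] + steps))
```

Breadth-first order means shorter chains are found first; the catch is that the work per extra hop grows with how many facts fan out from each entity, which on a 15-million-document fact base is exactly where things get expensive.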

And it’s not just traditional medical research. Attensity’s adoption of the “fact/relationship network” terminology highlights that text analytics, in general, is often about... well, it’s often about determining facts and relationships. If you’re looking for a lawbreaker (terrorist, thief, fraudster, whatever), you may need to track some non-trivial relationships. Epidemiology differs from other medical research applications in that it relies on smaller, mixed corpora (e.g., lots of news reports), but the relationships it traces are still complex. So far, text mining for failure analysis has focused more on simple clustering of warranty claims than on sophisticated relationship tracking, but that too may change as the application segment matures.

Managing that kind of information and analysis in a generic relational DBMS is hard, at least from a performance standpoint, since every extra hop in a relationship chain typically means another join. My client Cogito, however, has a cool product focused on exactly such problems, and I’m off to Oracle on Monday to learn about its offering in the space. Whether you’re a text mining supplier or a user of such applications, you should probably check out the help these data managers can provide.
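Here is a minimal sketch of why multi-hop questions strain a generic relational design, using Python’s built-in `sqlite3` module purely as a stand-in for “a generic relational DBMS” (an assumption; no particular product is implied). Facts live in one hypothetical `facts` table, and a recursive common table expression walks chains of them. Each application of the recursive SELECT extends a path by one hop via another join against `facts`, so a path length of 6 means six rounds of joining, with intermediate results that can balloon on a densely connected fact base.

```python
import sqlite3

# Hypothetical fact table contents, using the placeholder entities
# from the Protein A example in the post.
FACTS = [
    ("Protein A", "causes expression of", "Gene B"),
    ("Gene B", "causes production of", "Protein C"),
    ("Protein C", "acts on", "Pathway D"),
    ("Pathway D", "blocks formation of", "Molecule E"),
    ("Molecule E", "is crucial to", "Event F"),
]


def chains_via_sql(facts, start, max_hops):
    """Return (entity, hops, path) rows reachable from start within max_hops."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one row per extracted fact.
        conn.execute("CREATE TABLE facts (subj TEXT, rel TEXT, obj TEXT)")
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", facts)
        # Recursive CTE: the anchor row is the start entity; each pass of the
        # recursive SELECT adds one hop by joining against facts again.
        # The hops limit guarantees termination even if the facts contain cycles.
        query = """
            WITH RECURSIVE chain(node, hops, path) AS (
                SELECT ?, 0, ?
                UNION ALL
                SELECT f.obj, c.hops + 1, c.path || ' -> ' || f.obj
                FROM chain AS c
                JOIN facts AS f ON f.subj = c.node
                WHERE c.hops < ?
            )
            SELECT node, hops, path FROM chain WHERE hops > 0 ORDER BY hops
        """
        return conn.execute(query, (start, start, max_hops)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for node, hops, path in chains_via_sql(FACTS, "Protein A", max_hops=6):
        print(hops, path)
```

On five toy rows this is instant; the point is the structure of the query, not its speed here.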

Comments

One Response to “Relationship analytics — turbocharge for text mining?”

  1. Text Technologies » Blog Archive » Introduction to ClearForest on July 23rd, 2006 7:22 am

    […] ClearForest is one of the two companies whose name comes up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place. […]
