Relationship analytics — turbocharge for text mining?
While at the Text Analytics Summit, I came increasingly to suspect that two technologies, both of which I’ve put considerable research into recently, are very synergistic with each other:
- Text mining, one of the principal subjects of this blog
- Relationship analytics, which is a new phrase meaning “data management and analysis tools optimized for handling complex relationships.” Here a complex relationship is one that, if represented in a relationship graph, would have path length a lot more than 1 or 2.
If you attended any of the medical-research sessions at the Text Analytics Summit, you probably saw that the relationships being detected commonly have path lengths of 4 or 6 or even more: Protein A causes the expression of Gene B, which causes the production of Protein C, which does something to Pathway D, which blocks the formation of Molecule E, which is crucial to Event F, or something like that. The detailed biology is beyond me, and in any case varies from application to application and discovery to discovery. But the outline of the problem is clear: extract facts from 15 million documents, and see how they fit together into non-trivial causal chains and relationships.
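To make the "path length" idea concrete, here is a minimal Python sketch, assuming the extracted facts have already been reduced to (subject, relation, object) triples. The entity names and relation labels are hypothetical, made up to mirror the Protein A to Event F chain; nothing here reflects how any particular vendor's product works. It uses a plain breadth-first search to find the shortest chain of facts linking two entities.

```python
from collections import deque

# Hypothetical facts, as (subject, relation, object) triples. The names and
# relation labels are illustrative assumptions, not real extraction output.
FACTS = [
    ("Protein A", "causes expression of", "Gene B"),
    ("Gene B", "causes production of", "Protein C"),
    ("Protein C", "acts on", "Pathway D"),
    ("Pathway D", "blocks formation of", "Molecule E"),
    ("Molecule E", "is crucial to", "Event F"),
    ("Protein A", "binds", "Protein X"),  # a fact that leads nowhere useful
]


def build_graph(facts):
    """Index facts by subject so each entity maps to its outgoing edges."""
    graph = {}
    for subject, relation, obj in facts:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def find_chain(facts, start, goal):
    """Breadth-first search for the shortest chain of facts from start to goal.

    Returns a list of (subject, relation, object) triples, or None if the
    two entities are not connected.
    """
    graph = build_graph(facts)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None


chain = find_chain(FACTS, "Protein A", "Event F")
if chain is not None:
    for subject, relation, obj in chain:
        print(f"{subject} {relation} {obj}")
    print("path length:", len(chain))
```

An in-memory dictionary is fine for six facts. The point of the post is that doing the same kind of traversal over facts drawn from millions of documents is where specialized data managers may earn their keep.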
And it’s not just traditional medical research. Attensity’s adoption of the “fact/relationship network” terminology highlights that text analytics, in general, is often about – well, it’s often about determining facts and relationships. If you’re looking for a law-breaker (terrorist, thief, fraudster, whatever), you may need to track some non-trivial relationships. Epidemiology differs from other medical research applications in that it relies on smaller, mixed corpora (e.g., lots of news reports), but the relationships it traces are still complex. Text mining for failure analysis to date has been focused more on simple clustering of warranty claims than on sophisticated relationship tracking, but that too may change as the application segment matures.
Managing that kind of information and analysis in a generic relational DBMS is hard, at least from a performance standpoint. My client Cogito, however, has a cool product focused exactly on such problems, and I’m off to Oracle on Monday to learn about their offering in the space. Whether you’re a text mining supplier, or a user of such applications, you should probably check out the help these data managers can provide.
Comments
One Response to “Relationship analytics — turbocharge for text mining?”
[…] ClearForest is one of the two companies whose name comes up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place. […]