Text mining for compliance and legal discovery
One theme that keeps recurring in my talks with text mining and other text analytics/text technology companies is compliance. Ditto legal discovery, which is closely related. Most of the focus seems to be on three kinds of data:
- Vehicle defect evidence. The TREAD Act is of course the big driver here (no pun intended).
- Drug side effect evidence. The FDA is pushing that one.
- Email/correspondence archives. Text search/filtering/clustering/mining/whatever is now a standard part of legal discovery.
There’s also activity in censoring real-time email, IMs, etc., but that seems often to be done either by specialty products (e.g., Assentor) or as part of a general email/spam/whatever control product. And I have a lot of questions about how well at least the latter works, especially in enterprises that don’t totally shut down workers’ access to private webmail accounts.
These are active, interesting markets, and I intend to write more about them soon. But for now, are there any big compliance/legal drivers of text technologies that I’m simply overlooking?
Comments
2 Responses to “Text mining for compliance and legal discovery”
[…] ClearForest is one of the two companies whose name comes up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place. […]
[…] 3. In other cases, one is looking for trouble even before one has found some. Compliance often falls into this category, as does web-crawling reputation management. One process, favored by Autonomy, is simply to monitor document flow for all important themes, and hope that the trouble signs jump out at you. Alternatively, one can monitor documents for known bad-event flags – vehicle malfunctions, drug side effects, angry customers, whatever. If there are only a few documents with such flags, one can read them directly. If there are too many for humans to just read and digest in a timely manner – well, then you’ve transitioned into Case 1 or Case 2! […]
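The flag-monitoring process in the excerpt above can be sketched roughly as follows. This is a minimal illustration, assuming that simple keyword patterns stand in for "known bad-event flags" and that a hypothetical review threshold decides whether humans read the flagged documents directly or whether the volume calls for bulk automated analysis. The flag vocabularies, the threshold, and the function names are all made up for illustration; this is not any vendor's actual method.

```python
import re
from dataclasses import dataclass

# Hypothetical flag vocabularies; real products use far richer extraction than keywords.
FLAG_PATTERNS = {
    "vehicle_malfunction": re.compile(r"\b(brakes? failed|stall(ed|s)?|airbag)\b", re.IGNORECASE),
    "drug_side_effect": re.compile(r"\b(nausea|rash|dizziness)\b", re.IGNORECASE),
    "angry_customer": re.compile(r"\b(furious|lawsuit|never again)\b", re.IGNORECASE),
}

# Hypothetical cutoff: at or below this many flagged documents, humans just read them.
HUMAN_REVIEW_LIMIT = 50


@dataclass
class Triage:
    flagged: dict  # document id -> sorted list of flag names that matched
    route: str     # "human_review" or "automated_analysis"


def flags_for(text):
    """Return the sorted names of every flag whose pattern matches the text."""
    return sorted(name for name, pat in FLAG_PATTERNS.items() if pat.search(text))


def triage(docs, limit=HUMAN_REVIEW_LIMIT):
    """Flag each document, then route the batch based on how many were flagged."""
    flagged = {}
    for doc_id, text in docs.items():
        hits = flags_for(text)
        if hits:
            flagged[doc_id] = hits
    # Few flagged documents: read them directly. Too many: hand off to bulk analysis.
    route = "human_review" if len(flagged) <= limit else "automated_analysis"
    return Triage(flagged, route)


if __name__ == "__main__":
    docs = {
        "w1": "Customer reports the brakes failed on the highway.",
        "w2": "Routine oil change, no issues.",
        "m1": "Patient developed a rash after the second dose.",
    }
    result = triage(docs, limit=1)
    print(result.flagged)
    print(result.route)
```

Keyword matching is about the crudest possible flag detector; the interesting part is the routing decision, which mirrors the point that once flagged documents are too numerous to read, you are back in one of the bulk-analysis cases.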