More on Attensity
I had a long and far-ranging talk today with Attensity. Key takeaways included:
- They want to be positioned in the BI space, e.g. as “ETL for text/unstructured data.” They place a lot of value on their partnerships with Business Objects and Teradata. And, as a key part of the exhaustive extraction/FRN story, they think that BI/data warehouse information roll-up tools are an excellent (if imperfect) substitute for hardcore semantic extraction.
- Attensity 4 is coming out soon, and it’s going to feature extremely multimodal text analysis.
Categories: Attensity, Text mining
Pioneers moving on
Ramana Rao is leaving Inxight, or may already have left. Today I also discovered that Todd Wakefield is leaving Attensity. Such things happen in all industries, of course.
Categories: Attensity, Business Objects and Inxight, Text mining
Megaputer on the text mining market
Sergei Ananyan is president of Megaputer, which is not one of the easier companies to get information about. They’re an essentially Russian firm based in Bloomington, Indiana. Their website is, to put it kindly, not up to date. And I wound up speaking with Sergei while he was at his rural vacation house, located somewhere between the Black and Aral Seas.
However, Sergei followed up by email with his views of the marketplace, and I think they’re interesting enough to share below. I really like his focus on analytic business processes, something that generally doesn’t get enough consideration.
Categories: Megaputer, Text mining
Introduction to ClearForest
I had a fascinating talk with Jay Henderson of ClearForest on Friday. While I have more research to do before I know what I really think, there’s already plenty to post about.
ClearForest is one of the two companies whose name comes up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place.
Categories: ClearForest/Reuters, Text mining
Autonomy on text mining
I asked Mike Lynch (Autonomy CEO) about text mining. He responded with an example:
A very well-known company “mines” its incoming emails for signs of trouble, not via any linguistics-driven approach, but just by clustering them. If a cluster changes size anomalously over time, it bears close investigation.
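Lynch’s example is simple enough to sketch. The Python below is a minimal illustration, not how Autonomy or the company in question actually does it. It assumes the emails have already been clustered by some upstream tool, and that each period’s emails arrive as a list of cluster IDs (a hypothetical input format). The function names `cluster_sizes` and `flag_anomalous_clusters` and the z-score test with its `threshold` are likewise assumptions chosen for illustration. A cluster is flagged when its size in a period departs from that cluster’s own earlier sizes by more than `threshold` standard deviations.

```python
from collections import Counter
from statistics import mean, stdev


def cluster_sizes(assignments_by_period):
    """Count emails per cluster in each period.

    assignments_by_period: hypothetical input format, a list where element t
    is the list of cluster IDs assigned to the emails received in period t.
    """
    return [Counter(period) for period in assignments_by_period]


def flag_anomalous_clusters(assignments_by_period, threshold=3.0, min_history=3):
    """Return (period, cluster_id, size) for every cluster whose size in a
    period deviates from its own earlier sizes by more than `threshold`
    standard deviations (a simple z-score test, assumed for illustration)."""
    sizes = cluster_sizes(assignments_by_period)
    all_clusters = set().union(*sizes)
    start = max(min_history, 2)  # statistics.stdev needs at least two points
    flags = []
    for cid in sorted(all_clusters):
        # A Counter returns 0 for clusters absent in a period.
        series = [counts[cid] for counts in sizes]
        for t in range(start, len(series)):
            history = series[:t]
            mu = mean(history)
            # Floor the spread at 1.0 so a perfectly flat history does not
            # divide by zero or flag a one-email wobble.
            sigma = max(stdev(history), 1.0)
            if abs(series[t] - mu) / sigma > threshold:
                flags.append((t, cid, series[t]))
    return flags


# Usage: "shipping" emails jump from about 5 per period to 40.
periods = [
    ["billing"] * 10 + ["shipping"] * 5,
    ["billing"] * 11 + ["shipping"] * 5,
    ["billing"] * 9 + ["shipping"] * 6,
    ["billing"] * 10 + ["shipping"] * 40,
]
print(flag_anomalous_clusters(periods))  # [(3, 'shipping', 40)]
```

Comparing each cluster only against its own history, rather than against a global average, is what lets a small but suddenly growing cluster stand out even when it is still dwarfed by the big, stable ones.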
Categories: About this blog, Autonomy, Search engines, Text mining
Update: Autonomy/Verity merger
I had a couple of very interesting calls with Autonomy last week. One message I got was that they do not want to be pigeonholed in search, which they think on the whole is a primitive way of dealing with “unstructured information.” Nonetheless, my first post based on those calls will indeed focus on text indexing and search. You see, I wrote quite skeptically about the Autonomy/Verity merger when it was announced, and I’d like to amend that with an updated opinion. Autonomy’s claims can be summarized in part by the following:
Categories: About this blog, Autonomy, Enterprise search, Search engines
Lead UIMA architect Dave Ferrucci speaks about adoption
Dave Ferrucci, lead architect for UIMA, shared some detailed views with me about UIMA adoption. With his permission, they are reproduced below. UIMA is still not getting a lot of attention from commercial text analytics vendors, but ultimately I think it will prevail, if only because nobody cares enough to start a war of dueling alternative standards.* So it’s something you should educate yourself about as it progresses.
*And IBM plans to convince me ASAP that even that assessment is too negative, which it well may be. Stay tuned.
So to sum up:
1. We seem to have a fair amount of traction with the UIMA framework among communities that are very interested in plug-n-play with components from other providers. This includes the government, life sciences and research communities.
2. The UIMA standard, as opposed to the specific Java Framework implementation, developed under an SDO will broaden the opportunity and strengthen the case for adoption of UIMA as a standard for text and multi-modal analytics that allows interoperability across different frameworks and applications. It would of course be the case that the Java UIMA Framework would comply with the standard.
The complete email follows.
Categories: About this blog, IBM and UIMA, Open source text analytics
Google’s internal text-based project/knowledge management
Slashdot turned up an amazing article in Baseline on Google’s infrastructure. There’s lots of gee-whiz stuff in there about server farms, petabytes of disk packed into a standard shipping container so as to allow the setup of more server farms around the globe, and so on. But even more interesting to me was another point, about Google’s internal use of its own technology. In at least one case – a hybrid of project and knowledge management – Google really seems to be doing what other firms only dream about as futures. Here’s the relevant excerpt:
Categories: About this blog, Enterprise search, Google, Search engines, Specialized search
Attensity, extractive exhaustion, and the FRN
Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.
Categories: Attensity, BI integration, Comprehensive or exhaustive extraction, Text Analytics Summit, Text mining
Procter & Gamble on text mining projects
Terry McFadden of Procter & Gamble made a number of interesting points in his Text Analytics Summit talk, in the area of how to build and “amass” (his word) lexicons. Above all, I’m thrilled that he recognized the necessity of amassing lexicography that can be reused from one app to the next. Beyond that, specific comments and tips included: