Text mining
Analysis of text mining companies, technology, and trends.
SAP is acquiring Inxight
More precisely, SAP is acquiring Business Objects, and of course Business Objects already acquired Inxight.
This could be interesting …
Categories: BI integration, Business Objects and Inxight, SAP, Text mining | Leave a Comment |
The Clarabridge approach to text mining
And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)
- Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.
- Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.) Read more
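To make the star-schema idea concrete, here is a minimal sketch in Python with SQLite. It is not Clarabridge's actual schema. The table names (`dim_document`, `dim_entity`, `fact_mention`), the columns, and the idea that an extractor hands back `(name, entity_type)` pairs are all assumptions made for illustration.

```python
import sqlite3

def build_star_schema(conn):
    # One fact table (mentions) surrounded by two dimension tables.
    # All table and column names here are hypothetical.
    conn.executescript("""
        CREATE TABLE dim_document (doc_id INTEGER PRIMARY KEY, source TEXT);
        CREATE TABLE dim_entity (
            entity_id INTEGER PRIMARY KEY,
            name TEXT UNIQUE,
            entity_type TEXT
        );
        CREATE TABLE fact_mention (
            doc_id INTEGER REFERENCES dim_document(doc_id),
            entity_id INTEGER REFERENCES dim_entity(entity_id),
            mention_count INTEGER
        );
    """)

def load_extractions(conn, doc_id, source, entities):
    """Load every extracted entity for one document.

    `entities` is a list of (name, entity_type) pairs, standing in for
    the output of a linguistic extractor that is not shown here.
    """
    conn.execute("INSERT INTO dim_document VALUES (?, ?)", (doc_id, source))
    counts = {}
    for name, entity_type in entities:
        counts[(name, entity_type)] = counts.get((name, entity_type), 0) + 1
    for (name, entity_type), n in counts.items():
        # Reuse the dimension row if this entity name was seen before.
        conn.execute(
            "INSERT OR IGNORE INTO dim_entity (name, entity_type) VALUES (?, ?)",
            (name, entity_type),
        )
        (entity_id,) = conn.execute(
            "SELECT entity_id FROM dim_entity WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_mention VALUES (?, ?, ?)", (doc_id, entity_id, n)
        )

def mentions_by_type(conn):
    # A typical BI-style rollup: join the fact table to a dimension, then aggregate.
    rows = conn.execute("""
        SELECT e.entity_type, SUM(f.mention_count)
        FROM fact_mention f
        JOIN dim_entity e ON e.entity_id = f.entity_id
        GROUP BY e.entity_type
    """).fetchall()
    return dict(rows)
```

The point of the layout is that once everything extracted sits in fact and dimension tables, ordinary BI tools can slice it without knowing anything about linguistics.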
Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies, Text mining | 2 Comments |
Text mining applications as per Attensity and Clarabridge
Besides asking them technical questions, I surveyed Attensity and Clarabridge last week about text mining application trends, getting generously detailed answers from Michelle De Haaff of Attensity and Justin Langseth of Clarabridge. Perhaps the most important point to emerge was that it’s not just about particular apps. Enterprises are doing text mining POCs (Proofs of Concept) around specific apps, commonly in the CRM area, but immediately structuring the buying process in anticipation of a rollout across multiple departments in the enterprise.
Other highlights of what they said included: Read more
Categories: Application areas, Attensity, Clarabridge, ClearForest/Reuters, Competitive intelligence, Factiva/Dow Jones, Investment research and trading, Text mining, Voice of the Customer | 3 Comments |
Nice new phrase — Voice of the Market
Michelle DeHaaff, Attensity’s VP of Marketing, just introduced me to a nice phrase — Voice of the Market, obviously related to Voice of the Customer. As Michelle put it:
We’ve also expanded into what we call Voice of the Market data – providing a combination of analysis on external and internal data – this is how we’ve heard our customers put it:
*Customer feedback comes in many forms……when customers don’t know you are listening (blogs, public web forums) it is important to hear what they say.
*When customers purposely tell you something (via emails, in surveys, captured in customer service notes) it is not only important, but expected….
The first of those would be Voice of the Market, while the second would be Voice of the Customer.
Categories: Application areas, Attensity, Competitive intelligence, Text mining, Voice of the Customer | 2 Comments |
When to use exhaustive extraction
I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases in which exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes the default may finally be turned off. Read more
David Bean of Attensity explains sentiment and other qualifiers
David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch — but so what? Read more
Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining, Voice of the Customer | 1 Comment |
Predictive analytics vendors’ text mining sophistication
Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.
This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS is also still on the bag-of-words level. SPSS, however, does do sentiment analysis (pretty obvious, considering their focus on surveys and the like) and negation.
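For readers who haven't seen the distinction, here is a rough Python sketch of what "bag-of-words" means and what negation handling adds on top of it. This is a toy illustration, not how KXEN, SAS, or SPSS implement anything. The negator list and the `NOT_` prefix convention are assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of negating words.
NEGATORS = {"not", "no", "never"}

def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    # Plain bag of words: word counts only, word order discarded.
    return Counter(tokenize(text))

def bag_of_words_with_negation(text):
    # Same counts, except the word right after a negator is tagged NOT_,
    # so "not good" and "good" no longer land in the same bucket.
    bag = Counter()
    negate = False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = True
            continue
        bag["NOT_" + tok if negate else tok] += 1
        negate = False
    return bag
```

With the plain version, "the service was not helpful" counts toward "helpful" exactly as "the service was helpful" does. The negation-aware version counts "NOT_helpful" instead, which is the minimum a sentiment feature needs.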
Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.
Categories: SAS, Sentiment analysis, SPSS, Text mining | Leave a Comment |
More on text processing in CEP
StreamBase isn’t the only complex event/stream processing (CEP) vendor doing text processing. Progress Apama is as well. Stemming, fuzzy matching, and so on seem to happen all the time. But there’s also at least one case where they flat-out do sentiment analysis. Edit: I presume this is in the investment market, as that’s where most of Progress Apama’s business is. Read more
Categories: Investment research and trading, Progress and EasyAsk, Sentiment analysis, Text mining | Leave a Comment |
Event stream processors active in text filtering
OK. I secured permission to actually quote the details on something I’d previously dropped a small hint about — stream processing for text messages. Traditionally, that’s been the province of enterprise search companies. A decade ago, Verity had a kernel group of 6-7 engineers under Phil Nelson. They managed to produce not only a decent search engine, but a search engine “turned on its side” as well. I.e., instead of running one query against a corpus, they could run many queries against each document as it arrived, one document at a time. Subsequently, the same idea has been implemented by most enterprise search providers, at least those that are serious about the intelligence market.
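The "turned on its side" idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Verity's or StreamBase's design. The class name `StandingQueryFilter`, its methods, and the rule that a query matches when all of its terms appear in the document are assumptions made for the example.

```python
import re

class StandingQueryFilter:
    """Holds many stored queries and tests each arriving document against them."""

    def __init__(self):
        # query_id -> set of terms that must all appear in a document
        self._queries = {}

    def register(self, query_id, terms):
        self._queries[query_id] = frozenset(t.lower() for t in terms)

    def match(self, document):
        # Tokenize the single incoming document once, then check every
        # stored query against it. Returns the matching query ids, sorted.
        words = set(re.findall(r"\w+", document.lower()))
        return sorted(
            query_id
            for query_id, terms in self._queries.items()
            if terms <= words
        )
```

In a normal search engine the corpus is stored and the query is transient. Here the queries are stored and the document is transient, which is why the approach suits filtering a stream of messages. A production system would index the queries rather than loop over all of them for every document.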
Well, the event-processing guys are active in that market too. At least StreamBase is. Read more
Categories: Autonomy, Business Objects and Inxight, Enterprise search, Search engines, Text mining | 2 Comments |
Text analytics marketplace trends
It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I am some other times. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! – opinions would be most welcome.
*Factiva is the most significant exception. Hint, hint.
If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh – text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.
*I.e., part of the “T” in “ETL” (Extract/Transform/Load).
Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet. Read more