Text mining

Analysis of text mining companies, technology, and trends.

October 8, 2007

SAP is acquiring Inxight

More precisely, SAP is acquiring Business Objects, and of course Business Objects already acquired Inxight.

This could be interesting …

Categories: BI integration, Business Objects and Inxight, SAP, Text mining

The Clarabridge approach to text mining

And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)

Like Attensity, Clarabridge practices exhaustive extraction.* That is, they run linguistic analysis against documents, extract all sorts of entities and the relationships among them from each document, and dump the results into a relational database.

Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from MicroStrategy, which – surely not coincidentally – also favors star schemas.)
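
To make the star-schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Every table and column name below is a hypothetical illustration; the post does not describe Clarabridge's actual schema. The point is only the shape: one central fact table of extractions, surrounded by small dimension tables that a BI tool can join and roll up.

```python
import sqlite3

# Hypothetical star schema for extracted text facts. All table and column
# names are illustrative assumptions, not any vendor's actual design.
SCHEMA = """
CREATE TABLE dim_document (doc_id INTEGER PRIMARY KEY, source TEXT);
CREATE TABLE dim_entity   (entity_id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
CREATE TABLE dim_relation (relation_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_extraction (
    doc_id      INTEGER REFERENCES dim_document(doc_id),
    entity_id   INTEGER REFERENCES dim_entity(entity_id),
    relation_id INTEGER REFERENCES dim_relation(relation_id),
    mentions    INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up extractions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_document VALUES (?, ?)",
                     [(1, "email"), (2, "survey")])
    conn.executemany("INSERT INTO dim_entity VALUES (?, ?, ?)",
                     [(1, "battery", "product_part"),
                      (2, "support line", "service")])
    conn.executemany("INSERT INTO dim_relation VALUES (?, ?)",
                     [(1, "complaint"), (2, "praise")])
    conn.executemany("INSERT INTO fact_extraction VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 2), (2, 1, 1, 1), (2, 2, 2, 1)])
    return conn


def mentions_by_entity_and_relation(conn):
    """BI-style rollup: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT e.name, r.label, SUM(f.mentions)
        FROM fact_extraction f
        JOIN dim_entity e   ON e.entity_id = f.entity_id
        JOIN dim_relation r ON r.relation_id = f.relation_id
        GROUP BY e.name, r.label
        ORDER BY e.name, r.label
    """).fetchall()
```

Calling `mentions_by_entity_and_relation(build_demo_db())` rolls the three made-up fact rows up into one row per entity and relation, which is the kind of query a star schema is laid out to make cheap.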

Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies, Text mining

October 5, 2007

Text mining applications as per Attensity and Clarabridge

Besides asking them technical questions, I surveyed Attensity and Clarabridge last week about text mining application trends, getting generously detailed answers from Michelle De Haaff of Attensity and Justin Langseth of Clarabridge. Perhaps the most important point to emerge was that it’s not just about particular apps. Enterprises are doing text mining POCs (Proofs of Concept) around specific apps, commonly in the CRM area, but immediately structuring the buying process in anticipation of a rollout across multiple departments in the enterprise.

Other highlights of what they said included: Read more

Categories: Application areas, Attensity, Clarabridge, ClearForest/Reuters, Competitive intelligence, Factiva/Dow Jones, Investment research and trading, Text mining, Voice of the Customer

October 5, 2007

Nice new phrase — Voice of the Market

Michelle De Haaff, Attensity’s VP of Marketing, just introduced me to a nice phrase: Voice of the Market, obviously related to Voice of the Customer. As Michelle put it:

We’ve also expanded into what we call Voice of the Market data – providing a combination of analysis on external and internal data – this is how we’ve heard our customers put it:

*Customer feedback comes in many forms……when customers don’t know you are listening (blogs, public web forums) it is important to hear what they say.

*When customers purposely tell you something (via emails, in surveys, captured in customer service notes) it is not only important, but expected….

The first of those would be Voice of the Market, while the second would be Voice of the Customer.

Categories: Application areas, Attensity, Competitive intelligence, Text mining, Voice of the Customer

October 5, 2007

When to use exhaustive extraction

I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases in which exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes the default may finally be turned off.

Categories: Attensity, Clarabridge, Comprehensive or exhaustive extraction, Text mining

October 5, 2007

David Bean of Attensity explains sentiment and other qualifiers

David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch, but so what?

Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining, Voice of the Customer

September 18, 2007

Predictive analytics vendors’ text mining sophistication

Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.

This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS is also still on the bag-of-words level. SPSS, however, does do sentiment analysis (pretty obvious, considering their focus on surveys and the like) and negation.
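
For readers who want to see why negation handling is a step up from bag-of-words, here is a minimal sketch. The word lists and function names are made-up assumptions for illustration, and nothing here reflects how KXEN, SAS, or SPSS actually implement their products. A pure bag-of-words score throws away word order, so "not good" still counts as positive; a negation-aware pass looks at the preceding token and flips the polarity.

```python
import re
from collections import Counter

# Tiny illustrative lexicon and negator list; both are assumptions.
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "slow", "rude"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Word counts only: order and context are thrown away."""
    return Counter(tokenize(text))


def bag_sentiment(text):
    """Score from the bag alone, so 'not good' still counts as positive."""
    bag = bag_of_words(text)
    return sum(bag[w] for w in POSITIVE) - sum(bag[w] for w in NEGATIVE)


def negation_aware_sentiment(text):
    """Flip a sentiment word's polarity when the previous token is a negator."""
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score
```

On "The service was not good and the agent was rude", the bag-of-words score nets out to 0 (one positive word, one negative word), while the negation-aware score is -2, which is closer to what the customer meant.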

Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.

Categories: SAS, Sentiment analysis, SPSS, Text mining

More on text processing in CEP

StreamBase isn’t the only complex event/stream processing (CEP) vendor doing text processing. Progress Apama is as well. Stemming, fuzzy matching, and so on seem to happen all the time. But there’s also at least one case where they flat-out do sentiment analysis. Edit: I presume this is in the investment market, as that’s where most of Progress Apama’s business is.

Categories: Investment research and trading, Progress and EasyAsk, Sentiment analysis, Text mining

Event stream processors active in text filtering

OK. I secured permission to actually quote the details on something I’d previously dropped a small hint about — stream processing for text messages. Traditionally, that’s been the province of enterprise search companies. A decade ago, Verity had a kernel group of 6-7 engineers under Phil Nelson. They managed to produce not only a decent search engine, but a search engine “turned on its side” as well. I.e., instead of running one query against a corpus, they could run many queries each against documents as they arrived, one document at a time. Subsequently, the same idea has been implemented by most enterprise search providers, at least those that are serious about the intelligence market.
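
The "turned on its side" idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not Verity's or StreamBase's design: the class name is hypothetical, and a query is simplified to a set of terms that must all appear in the document. Instead of indexing documents and running one query, you hold the queries and test each document against all of them as it arrives.

```python
import re


class StandingQueryFilter:
    """Holds many registered queries and tests each arriving document.

    Hypothetical sketch: a query here is just a set of terms that must all
    occur in the document. Real products support far richer query languages.
    """

    def __init__(self):
        self._queries = {}  # query name -> set of required lowercase terms

    def register(self, name, terms):
        """Add or replace a standing query."""
        self._queries[name] = {t.lower() for t in terms}

    def match(self, document):
        """Return sorted names of queries whose terms all occur in the document."""
        words = set(re.findall(r"\w+", document.lower()))
        return sorted(name for name, terms in self._queries.items()
                      if terms <= words)
```

In use, you would register the queries once and then call `match` on each message as it comes off the stream. This sketch scans every query per document; a serious implementation would presumably index the queries themselves so that the per-document cost does not grow linearly with the number of queries.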

Well, the event-processing guys are active in that market too. At least StreamBase is.

Categories: Autonomy, Business Objects and Inxight, Enterprise search, Search engines, Text mining

July 22, 2007

Text analytics marketplace trends

It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I sometimes am. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary (or confirming!) opinions would be most welcome.

*Factiva is the most significant exception. Hint, hint.

If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is (duh) text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.

*I.e., part of the “T” in “ETL” (Extract/Transform/Load).

Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.

Categories: Application areas, ClearForest/Reuters, Custom publishing, Factiva/Dow Jones, Mark Logic, nStein, SAS, Search engines, Spam and antispam, Text Analytics Summit, Text mining, Voice of the Customer

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.
