The Clarabridge approach to text mining
And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)
- Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.
- Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from MicroStrategy, which – surely not coincidentally – also favors star schemas.) Read more
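To make the star schema idea concrete, here is a minimal sketch of what "extract everything, then load it into a star schema" could look like. It is not Clarabridge's actual schema: every table name, column name, and sample row below is a hypothetical chosen for illustration, and the extraction step itself is assumed to have already happened. The point is only that each extracted relationship becomes a fact row, with documents, entities, and relation types as dimensions that ordinary BI-style queries can group by.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by three dimension tables (all names hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_document (doc_id INTEGER PRIMARY KEY, source TEXT);
        CREATE TABLE dim_entity   (entity_id INTEGER PRIMARY KEY, name TEXT, entity_type TEXT);
        CREATE TABLE dim_relation (relation_id INTEGER PRIMARY KEY, label TEXT);
        -- Each fact row is one extracted (subject, relation, object) triple from one document.
        CREATE TABLE fact_extraction (
            doc_id      INTEGER REFERENCES dim_document(doc_id),
            subject_id  INTEGER REFERENCES dim_entity(entity_id),
            relation_id INTEGER REFERENCES dim_relation(relation_id),
            object_id   INTEGER REFERENCES dim_entity(entity_id)
        );
    """)


def load_sample(conn):
    """Insert made-up rows standing in for the output of a linguistic extraction pass."""
    conn.executemany("INSERT INTO dim_document VALUES (?, ?)",
                     [(1, "email"), (2, "survey")])
    conn.executemany("INSERT INTO dim_entity VALUES (?, ?, ?)",
                     [(1, "customer", "person"),
                      (2, "battery", "product_part"),
                      (3, "screen", "product_part")])
    conn.executemany("INSERT INTO dim_relation VALUES (?, ?)",
                     [(1, "complains_about"), (2, "praises")])
    conn.executemany("INSERT INTO fact_extraction VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 2), (2, 1, 1, 2), (2, 1, 2, 3)])


def complaints_by_source(conn):
    """Count complaints per (document source, complained-about entity)."""
    return conn.execute("""
        SELECT d.source, e.name, COUNT(*)
        FROM fact_extraction f
        JOIN dim_document d ON d.doc_id = f.doc_id
        JOIN dim_relation r ON r.relation_id = f.relation_id
        JOIN dim_entity   e ON e.entity_id = f.object_id
        WHERE r.label = 'complains_about'
        GROUP BY d.source, e.name
        ORDER BY d.source, e.name
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample(conn)
print(complaints_by_source(conn))
```

Because the facts are extracted up front rather than at query time, a new question (say, praise by entity type) is just another join and GROUP BY against the same tables.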
Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies, Text mining | 2 Comments |
Text mining applications as per Attensity and Clarabridge
Besides asking them technical questions, I surveyed Attensity and Clarabridge last week about text mining application trends, getting generously detailed answers from Michelle De Haaff of Attensity and Justin Langseth of Clarabridge. Perhaps the most important point to emerge was that it’s not just about particular apps. Enterprises are doing text mining POCs (Proofs of Concept) around specific apps, commonly in the CRM area, but immediately structuring the buying process in anticipation of a rollout across multiple departments in the enterprise.
Other highlights of what they said included: Read more
Categories: Application areas, Attensity, Clarabridge, ClearForest/Reuters, Competitive intelligence, Factiva/Dow Jones, Investment research and trading, Text mining, Voice of the Customer | 3 Comments |
Nice new phrase — Voice of the Market
Michelle DeHaaff, Attensity’s VP of Marketing, just introduced me to a nice phrase — Voice of the Market, obviously related to Voice of the Customer. As Michelle put it:
We’ve also expanded into what we call Voice of the Market data – providing a combination of analysis on external and internal data
– this is how we’ve heard our customers put it:
* Customer feedback comes in many forms … when customers don’t know you are listening (blogs, public web forums) it is important to hear what they say.
* When customers purposely tell you something (via emails, in surveys, captured in customer service notes) it is not only important, but expected …
The first of those would be Voice of the Market, while the second would be Voice of the Customer.
Categories: Application areas, Attensity, Competitive intelligence, Text mining, Voice of the Customer | 2 Comments |
When to use exhaustive extraction
I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases where exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes the default may finally be turned off. Read more
David Bean of Attensity explains sentiment and other qualifiers
David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch — but so what? Read more
Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining, Voice of the Customer | 1 Comment |
A tip for submitting to DMOZ — make your site description clear
I just picked out a few of the many unreviewed sites in my DMOZ categories to evaluate, and listed most of those I reviewed.
How did I choose which ones to screen? Mainly, I picked out ones with focused descriptions, titles, and so on, that just seemed likely to be listable based on that info (which is the essence of what I see on the page where all the various submitted sites are linked). I correctly guessed that I’d be able to quickly understand what I was seeing and judge whether to list the site or not, quickly write the official site description, and so on. Read more
Categories: Categorization and filtering, Directories, ODP and DMOZ, Search engine optimization (SEO) | 2 Comments |
Predictive analytics vendors’ text mining sophistication
Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.
This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS is also still on the bag-of-words level. SPSS, however, does do sentiment analysis (pretty obvious, considering their focus on surveys and the like) and negation.
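For readers who haven't seen the distinction spelled out, here is a minimal sketch of the difference between plain bag-of-words and a bag-of-words that pays attention to negation. It is not how KXEN, SAS, or SPSS implement anything; the tokenizer, the `NEGATORS` list, and the `NOT_` prefix convention are all assumptions made for illustration, and the negation rule (flip only the single word after a negator) is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of negation words.
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Plain bag-of-words: word counts only, with word order thrown away."""
    return Counter(tokenize(text))


def bag_of_words_with_negation(text):
    """Same counts, except the word right after a negator is recorded as NOT_<word>."""
    counts = Counter()
    negate_next = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate_next = True
            continue
        counts["NOT_" + token if negate_next else token] += 1
        negate_next = False
    return counts


review = "The service was not good"
print(bag_of_words(review))
print(bag_of_words_with_negation(review))
```

In the plain version, "not good" and "good" both contribute a count for "good", so the two reviews look similar to any model built on top. The negation-aware version keeps them apart, which is the kind of thing that starts to matter for survey-style sentiment work.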
Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.
Categories: SAS, Sentiment analysis, SPSS, Text mining | Leave a Comment |
A challenge to DMOZ bashers
Give or take a corrected typo, here’s a challenge to DMOZ bashers I just wrote in the flame war thread.
If you want to do something that is:
A. Correct
B. Credible
C. Potentially useful
just go find a specific category with terrible listings, and publicize the fact with overwhelmingly clear proof of your assessment.
If that’s not EASY for you to do … then maybe DMOZ isn’t so bad after all, eh?
In particular, I’d encourage you to post a version of the category that is clearly better than what is currently there.
Categories: Categorization and filtering, Directories, ODP and DMOZ, Social software and online media | 1 Comment |
DMOZ — yet another flame war
My latest thoughts about DMOZ and the ODP may be found in this blog comment thread.
The gist is:
- DMOZ has many problems, such as categories that are at least five years out of date.
- Newly, corruptly listed sites are NOT high on the list of problems.
- In fact, the attention paid to avoiding such corruption is a terrible drain on ODP resources.
- There are a lot of liars and/or idiots bashing DMOZ in the website owner community.
- robjones is a sarcastic jerk, but he’s our sarcastic jerk.
Or something like that. As I said, it’s a flame war …
Anyhow, I’m flying off on a two-week snorkeling trip Saturday, and should be much mellower soon.
Categories: Categorization and filtering, Directories, ODP and DMOZ, Social software and online media | 8 Comments |
The case for Inxight Awareness Server
I’ve been pretty skeptical about Inxight’s Awareness Server. My theory is that ordinary enterprise search engines can index remotely anyway, and they offer much better search functionality. Inxight’s Ian Hersey was kind enough to write in and offer two counter-arguments.
First, Ian points out that there are circumstances when, due to security and permissions, you can’t really index everything via one search engine. Specifically, he offers the government as an example. OK, I can see that in the government, with its classified and/or regulated silos. However, I have trouble thinking of many more examples. While there certainly are plenty of instances where a variety of organizations share information on a somewhat arm’s-length basis, it’s tough to think of such cases where federated text search would come into play.
Second, Ian in essence disputes my claim of inferior functionality. While implicitly conceding — as well he should! — that Inxight’s Awareness Server doesn’t do some things full-featured search engines do, he points out analytic features that may not be found in conventional search engine offerings. The big one he calls out is faceted search — which of course was the core of Intelliseek, the acquisition Awareness Server came from. Hmm. Faceted search has a checkered history, with Excite and Northern Light being perhaps the most visible among many failures. On the other hand, it’s a great idea that keeps being tried, and some versions — notably Endeca’s — have turned out well.
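In case "faceted search" is unfamiliar, here is a minimal sketch of the basic mechanic: run a query, then count the matching documents by each metadata field so the user can narrow down by clicking a value. This says nothing about how Inxight, Intelliseek, or Endeca actually do it; the `DOCS` records, the field names, and the substring matching are all hypothetical stand-ins for a real index.

```python
from collections import Counter

# Hypothetical documents, each with a title plus metadata fields usable as facets.
DOCS = [
    {"title": "Battery complaints summary", "source": "email", "year": 2006},
    {"title": "Battery praise in surveys", "source": "survey", "year": 2006},
    {"title": "Screen complaints", "source": "email", "year": 2005},
    {"title": "Battery recall notice", "source": "web", "year": 2005},
]


def search(docs, query, filters=None):
    """Return docs whose title contains the query and whose fields match every filter."""
    filters = filters or {}
    needle = query.lower()
    return [
        doc for doc in docs
        if needle in doc["title"].lower()
        and all(doc.get(field) == value for field, value in filters.items())
    ]


def facet_counts(results, facet_fields):
    """For each facet field, count how many results carry each value."""
    return {
        field: Counter(doc[field] for doc in results if field in doc)
        for field in facet_fields
    }


hits = search(DOCS, "battery")
print(facet_counts(hits, ["source", "year"]))

# Drilling down: the user picks year=2006 from the facet list.
narrowed = search(DOCS, "battery", filters={"year": 2006})
print([doc["title"] for doc in narrowed])
```

The counts are what distinguish this from a plain filtered search: the user sees how the result set breaks down before committing to a filter.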
I guess I’ll have to reserve judgment on that part until I look at Inxight’s product and see what they do and don’t actually have.