Application areas
Posts focusing on the use of text analytics technologies in specific application domains. Related subjects include:
- Any subcategory
- (in DBMS2) Specific application areas for other analytic and database technologies
Social technology in the enterprise
The recent Dreamforce conference (i.e., salesforce.com’s extravaganza) focused attention on “the social enterprise” or, more generally, enterprises’ uses of social technology. Salesforce.com is evidently serious about this push, with development/acquisition investment (e.g. Chatter, Radian6), marketing focus (e.g. much of Dreamforce) and sales effort (Marc Benioff says he got thrown out of a CIO’s office because he wouldn’t stop talking about the “social” subject) all aligned.
Denis Pombriant obviously attended the same Marc Benioff session I did. Dion Hinchcliffe blogged the whole story in considerable detail.
It’s a cool story, and worthy of attention. But I’d like to step back and remind us that there are numerous different ways to use social technology in the enterprise, which probably shouldn’t be confused with each other. And then I’d like to discuss one area of social technology that’s relatively new to me: integration between social and operational applications.
The state of the art in text analytics applications
Text analytics application areas typically fall into one or more of three broad, often overlapping domains:
- Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).
- Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
- Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.
For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:
- A bunch of documents are analyzed to ascertain the ideas expressed in them.
- A count is made as to how many times each idea turns up.
- The application user notices any surprisingly large numbers, and as a result pays attention to the corresponding ideas.
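The three steps above can be sketched in a few lines of Python. This is only an illustration, not any vendor's actual method: the keyword-to-idea lexicon, the sample documents, and the 3x "surprise" threshold are all assumptions invented for the example, and real products rely on far richer linguistic extraction than keyword matching.

```python
from collections import Counter

# Hypothetical lexicon mapping keywords to "ideas" (an assumption for this sketch).
IDEA_KEYWORDS = {
    "battery": "battery life",
    "refund": "refund request",
    "crash": "software crash",
}

def extract_ideas(document):
    """Return the set of ideas mentioned in one document (naive keyword match)."""
    words = document.lower().split()
    return {idea for keyword, idea in IDEA_KEYWORDS.items() if keyword in words}

def count_ideas(documents):
    """Count how many documents mention each idea."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_ideas(doc))
    return counts

def surprising_ideas(current, baseline, ratio=3.0):
    """Flag ideas whose current count is at least `ratio` times the baseline count."""
    return sorted(
        idea for idea, n in current.items()
        if n >= ratio * max(baseline.get(idea, 0), 1)
    )

# Made-up sample data for two periods.
last_month = ["battery died early", "please refund me"]
this_month = [
    "app crash on startup",
    "another crash today",
    "crash after update",
    "battery drains fast",
]

flagged = surprising_ideas(count_ideas(this_month), count_ideas(last_month))
print(flagged)
```

Everything interesting happens in the final step, where a person looks at the flagged ideas; the code does no more than surface a count that jumped.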
Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among other reasons. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.
But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far. Read more
Categories: Attensity, BI integration, Investment research and trading, SPSS, Text mining, Voice of the Customer | 12 Comments |
More website weirdness
Here’s something longer-lasting and weirder than Vertica’s “We sell turkeys” theme: Mark Logic, whose product is used primarily to help enterprises make their content more accessible, doesn’t have a search engine on its own website.* Read more
Categories: ClearForest/Reuters, Custom publishing, Mark Logic, Search engines | 7 Comments |
Are denial-of-insight attacks a threat to search logs and/or VOTC/VOTM apps?
TechTaxi points out that it’s at least theoretically possible to, by polluting the Web, pollute somebody’s web-wide information gathering. (Hat tip to Daniel Tunkelang.) They further assert this is a relatively near-term threat.
The theory can’t be denied. What’s more, bad actors have other motives to pollute the Web. For example, if they plant favorable automated comments about their own products or unfavorable ones about the competition’s, Voice of the Customer/Market applications will naturally be confused. And if automated reputation-checkers get more prominent, there will be a major incentive to game them, just as there has been for Google’s PageRank. So VOTC/VOTM market research tools could be polluted as a side effect.
Similarly, if somebody wants to test your e-commerce site by throwing a ton of searches at it, your search logs will lose value.
But disinformation of competitors for the sake of disinformation? Or, as the article suggests, vandalism/extortion? Off the top of my head, I’m not thinking of a serious near-term threat scenario.
Categories: Competitive intelligence, Search engines, Spam and antispam, Voice of the Customer | 2 Comments |
Attensity update
I had a brief chat with the Attensity guys at their Teradata Partners Conference booth – mainly CTO David Bean, although he did buck one question to sales chief Jeff Johnson. The business trends story remained the same as it was in June: The sweet spot for new sales remains Voice of the Customer/Voice of the Market, while on-premise/SaaS new-name accounts are split around 50-50 (by number, not revenue).
David’s thoughts as to why the SaaS share isn’t even higher – as it seems to be for Clarabridge* – centered on the point that some customers want to blend internal and external data, and may not want to ship the internal part out to a SaaS provider. Besides, if it’s tabular data, I suspect Attensity isn’t the right place to ship it anyway.
*Speaking of Clarabridge, CEO Sid Banerjee recently posted a thoughtful company update in this comment thread.
When I challenged him on ease of use, David said that Attensity is readying a MicroStrategy-based offering, which is obviously meant to compete head-on with Clarabridge and any of its perceived advantages.
Categories: Application areas, Attensity, Clarabridge, Competitive intelligence, Software as a Service (SaaS), Text mining, Text mining SaaS, Voice of the Customer | 1 Comment |
Attivio update
I talked w/ Andrew McKay of Attivio for 2 ½ hours Thursday. I’ve also been working with some Attivio engineers on a blog search engine. I think it’s time to post about Attivio. 🙂 Read more
Categories: Application areas, Attivio, Enterprise search, Lucene, Structured search | 7 Comments |
Low-latency text mining in the investment market
I’m not at Gartner’s Event Processing conference, but there seem to be some interesting posts and articles coming out of it. Seth Grimes has one on Reuters’ integration of text mining and event processing, including sentiment analysis. Well worth reading. Lots more detail than I’ve ever posted on similar applications.
Categories: ClearForest/Reuters, Investment research and trading, Sentiment analysis, Text mining | 4 Comments |
One overview of e-discovery
I just found a year-old (almost) blog post from EMC executive Andrew Cohen that succinctly lays out his view (which he believes to mainly be a consensus stance) on e-discovery. Cohen is evidently both a lawyer and a honcho in document management system vendor EMC’s Compliance Division, which is probably relevant to interpreting his outlook, in the spirit of the old Kennedy School dictum that “Where you stand depends upon where you sit.”
Highlights included:
- Information management is central to e-discovery.
- In particular, auditability (my word) is central, if you want electronic documents to hold up as evidence in court.
- Search is good enough, but it’s not the biggest issue in e-discovery.
- E-mail archiving has reached the tipping point, and is increasingly a must-have, largely for its e-discovery benefits.
Categories: E-discovery, Enterprise search | Leave a Comment |
The layered messaging marketing model as applied to Attensity
My general layered messaging theory survived its first test against an IT vendor example – Netezza. Let’s try another, in this case a company that’s not a Monash Research client. Read more
Categories: Attensity, Competitive intelligence, Text mining, Voice of the Customer | 3 Comments |
How good does e-discovery search need to be?
Two years ago, CEO Mike Lynch of Autonomy tried to persuade me that Autonomy was and would remain dominant in the e-discovery search market because: Read more
Categories: Autonomy, E-discovery, Enterprise search, Search engines | 1 Comment |