Text mining
Analysis of text mining companies, technology, and trends.
Clarabridge takes on Attensity
Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, here are a few salient points in bullet form.
- The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract “facts” from documents and dump them into a relational database management system. In particular, Clarabridge and Attensity alike make the same argument: “Our categorization is more flexible because it’s applied only after the extraction happens.”
- Clarabridge’s sweet spot is extracting user opinions from short documents. E.g., the customer use cases they talk about involve customer feedback forms, public blog postings, etc. about (a) hotels and (b) consumer software products.
- Clarabridge has a strong business intelligence mentality, describing the product as “ETL for unstructured data.” But then, it’s spun out of a BI consultancy that itself was founded by Microstrategy veterans.
- Clarabridge uses a different database schema than Attensity. Attensity’s fact-relationship network (FRN) is basically just two thin, long tables. Clarabridge, however, uses a Microstrategy-like star schema, in which different kinds of things that you can tokenize correspond to different dimensions.
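To make the schema contrast concrete, here is a minimal sketch using Python’s built-in sqlite3 module. Every table and column name is hypothetical: this is not Attensity’s FRN or Clarabridge’s actual schema, just a generic two-table “thin” layout (one row per fact, one row per fact slot) next to a generic star schema with one dimension table per kind of extracted token. Both answer the same question, counting negative opinions per entity, but the thin layout needs a self-join while the star schema simply joins out to its dimensions.

```python
import sqlite3

# Hypothetical (doc_id, entity, opinion) triples, assumed to come out of an
# upstream extraction step run over short customer-feedback documents.
EXTRACTED = [
    (1, "room", "negative"),
    (1, "staff", "positive"),
    (2, "room", "negative"),
    (3, "staff", "positive"),
    (3, "price", "negative"),
]


def build_thin(conn):
    """Generic two-table layout: one row per fact, one row per (fact, slot, value)."""
    conn.execute("CREATE TABLE fact (fact_id INTEGER PRIMARY KEY, doc_id INTEGER)")
    conn.execute("CREATE TABLE fact_slot (fact_id INTEGER, slot TEXT, value TEXT)")
    for fact_id, (doc_id, entity, opinion) in enumerate(EXTRACTED, start=1):
        conn.execute("INSERT INTO fact VALUES (?, ?)", (fact_id, doc_id))
        conn.executemany(
            "INSERT INTO fact_slot VALUES (?, ?, ?)",
            [(fact_id, "entity", entity), (fact_id, "opinion", opinion)],
        )


def negatives_thin(conn):
    # Reassembling a fact from its slots requires joining the slot table to itself.
    return conn.execute(
        """SELECT e.value, COUNT(*)
           FROM fact_slot AS e
           JOIN fact_slot AS o ON o.fact_id = e.fact_id
           WHERE e.slot = 'entity' AND o.slot = 'opinion' AND o.value = 'negative'
           GROUP BY e.value ORDER BY e.value"""
    ).fetchall()


def build_star(conn):
    """Generic star schema: a central fact table plus one dimension per token kind."""
    conn.execute("CREATE TABLE dim_entity (entity_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE dim_opinion (opinion_id INTEGER PRIMARY KEY, label TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE fact_mention (doc_id INTEGER, entity_id INTEGER, opinion_id INTEGER)"
    )
    for doc_id, entity, opinion in EXTRACTED:
        # Each dimension value is stored once; INSERT OR IGNORE skips duplicates.
        conn.execute("INSERT OR IGNORE INTO dim_entity (name) VALUES (?)", (entity,))
        conn.execute("INSERT OR IGNORE INTO dim_opinion (label) VALUES (?)", (opinion,))
        conn.execute(
            """INSERT INTO fact_mention
               SELECT ?, e.entity_id, o.opinion_id
               FROM dim_entity AS e, dim_opinion AS o
               WHERE e.name = ? AND o.label = ?""",
            (doc_id, entity, opinion),
        )


def negatives_star(conn):
    # BI-style query: join the fact table out to its dimensions, then group.
    return conn.execute(
        """SELECT e.name, COUNT(*)
           FROM fact_mention AS f
           JOIN dim_entity AS e ON e.entity_id = f.entity_id
           JOIN dim_opinion AS o ON o.opinion_id = f.opinion_id
           WHERE o.label = 'negative'
           GROUP BY e.name ORDER BY e.name"""
    ).fetchall()


thin, star = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
build_thin(thin)
build_star(star)
print(negatives_thin(thin))  # [('price', 1), ('room', 2)]
print(negatives_star(star))  # [('price', 1), ('room', 2)]
```

Neither layout is simply right: the thin one needs no schema changes as new kinds of facts appear, while the star schema is the shape that BI tools are typically built to query.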
Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership, they could do worse than talk with Clarabridge.
Categories: Attensity, BI integration, Clarabridge, Comprehensive or exhaustive extraction, Text mining
Text Analytics Summit marketing panel: Membership firmed up
We’ve now solidified the membership of the Text Analytics Summit marketing panel. It is:
- Curt Monash, President, Monash Information Services
- Dave Kellogg, CEO, Mark Logic Corporation
- Michelle De Haaff, VP Marketing, Attensity Corporation
- Michel Lemay, VP Marketing, nstein Technologies
- Mary Crissey, SAS Analytics Marketing Manager, SAS Institute
Michelle, Michel, and Mary are all obvious choices, responsible for marketing at leading text mining vendors. In addition, Mary has excelled on the same panel in the past, Michel sent me e-mail with some brilliant thoughts on the panel subject, and Attensity has one of the most interesting strategies in the text analytics market.
As for Dave: he’s simply one of the most astute marketing theorists working in software today. He also runs a very interesting text technology company, and he used to be the most senior marketing guy in all of business intelligence, when he was SVP at Business Objects. In his copious free time, he writes a really cool blog.
Categories: Attensity, Mark Logic, nStein, SAS, Text Analytics Summit, Text mining
What’s going on at ClearForest?
I tried to invite Jay Henderson to speak on the Text Analytics Summit marketing panel, but got no answer to my e-mail. The company phone directory didn’t work so well for reaching him either. I sent e-mail to a general PR address at the company, and got no reply to that. And Ravi tells me he has had similar difficulties reaching them.
Categories: ClearForest/Reuters, Text mining
Three crucial issues in text analytics
As so often happens in life, I have gotten the job of fixing something that I was complaining about. Specifically, I’ve been asked to run the Marketing panel at the Text Analytics Summit in Newton, MA, June 12-13. In connection with this, organizer Ravi Virpal has asked me to come up with three major points or themes I feel we should address.
Categories: Text Analytics Summit, Text mining
What’s interesting about the FAST venture in BI
FAST is annoying me a bit these days. It’s nothing serious, but travel schedule screw-ups, an annoying embargo, and a screw-up in the annoying embargo have all hit at once. So I’ll keep this telegraphic and move on to other subjects.
- They’re doing fast queries without using a lot of RAM.
- They’re doing the usual text search thing of indexing across multiple “databases,” only now it’s applied to, well, databases. (Not that there’s much new about that particular aspect. Actually, there seems to be a bit of a kludge in that they export the databases to some kind of simple text files.)
- They’re doing some level of concept identification à la the text mining guys. (They don’t call it “entity extraction” because the results aren’t dumped into a database anywhere, but instead are just used on the fly.) Of course, the text mining/search convergence goes both ways.
- They bought a BI/dashboard tool and are using it both to analyze query logs and also to do normal BI/dashboard kinds of things.
- They have big references for this stuff, at least the single-web-site query aspect. Well, actually, the customer names are confidential. Oh well.
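The “export the databases to simple text files, then index them like documents” approach described above can be sketched in a few lines. This only illustrates the general idea and is not FAST’s implementation: the two in-memory SQLite databases, their tables, and the helper names (`export_rows`, `build_index`, `search`) are all hypothetical, and the index is a plain in-memory inverted index with AND semantics.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    # Lowercased alphanumeric runs; a deliberately crude tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def export_rows(conn, source, table):
    """Flatten each row of `table` into a (document key, text) pair.

    `table` is interpolated directly into the SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT rowid, * FROM {table} ORDER BY rowid")
    columns = [d[0] for d in cur.description][1:]  # drop the rowid column name
    for rowid, *values in cur:
        text = " ".join(f"{col} {val}" for col, val in zip(columns, values))
        yield f"{source}:{table}:{rowid}", text


def build_index(docs):
    """Map each token to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in docs:
        for token in tokenize(text):
            index[token].add(key)
    return index


def search(index, query):
    """Return, sorted, the keys of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Two hypothetical, unrelated databases exported into one searchable index.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (name TEXT, city TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [("Acme Corp", "Boston"), ("Globex", "Oslo")])

support = sqlite3.connect(":memory:")
support.execute("CREATE TABLE tickets (customer TEXT, issue TEXT)")
support.executemany("INSERT INTO tickets VALUES (?, ?)",
                    [("Acme Corp", "billing error"), ("Initech", "login failure")])

docs = list(export_rows(crm, "crm", "customers")) + list(
    export_rows(support, "support", "tickets"))
index = build_index(docs)
print(search(index, "acme"))          # ['crm:customers:1', 'support:tickets:1']
print(search(index, "acme billing"))  # ['support:tickets:1']
```

The flattening step is where the kludge shows: once a row becomes text, column types and keys are reduced to tokens, so a search can find a row but cannot, on its own, do relational things like joins or numeric ranges.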
And as another example of how this wasn’t the smoothest PR month for FAST, Steve Arnold somehow got the false idea that they were getting out of true text search altogether.
Categories: BI integration, Enterprise search, FAST, Search engines, Text mining
Text Analytics Summit — a promising idea gone bad
I hope to be proved wrong, but I think the Text Analytics Summits going forward will be a waste of time and money. The model was the same in each of the first two years, with these key features:
- Lots of vendor-organized application stories.
- Half a day of badly-organized industry overview.
Most of the attendees are vendors, and the ones I talk with almost uniformly agree they didn’t learn much, and they didn’t meet many prospects either.
If you’re a user or prospect, you can do as well inviting vendors to make sales pitches to you, and eventually calling references. If you’re a vendor, it’s a great place to socialize with your buddies from other firms, but that’s about it. If you’re press or analyst — well, it’s an easy drive for me in Boston, and I’m really focused on this subject area, so I’ve been attending. But almost no other press or analysts went, and I honestly can’t blame them for staying away.
Could this change? Yes. But in my discussions with the organizers, I’ve gotten zero reason to think that it will.
Categories: Text Analytics Summit, Text mining
Text analytics is finally being used for investment analysis
Jay Henderson of ClearForest tells me that hedge funds are one of their more interesting growth areas. It’s about time.
I think much of the reason investment firms haven’t made more use of text analytics has been structural: Factiva, the (relatively speaking) mammoth joint venture of Reuters and Dow Jones, is forbidden by its parent companies from meeting investment firms’ needs. And that’s kind of a pity, as it’s probably the best-positioned firm to do so. It’s good to hear that the little guys are finally filling the gap.
Telling Attensity and ClearForest apart
So far as I can tell, Attensity’s strategy when the company was originally founded was rather like ClearForest’s strategy today – and vice versa. That said, here’s where they seem to stand at this time:
- Attensity wants to make text analytics very easy to integrate into business intelligence and data mining – at the moment, they’re not too focused on the differences between those two disciplines – and is trying to deliver the best possible fact extraction consistent with that charter.
- ClearForest wants to provide really great information extraction — to the limits of what can be done without excessive knowledge engineering – and is trying to integrate as well as possible with other technologies, the better to serve the customers who need what they offer.
Categories: Application areas, Attensity, ClearForest/Reuters, Custom publishing, Mark Logic, TEMIS, Text mining
Does web text mining need to be cloaked?
One semi-flagship use for text mining is to track sentiment across news articles, websites, etc. Should this be done openly, or is there a danger of being spoofed? (I doubt it; probably no more than a few of the sites would ever be motivated to do so.) But what if you’re making many hits against the same site, to the point that your traffic is unwelcome? Or maybe the site is a direct competitor. In such cases, hiding your tracks may be more relevant.
If any of this is an issue for you, you should take a look at Anonymizer’s growing enterprise offering. Apparently, there are commercial enterprises using thousands of seats each of Anonymizer’s cloaking service.
Categories: Text mining
Text mining and search, joined at the hip
Most people in the text analytics market realize that text mining and search are somewhat related. But I don’t think they often stop to contemplate just how close the relationship is, could be, or someday probably will become. Here’s part of what I mean:
- Text mining powers search. The biggest text mining outfits in the world, possibly excepting the US intelligence community, are surely Google, Yahoo, and perhaps Microsoft.
- Search powers text mining. Restricting the corpus of documents to mine, even via a keyword search, makes tons of sense. That’s one of the good ideas in Attensity 4.
- Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape.
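The second and third bullets fit together naturally: one tokenizer can feed both a keyword index and a mining step, and a keyword search can narrow the corpus before any mining happens. Here is a minimal sketch of that pipeline. The documents, stopword list, and function names are hypothetical, and the “mining” step is just stopword-filtered term counting, standing in for real extraction.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "and", "but", "was", "were", "during"}


def tokenize(text):
    """Shared by both the search index and the mining step."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Document ids containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


def mine_terms(docs, doc_ids, exclude=()):
    """Stand-in for extraction: count non-stopword terms in the selected docs only."""
    counts = Counter()
    for doc_id in sorted(doc_ids):
        counts.update(t for t in tokenize(docs[doc_id])
                      if t not in STOPWORDS and t not in exclude)
    return counts


docs = {
    1: "The hotel room was dirty and the staff were rude.",
    2: "Great software update, but the installer crashed.",
    3: "The hotel staff were friendly; the room was small.",
    4: "Installer crashed twice during the software update.",
}
index = build_index(docs)
hotel_docs = keyword_search(index, "hotel")  # restrict the corpus first
counts = mine_terms(docs, hotel_docs, exclude={"hotel"})
print(sorted(hotel_docs))               # [1, 3]
print(counts["room"], counts["staff"])  # 2 2
```

Because both halves call the same `tokenize` function, a change in tokenization (say, handling hyphenated product names) changes what search finds and what mining counts in lockstep, a small-scale version of the shared-technology point.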