June 26, 2006
Another Text Analytics/Mining Summit, another occasion to discuss text mining market numbers. Except that it’s really hard to get any specifics. Before writing this post, I did a web search on “text mining market” to see if anybody had posted anything about its size or growth. The first and pretty much only relevant hit I could find was my own blog post of a year ago, reproduced below. Oh dear.
Susan Feldman of IDC probably has better numbers than I do, but she correctly points out how unreliable they are. Thus, all she’ll say is that the vastly bigger market of text/content-related stuff is growing at a very healthy 35% clip. However, that doesn’t say much about the small text mining segment. Also confounding the issue:
- Most companies in the business are small and private, with no obligation to report financial results.
- Stats vendors SPSS and SAS are showing great strength and growth in text mining, but they aren’t giving out dollar numbers. If they did, they surely wouldn’t be clear about what’s true text mining and what is general stats dragalong surrounding text mining sales. Plus, I pressed pretty hard to get a few numbers from each of them anyway, and now am wondering which of those they’d really, really prefer I kept confidential.
- There’s a lot of size at Factiva, the Reuters/Dow Jones joint venture, but it’s unclear how much of that to call text mining.
One data point that arose at the Text Analytics Summit was that a typical leading text mining company gets only 25% or so of its revenue from professional services, down from 50%+ three or so years ago. However, there’s no assurance that professional services revenues have been growing much, and hence this doesn’t tell us much about license fee growth besides the obvious point that it’s probably 20%+.
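To see why the services-share data point constrains license growth only loosely, here is a rough back-of-the-envelope sketch. It assumes the shares are exactly 50% and 25%, that the shift took exactly three years, and that revenue is just services plus licenses. The function name and the three services growth rates are hypothetical, chosen only for illustration; none of them are reported figures.

```python
def implied_license_growth(share_then, share_now, years, services_growth=0.0):
    """Annualized license growth implied by a falling services share.

    share_then / share_now: services as a fraction of total revenue.
    services_growth: assumed annual growth of services revenue (hypothetical).
    """
    # Normalize services revenue at the start of the period to 1.0.
    services_then = 1.0
    services_now = (1.0 + services_growth) ** years
    # License revenue is whatever is left once services take their share.
    license_then = services_then * (1.0 - share_then) / share_then
    license_now = services_now * (1.0 - share_now) / share_now
    return (license_now / license_then) ** (1.0 / years) - 1.0


# Illustrative scenarios only: services shrinking, flat, and growing.
for g in (-0.15, 0.0, 0.10):
    rate = implied_license_growth(0.50, 0.25, 3, services_growth=g)
    print(f"services growth {g:+.0%} -> license growth {rate:.1%}")
```

Under these assumptions, flat services revenue implies that license revenue tripled over the three years. The answer moves a lot with the assumed services trend, which is why the share shift alone supports only a cautious floor.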
Bottom line: The text mining market has roughly $50-100 million in annual product revenue, and is growing at roughly 40-60% annually. If those numbers aren’t accurate, they’re close enough for most purposes for which you’d need market statistics. And please don’t ask me to show the work on which those numbers are based.
And here’s what I said last year:
I vigorously resist estimating market sizes, due to multiple levels of definitional problems: what products are in the market, which revenue dollars should be associated with which product, etc. But I’ve been talking so much about text mining recently, in the aftermath of an excellent text mining conference, that questions on the subject keep getting posed to me. So here are a few thoughts and data points.
- I estimate that SPSS and SAS have several hundred customers each for text data mining, narrowly construed.
- In addition, SPSS has many hundreds more customers for text mining as specifically applied to opinion surveys, and a bunch more text mining customers that don’t fit neatly into either of the first two groups I cited. Based on this, they have a compelling claim to be the text mining market leader.
- As a wild guess, I estimate that Oracle has dozens of text mining customers in total, not counting text mining done by other vendors against data in Oracle databases.
- The leaders among the specialist text mining vendors seem to have a few dozen customers each. Inxight is a special case because they OEM technology to lots of other search and text mining vendors, including SAS.
- As noted in the post linked above, medical-discovery text mining is around a $10 million market, which isn’t a lot given the large amount of smart and important work being done in the area.
I think these numbers will get a lot bigger soon. Text mining is a very hot area.
Comments
One of the reasons we find so few market sizing figures for text mining, or text analytics technologies, is that it’s hard to “draw the line” around this field.
I’ve seen text analytics tools categorized in many ways:
• Some lump them in with the “content management – information management” software technologies.
• Others see text analytics as one form of analysis to add to predictive analytics and data mining suites.
• Others focus on the linguistics and semantics.
• Others toss it in as a BI enhancement and insist on reporting TM in with the entire BI area as one metric.
• Perhaps we can call TM a form of artificial intelligence. You can see evidence of this as text applications are showing up in ACM conferences and research institutions around the world.
So before we lament the lack of revenue numbers for this emerging field, let’s begin by building awareness, especially in the IT communities, of what text mining is, and start to highlight what is and is not text mining.
Setting boundaries is especially tricky when you try to determine the fair share of revenue coming from “turbo charged” solutions. By this I mean those industry-specific implementations, such as warranty, that are growing by tremendous leaps and bounds now that text analytics has been added to the “Engine.”
Good points all, Mary. Although the definitional wars for “text mining” may be more trouble than they’re worth.
Maybe we should collect some proof points, like “so-and-so many customers for flavor X of app, and so-and-so many for flavor Y.” Actually, I suspect there are a number of subcategories in which SAS is the actual leader, exceeding even SPSS.
Curt Monash