The text technologies market 3: Here’s what’s missing
The text technologies market should be booming, but it is actually in disarray. How, then, should it be fixed? I think the key problem can be summed up like this:
There’s a product category that is a key component of the technology, without which the technology won’t come close to delivering its potential benefits. But there’s widespread and justified concern over that category’s commercial viability. Hence, the industry cowers in niches where it can eke out some success despite products that fall far short of their true potential.
The product category I have in mind, for lack of a better name, is an ontology management system. No category of text technology can work really well without some kind of semantic understanding. Automated clustering is very important for informing this understanding in a cost-effective way, but such clustering is not a complete solution; hence the relative disappointment of Autonomy, the utter failure of Excite, and so on. Rather, there has to be some kind of concept ontology that can be used to inform disambiguation. It doesn’t matter whether the application category is search, text mining, command/control, or anything else; semantic disambiguation is almost always necessary for the most precise, user-satisfying results. Maybe it’s enough to have a thesaurus, i.e., a list of synonyms. Maybe it’s enough to define “concepts” by simple vectors of word likelihoods. But you have to have something, or your search results will be cluttered, your information retrieval won’t fetch what you want it to, your text mining will have wide error bars, and your free-speech understanders will come back with a whole lot of “I’m sorry; I didn’t understand that.”
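To make the “simple vectors of word likelihoods” idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than any vendor’s method: the two toy concepts, their word probabilities, the `floor` smoothing value, and the function names `score` and `disambiguate` are all made up for this example. It picks whichever concept makes the observed words most likely.

```python
import math

# Hypothetical toy concepts: each maps words to assumed likelihoods.
CONCEPTS = {
    "jaguar_animal": {"jaguar": 0.4, "cat": 0.2, "jungle": 0.2, "prey": 0.2},
    "jaguar_car": {"jaguar": 0.4, "engine": 0.2, "dealer": 0.2, "sedan": 0.2},
}


def score(tokens, likelihoods, floor=1e-6):
    """Sum of log-likelihoods of the tokens under one concept.

    Words the concept does not list get a small floor probability
    (an assumed smoothing constant) so the log is always defined.
    """
    return sum(math.log(likelihoods.get(t, floor)) for t in tokens)


def disambiguate(text):
    """Return the name of the concept that best explains the text."""
    tokens = text.lower().split()
    return max(CONCEPTS, key=lambda name: score(tokens, CONCEPTS[name]))
```

With these toy numbers, `disambiguate("the jaguar dealer sold a sedan")` picks the car sense, because “dealer” and “sedan” are likely under that concept and nearly impossible under the other. A real system would need far richer vectors, but the shape of the comparison is the same.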
This isn’t just my opinion. Look at Inquira. Look at text mining products from SPSS and many others. Look at Oracle’s original text indexing technology and also at its Triplehop acquisition. For that matter, look at Sybase’s AnswersAnywhere, in which the concept network is really just an object model, in the full running-application sense of “object.” Comparing text to some sort of thesaurus or concept representation is central to enterprise text technology applications (and increasingly to web search as well).
Could one “ontology management system,” whatever that is, service multiple types of text applications? Of course it could. The ideal ontology would consist mainly of four aspects:
1. A conceptual part that’s language-independent.
2. A general language-dependent part.
3. A sensitivity to different kinds of text – language is used differently when spoken, for instance, than it is in edited newspaper articles.
4. An enterprise-specific part. For example, a company has product names, and competitors with product names, and those names have abbreviations, and so on.
Relatively little of that is application-specific; for any given enterprise, a single ontology should meet most or all of its application needs.
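As a rough illustration of how those four aspects might sit together in one structure, here is a small Python sketch. The class name `Ontology`, its field names, the `resolve` method, and the lookup order (enterprise terms first, then text-type usage, then the general language layer) are all assumptions made for this example; none of it describes an existing product.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # 1. Language-independent part: concept id -> short description.
    concepts: dict = field(default_factory=dict)
    # 2. General language-dependent part: language -> {term: concept id}.
    lexicon: dict = field(default_factory=dict)
    # 3. Text-type sensitivity: kind of text -> {term: concept id}.
    registers: dict = field(default_factory=dict)
    # 4. Enterprise-specific part: term -> concept id.
    enterprise: dict = field(default_factory=dict)

    def resolve(self, term, language="en", register=None):
        """Map a term to a concept id, most specific layer first.

        Terms are stored lowercase in this sketch. Returns None when
        no layer knows the term.
        """
        term = term.lower()
        layers = (
            self.enterprise,
            self.registers.get(register, {}),
            self.lexicon.get(language, {}),
        )
        for layer in layers:
            if term in layer:
                return layer[term]
        return None
```

Search, text mining, and speech applications could all call the same `resolve` against the same data, which is the sense in which little of the ontology needs to be application-specific.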
Coming up: The legitimate barriers to the creation of an ontology management system market, and ideas about how to overcome them.
Comments
5 Responses to “The text technologies market 3: Here’s what’s missing”
[…] Incidentally, the whole TREX strategy is subject to considerable doubt too. It’s not a state-of-the-art product, and they currently don’t plan to make it into one. In particular, they have a prejudice against semi-automated ontology creation, and that has clearly become a requirement for top-tier text technologies. […]
[…] In previous posts I argued that what’s holding the text technology industry back is the lack of a viable ontology management system. The obvious objection to such a suggestion is: Who would use it? There is no business process for ontology management, even less than there is for “knowledge management,” and for that matter less than there was for “knowledge engineering” during the expert systems bubble of the 1980s. Enterprises do not have anything like a “chief ontologist.” Indeed, that job title sounds like a joke, a touchy-feely liberal-artsy nonstarter. […]
[…] Over on the Text Technologies blog, I have a series of posts arguing that the potentially huge market for enterprise text technologies is being stifled by the lack of a general-purpose ontology management system. I further argue that such a product could be constructed in such a way as to be actually usable and potentially adopted by mainstream enterprises (no, you don’t need a trained librarian to use it). So what are the chances of something like this actually working out, to an industry-changing extent? […]
[…] I’ve argued previously that enterprises need serious ontologies, and that this lack is holding back growth in multiple areas of text technology – search, text mining and knowledge extraction, various forms of speech recognition, and so on. The core point was: The ideal ontology would consist mainly of four aspects: […]
[…] Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape. […]