Application areas
Posts focusing on the use of text analytics technologies in specific application domains. Related subjects include:
- Any subcategory
- (in DBMS2) Specific application areas for other analytic and database technologies
When just-in-time electronic documentation is a really good idea
Mark Logic basically makes an XML DBMS – confusingly called MarkLogic without a space – optimized for document processing (including text search). Mark Logic’s main market is custom publishing – assembling documents on the fly, whether based on search or some other starting point.
Airlines put MarkLogic to an interesting use: They create “electronic flight bags.” Apparently, flight crews typically carry a whole satchel of documents (flight bags) onto a plane, the precise contents of which frequently vary. MarkLogic lets these be automatically generated in electronic form.
Well, in recent news it turns out that a $1.4 billion B-1 bomber crashed because a known prudent take-off/maintenance procedure hadn’t been followed. (Something about heating the components to evaporate water that otherwise destroyed the electronics.) This plane-saving procedure had been discovered, but not propagated to all bases and maintenance crews responsible for the B-1. You think something like MarkLogic might have helped? Read more
Categories: Application areas, Custom publishing, Mark Logic | 2 Comments |
Clarabridge’s customer-experience applications
I talked with text mining SaaS vendor Clarabridge’s CEO Sid Banerjee today. Part of the call covered applications and markets for Clarabridge’s technology. Highlights included: Read more
Categories: Application areas, Clarabridge, Software as a Service (SaaS), Text mining, Text mining SaaS, Voice of the Customer | Leave a Comment |
Mark Logic viewed as a different kind of text search technology vendor
I’m putting up two posts this morning on Mark Logic and its MarkLogic product family. The main one, over on DBMS2, outlines the technical architecture — focusing on MarkLogic as an XML database management system — and provides a bit of overall context. This post attempts to position MarkLogic against alternative kinds of text analytics engine.
For the most part, MarkLogic is indeed sold (and bought) for the storage, manipulation, and retrieval of text. (One long-confidential exception to this rule is scheduled to be unveiled at the June user conference.) Most applications seem to fit a custom publishing/enhanced search paradigm:
- Ingest text.
- Enhance it.
- Serve it up in chunks, typically via a sophisticated search interface.
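To make the ingest/enhance/serve paradigm a bit more concrete, here is a minimal sketch in Python. It is purely illustrative: `Chunk`, `ChunkStore`, and the keyword-to-tag mapping are hypothetical names, and nothing here reflects how MarkLogic (or XQuery) actually does it. The only point is that a chunk becomes searchable as soon as it is ingested, and an enhancement becomes searchable as soon as it is added.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    text: str
    tags: set[str] = field(default_factory=set)


class ChunkStore:
    """Toy in-memory store; every name here is a hypothetical illustration."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def ingest(self, doc_id: str, text: str) -> None:
        # Split on blank lines so each paragraph becomes a retrievable chunk.
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                self.chunks.append(Chunk(doc_id, para))

    def enhance(self, keyword_tags: dict[str, str]) -> None:
        # Attach a (lowercased) tag to each chunk whose text mentions the keyword.
        for chunk in self.chunks:
            lowered = chunk.text.lower()
            for keyword, tag in keyword_tags.items():
                if keyword.lower() in lowered:
                    chunk.tags.add(tag.lower())

    def serve(self, term: str) -> list[Chunk]:
        # Match on either the raw text or a tag added during enhancement.
        term = term.lower()
        return [c for c in self.chunks if term in c.text.lower() or term in c.tags]


store = ChunkStore()
store.ingest("manual-1", "Heat the components before take-off.\n\nCheck the fuel gauge.")
store.enhance({"take-off": "procedure"})
hits = store.serve("procedure")
```

Here `hits` holds only the first paragraph, found via its tag rather than its text.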
Differences vs. conventional search engines include:
- Documents are indexed on the fly, and available for query immediately upon ingestion.
- MarkLogic is a real, ACID-compliant DBMS. So everything else – such as a user tag or comment – is also available for immediate query. Mark Logic says customers are making a lot of use of this feature.
- MarkLogic has a real programming language – specifically XQuery. (Note: XQuery is a much fuller language than, say, standard SQL, with conditional logic, arithmetic, try/catch, and so on.)
- MarkLogic handles fielded information, document chunks, and whole documents in a completely integrated fashion. Truth be told, I don’t know exactly to what extent Autonomy or FAST do or don’t fall short of this standard, but it’s never seemed to be as much of a priority on their part as I’ve felt it should be.
Mark Logic also claims huge advantages in corpus administration. Scalability seems good too; there’s a national-intelligence customer with a 200 terabyte database. And they’re proud of a feature called lexicons, although it seems so obvious to me that I’ve so far failed to muster what they’d probably regard as the proper level of excitement about it. (In SQL terms, it seems to be a combination of SELECT and COUNT DISTINCT, both of which are capabilities I’d think would be in XQuery anyway.)
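For what it’s worth, here is roughly what I mean by “a combination of SELECT and COUNT DISTINCT,” sketched in Python. This is my own reading of the idea, not Mark Logic’s implementation; `value_lexicon` and the sample documents are made up for illustration.

```python
from collections import Counter


def value_lexicon(docs, field_name):
    """Return sorted (value, document_count) pairs for one field.

    A document counts once per value, even if the value repeats inside it.
    """
    counts = Counter()
    for doc in docs:
        # set() removes repeats within a single document before counting.
        counts.update(set(doc.get(field_name, [])))
    return sorted(counts.items())


# Hypothetical sample corpus: each document is a dict of field -> list of values.
docs = [
    {"author": ["Smith"], "topic": ["search", "xml"]},
    {"author": ["Jones"], "topic": ["xml"]},
    {"author": ["Smith"], "topic": ["xml", "xml"]},
]
lexicon = value_lexicon(docs, "topic")
```

The length of `lexicon` is the distinct-value count, and each entry says how many documents carry that value.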
Categories: Application areas, Custom publishing, Mark Logic | 4 Comments |
Investment text mining job listing
As per this job listing, at least one “major NYC investment bank” plans to do text mining on a proprietary trading desk.
The successful candidate will mine text data from numerous news sources and incorporate the information [into] the proprietary trading systems.
The biggest text analytics company you probably never heard of
I caught up with Expert System S.p.A. last week. They turn out to be doing $10 million in annual text technology revenue. That alone is surprising (sadly), but what’s really remarkable is that they did it almost entirely in the Italian market. As you might guess, that figure includes a little bit of everything, from search engines to Italian-language filters for Microsoft Office to text mining. But only $3½ million of Expert System’s revenue is from the government (and I think that includes civilian agencies), and under 30% is professional services, so on the whole it seems like a pretty real accomplishment. Oh yes – Expert System says it’s entirely self-funded.
As of last year, Expert System also has English-language products, and a couple of minor OEM sales in the US (for mobile search and semantic web applications). German- and Arabic-language products are in beta test. The company says that its market focus going forward is national security – surely the reason for the Arabic – and competitive intelligence. It envisions selling through partners such as system integrators, although I think that makes more sense for the government market than it does vis-a-vis civilian companies. In February the company is introducing a market intelligence product focused on sentiment analysis.
Expert System is a bit of a throwback, in that it talks lovingly of the semantic network that informs its products. Read more
Categories: Application areas, Competitive intelligence, Enterprise search, Expert System S.p.A., Ontologies, Search engines, Text mining | Leave a Comment |
A claim that Google is doing pretty detailed extraction
A blog post focusing on SEO for local search makes some interesting claims, including:
- Google knows what a review is. (This seems to be “everybody knows it” conventional wisdom.)
- Google knows how many stars a review got. (Ditto.)
- Google tracks who the reviewer is and how many other reviews s/he wrote (that’s the big insight of the post and related conversation).
Pretty interesting. Text mining companies are paying a lot of attention to Voice-of-the-Market these days; even so, I question whether they can do the same things out of the box.
Categories: Competitive intelligence, Google, Search engines | 1 Comment |
Scout Labs – yet more public-facing sentiment analysis
Scout Labs sounds like even more of what I was thinking of than Summize. It’s a shame that the “traditional” text mining vendors didn’t get there first.
Categories: Competitive intelligence, Text mining | 2 Comments |
The text mining vendors continue to lack constructive vision
I’ve been thinking for a long time that the various text mining companies doing sentiment analysis should try some public-facing (or at least multi-customer) services. Investors might love such a thing. So might marketing managers (actually, Factiva claims to be active there, at least as per their web site). And as a key part of the strategy, text mining companies selling to enterprises might brand such a site and gain massive awareness accordingly. Well, it seems that public-facing sentiment analysis sites are springing up. At least, Summize has. (Hat tip to TechCrunch.) And the text mining vendors are nowhere to be seen.
So what else is new? Read more
Categories: Application areas, Factiva/Dow Jones, Investment research and trading, Text mining | 1 Comment |
Attivio tries to do it all
When Andrew McKay was at FAST, I grumped about his search/BI integration story. Now that he’s trying to do the same thing at a startup called Attivio, it sounds more plausible.
Attivio is having a house party and product rollout in the latter part of January, and details are scarce in the meantime. But here are some highlights.
- Attivio was founded in August. It has 21 people and 1 VC. The VC has invested >$6 million and committed >$12 million total.
- Attivio has ambitious plans for a fully integrated data management/real-time BI stack. It’s currently called the “Active Intelligence Engine.” Read more
Categories: Attivio, BI integration, Investment research and trading, Lucene, Open source text analytics | 4 Comments |
QL2 – web text extraction and more
Here are some highlights of the QL2 story, per exec Mike McDermott.
- QL2’s main business is scraping price and other product offering data from the web for high-speed competitive analysis. For example, of their 250ish customers overall, over 90 are airlines. Online retailers are another big chunk of their customer base.
- QL2 also commonly partners with text mining companies in applications such as Voice of the Market or competitive intelligence. E.g., QL2 has been brought into a few deals each by Attensity, Clarabridge, and especially Temis.
- QL2 goes well beyond basic crawling. Notably, the system fills in forms with parameters. And of course it monitors pages for changes.
- QL2’s scripting language is, Mike tells me, very SQL-like. Hence the “QL” in the name.
- QL2 rolls its own filters, rather than using INSO or whoever. (Actually, what are the main file-reading filter choices these days? I’ve lost track.) Indeed, Mike fondly believes QL2 does a better job with PDFs than Adobe does.
- QL2 doesn’t want to be thought of as web-only. Rather, Mike likes my formulation of “text data ETL, web or otherwise.” That said, he freely admits QL2’s strength is in Extract rather than in Transform or Load.
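As an aside, the “monitors pages for changes” part is easy to picture. Below is a minimal sketch in Python of one common way to do it, namely fingerprinting each page’s content and comparing against the last fingerprint seen. I have no idea whether QL2 works this way; `ChangeMonitor` and the example URL are hypothetical, and the page text is passed in directly rather than fetched.

```python
import hashlib


class ChangeMonitor:
    """Remembers one content fingerprint per URL (hypothetical illustration)."""

    def __init__(self) -> None:
        self._fingerprints: dict[str, str] = {}

    def check(self, url: str, content: str) -> bool:
        """Return True if content is new or differs from the last version seen."""
        # Collapse whitespace so trivial reformatting does not count as a change.
        normalized = " ".join(content.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        changed = self._fingerprints.get(url) != digest
        self._fingerprints[url] = digest
        return changed


monitor = ChangeMonitor()
url = "https://example.com/fares"  # placeholder URL, assumed for illustration
first = monitor.check(url, "Fare: $199")
same = monitor.check(url, "Fare:   $199\n")
updated = monitor.check(url, "Fare: $189")
```

Note that a first sighting counts as a change, which is usually what you want when the goal is to extract anything not yet seen.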
Categories: Application areas, Competitive intelligence, QL2, Text mining | Leave a Comment |