Is text analytics a good technology career path for humanities majors?
One of the major dilemmas facing a group of people we all know is: How can humanities majors make money? Sure, they can become lawyers. And they can join the tech industry and write documentation. But what else?
Well, what about text analytics? Much of what I know about natural language processing (NLP) I learned from my friend Sharon Flank, whom I met when she was a Slavic Linguistics PhD student at Harvard. My partner in first figuring out search engines, and later in running Elucidate, was my wife Linda Barlow, a 15-times-published novelist who’s also taught English at the college level. And Olivier Jouve’s education is in paleontology, although whether that counts as a humanity is a borderline definitional issue.
So I ask you all: Is text analytics a fruitful area for humanities majors to find lucrative careers? All insight would be appreciated. If the news is good enough, I’ll do my part in publicizing it to university placement offices and the like. Read more
Categories: Attensity, Clarabridge, Jobs and careers | 5 Comments |
Attensity update
I chatted recently with David Bean, Attensity’s CTO, and then with marketing exec Phil Talsky. Highlights included: Read more
Categories: Application areas, Attensity, Clarabridge, Competitive intelligence, Software as a Service (SaaS), Text mining, Text mining SaaS, Voice of the Customer | 6 Comments |
When just-in-time electronic documentation is a really good idea
Mark Logic basically makes an XML DBMS – confusingly called MarkLogic without a space – optimized for document processing (including text search). Mark Logic’s main market is custom publishing – assembling documents on the fly, whether based on search or some other starting point.
Airlines put MarkLogic to an interesting use: They create “electronic flight bags.” Apparently, flight crews typically carry a whole satchel of documents (a flight bag) onto a plane, the precise contents of which frequently vary. MarkLogic lets these be automatically generated in electronic form.
Well, in recent news it turns out that a $1.4 billion B-1 bomber crashed because a known prudent take-off/maintenance procedure hadn’t been followed. (Something about heating the components to evaporate water that otherwise destroyed the electronics.) This plane-saving procedure had been discovered, but not propagated to all bases and maintenance crews responsible for the B-1. You think something like MarkLogic might have helped? Read more
Categories: Application areas, Custom publishing, Mark Logic | 2 Comments |
Clarabridge’s customer-experience applications
I talked with text mining SaaS vendor Clarabridge’s CEO Sid Banerjee today. Part of the call covered applications and markets for Clarabridge’s technology. Highlights included: Read more
Categories: Application areas, Clarabridge, Software as a Service (SaaS), Text mining, Text mining SaaS, Voice of the Customer | Leave a Comment |
Clarabridge is now all about text mining SaaS
Clarabridge CEO Sid Banerjee called with some product news that is embargoed until the Text Analytics Summit, and which I hence won’t write about at this time. But during the call, I discovered something interesting – Clarabridge’s hosted/SaaS (Software as a Service) text mining offering has taken over its business. Highlights of the call included: Read more
Categories: Clarabridge, Software as a Service (SaaS), Text mining, Text mining SaaS | 2 Comments |
Google is idiosyncratic about what it displays
I was testing the new blog theme installed on Software Memories, specifically to see whether the title and description in the search engine results reflected the metatag title and description I’d just put in, which are
History of the software industry, its companies and its personalities
and
History of the software industry by Curt Monash, who’s been in the middle of it since 1981
respectively.
Well, the answer turns out to be a resounding “Yes and no.” Read more
Categories: Google, Search engine optimization (SEO), Search engines | Leave a Comment |
How is YouTube relating videos?
One of the great music videos of all time is Madonna’s Material Girl. With two exceptions, all the “related videos” listed by YouTube are just what one would expect: either other Madonna videos, or other versions of Material Girl. One exception is Cyndi Lauper’s Girls Just Want to Have Fun, while the other is Marilyn Monroe’s Diamonds Are A Girl’s Best Friend. The connection with the Monroe video is particularly strong, with each being #3 on each other’s “Related” list.
And that’s an outstanding result. Material Girl is obviously a direct reference, conceptually and visually, to Diamonds Are A Girl’s Best Friend. So my question is: How does YouTube know that? Are there favorite videos lists on which they co-exist? Did somebody hand-enter the connection? Is it inferred from their comment threads (which I definitely have not paged through)? Or — by far the least likely but most interesting of all — is there some sort of direct visual comparison?
Other than popularity presumably having something to do with it (both videos are, deservedly, very often watched and commented on), I haven’t figured out which it is.
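One of the hypotheses above, co-existence on favorites lists, is easy to sketch. The following is a minimal illustration of co-occurrence counting, not a description of how YouTube actually works; the function name `related_by_cooccurrence` and the sample favorites lists are made up for the example.

```python
from collections import Counter
from itertools import permutations


def related_by_cooccurrence(favorite_lists):
    """Map each video to a Counter of videos that share a favorites list with it."""
    related = {}
    for favorites in favorite_lists:
        # set() so a video listed twice by one user is only counted once
        for a, b in permutations(set(favorites), 2):
            related.setdefault(a, Counter())[b] += 1
    return related


# Hypothetical favorites lists, invented purely for illustration.
sample_lists = [
    ["Material Girl", "Diamonds Are A Girl's Best Friend"],
    ["Material Girl", "Diamonds Are A Girl's Best Friend",
     "Girls Just Want to Have Fun"],
    ["Material Girl", "Like a Virgin"],
]

related = related_by_cooccurrence(sample_lists)
print(related["Material Girl"].most_common(1))
```

With this made-up data, the Monroe video comes out as the top "related" item for Material Girl, and vice versa, simply because the two appear together most often. A real system would presumably also weight by popularity and other signals.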
Categories: Audio and video search, Google, Search engines | 4 Comments |
Powerset is mildly interesting
Powerset has done a great job of generating buzz for its version of smart search. That said, its current demo is mediocre, and that’s being polite. Powerset currently indexes little more than just Wikipedia, and the quality of its search results is about comparable to that of Wikipedia’s justly reviled internal search engine. To determine this, I did searches on both sites on five strings. Wikipedia typically had more total junk ranking higher, but it also put the very best hits of all higher than Powerset did. The strings were:
- Drosophila research
- Bill Clinton foreign policy
- Home run hitters
- Innocents on death row
- Text data mining
Categories: Powerset, Search engines, Structured search | 4 Comments |
Google seems to have rehabilitated us
As previously noted, we were de-indexed by Google, due to the injection of a whole lot of spammy hidden links. We’re back now, after about two weeks, even on the blog (this one) where there was no official de-indexing notice and hence no way to apply for re-consideration. And thus we once again have high rankings for search terms such as Netezza, DATAllegro, Clarabridge, and Attivio.
We’re designing a new blog theme — the current one is just an emergency stopgap — that will (among myriad more important virtues) be more SEO-friendly. I’ll be curious to see whether that makes much actual difference from a search ranking standpoint.
Categories: Google, Search engine optimization (SEO), Spam and antispam | 1 Comment |
Mark Logic viewed as a different kind of text search technology vendor
I’m putting up two posts this morning on Mark Logic and its MarkLogic product family. The main one, over on DBMS2, outlines the technical architecture — focusing on MarkLogic as an XML database management system — and provides a bit of overall context. This post attempts to position MarkLogic against alternative kinds of text analytics engine.
For the most part, MarkLogic is indeed sold (and bought) for the storage, manipulation, and retrieval of text. (One long-confidential exception to this rule is scheduled to be unveiled at the June user conference.) Most applications seem to fit a custom publishing/enhanced search paradigm:
- Ingest text.
- Enhance it.
- Serve it up in chunks, typically via a sophisticated search interface.
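The ingest/enhance/serve pattern can be sketched in a few lines. This is a toy in-memory illustration, not MarkLogic's API or architecture; `ToyStore`, `Chunk`, and the keyword-to-tag mapping are hypothetical names chosen for the example, and "enhancement" here is nothing more than naive keyword tagging.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    text: str
    tags: set = field(default_factory=set)


class ToyStore:
    """Toy stand-in for a document store; everything lives in a Python list."""

    def __init__(self):
        self.chunks = []

    def ingest(self, doc_id, text):
        # Step 1: split the document into paragraph-sized chunks and store them.
        new = [Chunk(doc_id, p.strip()) for p in text.split("\n\n") if p.strip()]
        self.chunks.extend(new)
        return new

    def enhance(self, keyword_tags):
        # Step 2: attach a tag to every chunk that mentions the keyword.
        for chunk in self.chunks:
            lowered = chunk.text.lower()
            for keyword, tag in keyword_tags.items():
                if keyword in lowered:
                    chunk.tags.add(tag)

    def search(self, term=None, tag=None):
        # Step 3: serve up matching chunks, filtered by text and/or tag.
        results = []
        for chunk in self.chunks:
            if term is not None and term.lower() not in chunk.text.lower():
                continue
            if tag is not None and tag not in chunk.tags:
                continue
            results.append(chunk)
        return results


store = ToyStore()
store.ingest("manual-1", "Heat the components before take-off.\n\nCheck the fuel gauge.")
store.enhance({"heat": "maintenance", "fuel": "fuel-system"})
hits = store.search(tag="maintenance")
```

The point of the sketch is only that the unit served to the user is the chunk, not the whole document, and that the enhancement step is what makes the later search richer than plain keyword matching.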
Differences vs. conventional search engines include:
- Documents are indexed on the fly, and available for query immediately upon ingestion.
- MarkLogic is a real, ACID-compliant DBMS. So everything else – such as a user tag or comment – is also available for immediate query. Mark Logic says customers are making a lot of use of this feature.
- MarkLogic has a real programming language – specifically XQuery. (Note: XQuery is a much fuller language than, say, standard SQL, with conditional logic, arithmetic, try/catch, and so on.)
- MarkLogic handles fielded information, document chunks, and whole documents in a completely integrated fashion. Truth be told, I don’t know exactly to what extent Autonomy or FAST do or don’t fall short of this standard, but it’s never seemed to be as much of a priority on their part as I’ve felt it should be.
Mark Logic also claims huge advantages in corpus administration. Scalability seems good too; there’s a national-intelligence customer with a 200 terabyte database. And they’re proud of a feature called lexicons, although it seems so obvious to me that I’ve so far failed to muster what they’d probably regard as the proper level of excitement about it. (In SQL terms, it seems to be a combination of SELECT and COUNT DISTINCT, both of which are capabilities I’d think would be in XQuery anyway.)
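To make the SQL analogy concrete, here is a minimal sketch of a lexicon read as "the distinct values of a field, plus how many there are." It reflects only my SELECT/COUNT DISTINCT reading above, not MarkLogic's actual lexicon implementation or API; `build_lexicon` and the sample documents are hypothetical.

```python
from collections import Counter


def build_lexicon(documents, field_name):
    """Count how many documents carry each distinct value of field_name."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field_name)
        if value is not None:  # documents lacking the field are skipped
            counts[value] += 1
    return counts


# Made-up documents for illustration.
docs = [
    {"author": "Smith", "title": "A"},
    {"author": "Jones", "title": "B"},
    {"author": "Smith", "title": "C"},
    {"title": "D"},
]

lexicon = build_lexicon(docs, "author")
distinct_values = sorted(lexicon)   # roughly SELECT DISTINCT author
distinct_count = len(lexicon)       # roughly COUNT(DISTINCT author)
```

Presumably the value of a built-in lexicon lies in having this list maintained and indexed by the DBMS rather than recomputed per query, but that is an inference rather than something confirmed here.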
Categories: Application areas, Custom publishing, Mark Logic | 4 Comments |