June 4, 2008

Clarabridge is now all about text mining SaaS

Clarabridge CEO Sid Banerjee called with some product news that is embargoed until the Text Analytics Summit, so I won't write about it yet. But during the call, I discovered something interesting: Clarabridge's hosted/SaaS (Software as a Service) text mining offering has taken over its business. Highlights of the call included:

May 29, 2008

Google is idiosyncratic about what it displays

I was testing the new blog theme installed on Software Memories, specifically to see whether the title and description in the search engine results reflected the metatag title and description I’d just put in, which are

History of the software industry, its companies and its personalities

and

History of the software industry by Curt Monash, who’s been in the middle of it since 1981

respectively.

Well, the answer turns out to be a resounding “Yes and no.”

May 27, 2008

Sneak preview of our blog redesign

Our new blog theme is finally working! You can see it over on Software Memories. We plan to have something very similar soon on our other blogs (but each in its own color). If you want to have any influence on our look and feel — or if you just want to help me out — now would be a really good time to take a look and see if you have any comments.

Salient features of the new design include:

May 19, 2008

How is YouTube relating videos?

One of the great music videos of all time is Madonna's Material Girl. With two exceptions, all the “related videos” YouTube lists are just what one would expect: either other Madonna videos or other versions of Material Girl. One exception is Cyndi Lauper's Girls Just Want to Have Fun; the other is Marilyn Monroe's Diamonds Are A Girl's Best Friend. The connection with the Monroe video is particularly strong: each video appears at #3 on the other's “Related” list.

And that’s an outstanding result. Material Girl is obviously a direct reference, conceptually and visually, to Diamonds Are A Girl’s Best Friend. So my question is: How does YouTube know that? Are there favorite videos lists on which they co-exist? Did somebody hand-enter the connection? Is it inferred from their comment threads (which I definitely have not paged through)? Or — by far the least likely but most interesting of all — is there some sort of direct visual comparison?

Other than popularity presumably having something to do with it (both videos are, deservedly, very often watched and commented on), I haven’t figured out which it is.
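To make the first guess (favorites lists on which the two videos co-exist) concrete, here is a minimal sketch of co-occurrence ranking: for a given video, count how many users' favorites lists also contain each other video, and rank by that count. This illustrates one hypothesis only; it is not how YouTube is known to compute related videos, and the function name and the favorites lists are invented for illustration.

```python
from collections import Counter


def related_by_cooccurrence(favorites_lists, video, top_n=3):
    """Rank other videos by how many favorites lists they share with `video`.

    Hypothetical helper: `favorites_lists` is assumed to be one list of
    video titles per user. Not YouTube's actual method.
    """
    scores = Counter()
    for favorites in favorites_lists:
        titles = set(favorites)  # count each video at most once per list
        if video in titles:
            scores.update(titles - {video})
    # Highest co-occurrence count first; break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


# Invented example data, for illustration only.
favorites = [
    ["Material Girl", "Diamonds Are A Girl's Best Friend", "Vogue"],
    ["Material Girl", "Diamonds Are A Girl's Best Friend"],
    ["Material Girl", "Girls Just Want to Have Fun"],
    ["Vogue", "Like a Prayer"],
]
print(related_by_cooccurrence(favorites, "Material Girl"))
# ["Diamonds Are A Girl's Best Friend", 'Girls Just Want to Have Fun', 'Vogue']
```

Because the counts are symmetric, two videos that often share lists rank highly on each other's lists, which would be consistent with the mutual #3 placement. Raw counts also favor frequently favorited videos, so popularity would matter under this scheme too.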

May 12, 2008

Powerset is mildly interesting

Powerset has done a great job of generating buzz for its version of smart search. That said, its current demo is mediocre, and that's being polite. Powerset currently indexes little more than Wikipedia, and the quality of its search results is roughly comparable to that of Wikipedia's justly reviled internal search engine. To determine this, I searched both sites for five strings. Wikipedia typically ranked more outright junk near the top, but it also placed the very best hits higher than Powerset did. The strings were:

May 8, 2008

Text Analytics Summit and associated Seth Grimes white paper

Ironically, right after a Google indexing problem, I am putting up my first sponsored blog post ever. It's in connection with the forthcoming Text Analytics Summit, at which I will be speaking (in Boston) on June 16. The post itself offers a free white paper by the estimable Seth Grimes.

May 8, 2008

Google seems to have rehabilitated us

As previously noted, we were de-indexed by Google due to the injection of a whole lot of spammy hidden links. We're back now, after about two weeks, even on the blog (this one) where there was no official de-indexing notice and hence no way to apply for reconsideration. And thus we once again have high rankings for search terms such as Netezza, DATAllegro, Clarabridge, and Attivio.

We’re designing a new blog theme — the current one is just an emergency stopgap — that will (among myriad more important virtues) be more SEO-friendly. I’ll be curious to see whether that makes much actual difference from a search ranking standpoint.

April 29, 2008

Mark Logic viewed as a different kind of text search technology vendor

I’m putting up two posts this morning on Mark Logic and its MarkLogic product family. The main one, over on DBMS2, outlines the technical architecture — focusing on MarkLogic as an XML database management system — and provides a bit of overall context. This post attempts to position MarkLogic against alternative kinds of text analytics engine.

For the most part, MarkLogic is indeed sold (and bought) for the storage, manipulation, and retrieval of text. (One long-confidential exception to this rule is scheduled to be unveiled at the June user conference.) Most applications seem to fit a custom publishing/enhanced search paradigm:

  1. Ingest text.

  2. Enhance it.

  3. Serve it up in chunks, typically via a sophisticated search interface.

Differences vs. conventional search engines include:

Mark Logic also claims huge advantages in corpus administration. Scalability seems good too; there's a national-intelligence customer with a 200-terabyte database. And they're proud of a feature called lexicons, although it seems so obvious to me that I've so far failed to muster what they'd probably regard as the proper level of excitement about it. (In SQL terms, it seems to be a combination of SELECT and COUNT DISTINCT, both capabilities I'd expect XQuery to have anyway.)
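To illustrate the SQL analogy, here is a minimal sketch, written with Python's standard library rather than XQuery, of the kind of answer a lexicon-style query gives: the distinct values of one XML element across a set of documents, each with its frequency, plus the number of distinct values. The `element_lexicon` function, the `author` tag, and the sample documents are hypothetical; this is not MarkLogic's lexicon API, only the SELECT/COUNT DISTINCT idea applied to XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_lexicon(xml_docs, tag):
    """Return (sorted (value, count) pairs for <tag>, number of distinct values).

    Roughly SQL's SELECT value, COUNT(*) ... GROUP BY value ORDER BY value,
    plus COUNT(DISTINCT value). Hypothetical helper, not a MarkLogic API.
    """
    counts = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        # iter() walks the whole tree, including the root element itself.
        for elem in root.iter(tag):
            if elem.text and elem.text.strip():
                counts[elem.text.strip()] += 1
    return sorted(counts.items()), len(counts)


# Invented sample documents, for illustration only.
docs = [
    "<article><author>Smith</author><author>Jones</author></article>",
    "<article><author>Smith</author></article>",
]
values, n_distinct = element_lexicon(docs, "author")
print(values)      # [('Jones', 1), ('Smith', 2)]
print(n_distinct)  # 2
```

A real lexicon would presumably answer this from a prebuilt index rather than by scanning every document as this sketch does.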

April 25, 2008

Twitter is indeed replaceable

Dennis Howlett believes any hope of monetizing [Twitter] rests upon reliability at scale. He's partially right. Michael Arrington disagrees, essentially asserting that Twitter has become an unshakable monopoly due to the network effect, but his reasoning is flawed.

April 25, 2008

Investment text mining job listing

According to this job listing, at least one “major NYC investment bank” plans to do text mining on a proprietary trading desk.

The successful candidate will mine text data from numerous news sources and incorporate the information [into] the proprietary trading systems.
