January 31, 2008

The biggest text analytics company you probably never heard of

I caught up with Expert System S.p.A. last week. They turn out to be doing $10 million in annual text technology revenue. That alone is surprising (sadly), but what’s really remarkable is that they did it almost entirely in the Italian market. As you might guess, that figure includes a little bit of everything, from search engines to Italian language filters for Microsoft Office to text mining. But only $3.5 million of Expert System’s revenue is from the government (and I think that includes civilian agencies), and under 30% is professional services, so on the whole it seems like a pretty real accomplishment. Oh yes – Expert System says it’s entirely self-funded.

As of last year, Expert System also has English-language products, and a couple of minor OEM sales in the US (for mobile search and semantic web applications). German- and Arabic-language products are in beta test. The company says that its market focus going forward is national security – surely the reason for the Arabic – and competitive intelligence. It envisions selling through partners such as system integrators, although I think that makes more sense for the government market than it does vis-a-vis civilian companies. In February the company is introducing a market intelligence product focused on sentiment analysis.

Expert System is a bit of a throwback, in that it talks lovingly of the semantic network that informs its products. Read more

January 28, 2008

Forrester says 2008 is the year of enterprise social software

We all know how “The Year of X” kinds of predictions go. Still, when I read that Forrester Research says enterprises are ready to seriously adopt wikis and message forums, it made sense to me. Email threads — via Notes/Exchange or otherwise — aren’t doing the job any more. It’s time to go straight to communally-created web pages.

Personally, I think it’s also time to further replace email disasters, by having broadcasts over something like an enterprise version of Twitter. Clearly, enterprise Twitter would have to have a lot more tagging, group filtering, and automated censorship — ::sigh:: — than current public Twitter. But that all fits very well into the CEP-based architecture (or some near equivalent) that I believe to be the future of Twitter anyway. So would a complete integration between enterprise Twitter and point-to-point enterprise instant messaging.

January 26, 2008

Anatomy of spam blogs

A post that gives you a clear sense of how gobbledygook is automatically generated (from another knowledgeable black-hat SEO who can’t be bothered to get his permalink structure sensible 😉 )

January 18, 2008

Google is putting more emphasis on phrases

I don’t know how pronounced this trend is, but Google web search seems to be putting more emphasis on phrases than it used to.

For starters, Google doesn’t always ignore stopwords. The Fly and Fly produce different search results. Beyond that, “or” is sometimes assumed to be a word you’re searching on, not an operator; for an example, try live free or die and see the line of text that comes back under the search box. (I’m not sure whether this ever works for “and” as well; even Sanford and Son returns the usual harangue that “the AND operator is unnecessary”.) This is all a pretty clear indicator that Google is looking at phrases. Bill Slawski’s patent-analysis-heavy SEO blog has a lot more to say on that subject, specifically on an indexing scheme that addresses the problems that indexing stopwords might otherwise cause.
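To make the stopword point concrete, here is a minimal sketch of a positional index that keeps stopwords, so that a phrase query can tell The Fly apart from Fly. This is a toy illustration only, not Google's method and not the scheme in the patents; the names `DOCS`, `build_positional_index`, and `phrase_search` and the sample documents are all made up for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus, invented for this illustration.
DOCS = {
    1: "the fly is a 1986 horror film",
    2: "how to fly a kite",
    3: "live free or die is the state motto",
}

def build_positional_index(docs):
    """Map each lowercased token (stopwords included) to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted ids of docs where the phrase's tokens appear consecutively."""
    tokens = phrase.lower().split()
    if not tokens or any(t not in index for t in tokens):
        return []
    # A candidate doc must contain every token of the phrase.
    candidates = set(index[tokens[0]])
    for t in tokens[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        # Try each occurrence of the first token as a possible phrase start.
        for start in index[tokens[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(tokens)):
                hits.append(doc_id)
                break
    return sorted(hits)

INDEX = build_positional_index(DOCS)
```

Because "the" and "or" stay in the index, `phrase_search(INDEX, "the fly")` and `phrase_search(INDEX, "fly")` can return different document sets, and "live free or die" is matched as four adjacent words rather than as a Boolean query. The obvious cost is that stopword posting lists get very long, which is presumably the kind of problem a cleverer indexing scheme would need to address.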

Also, there’s a direct series of patents on “Phrase-Based Indexing.”

Finally, although I don’t recall a link, there seems to be a belief that:

  1. Google is using or moving to Latent Semantic Indexing (LSI).
  2. Word-based LSI is patented by somebody else.

January 17, 2008

Tag cloud for the Iliad

Here are the top 200 tags (words? subjects? themes?) in the Iliad, per IBM Research.

Neither Paris nor Helen makes the list. Either Homer couldn’t stay on topic, or else the ostensible reasons for the war had little to do with the real issues. I say it’s the latter. Plus ça change, plus c’est la même chose.

Evidently one can upload one’s own data there to make one’s own visualizations.
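For anyone who would rather build such a list locally, a minimal sketch of the simplest possible approach is plain word counting with a stopword filter. I don't know how the IBM Research tool actually picks its tags, so treat this as an assumption-laden stand-in; the function name `top_tags` and the tiny `STOPWORDS` set are invented for the example.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be longer.
STOPWORDS = frozenset(
    {"the", "and", "of", "to", "a", "in", "he", "his", "that", "with"}
)

def top_tags(text, n=200, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)
```

Calling `top_tags(iliad_text)` on the full text of a translation (where `iliad_text` is whatever string you load yourself) would give a rough top-200 list, which makes it easy to check for yourself whether Paris or Helen shows up.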

January 17, 2008

Lynda Moulton on enterprise search

Lynda Moulton and I see enterprise search quite similarly, as I discovered when she called me yesterday to praise my post on the many differences between enterprise and web search, and followed up with this one of her own. One of Lynda’s big themes is that large enterprises, much as they use multiple database management systems, use multiple search engines too. Read more

January 17, 2008

Dr. Dolittle in silicon

The Reg passes along a Reuters story that Hungarian scientists have built a system to automatically understand canine vocalizations. I’d like to say it’s a woof-to-Magyar translator, but apparently all it does is recognize the doggies’ emotional states. The story reports that the system has 43% accuracy, vs. 40% for humans.

I must confess, however, to being somewhat puzzled about how they measure success. Does the pooch fill out a survey form afterwards? Do they conclude that the beast wasn’t angry if the experimenter doesn’t get bitten?

I need to know a bit more about the research protocol before I know what to think about this.

EDIT: The CBC has a little more detail. The underlying research paper is appearing in Animal Cognition.

January 16, 2008

Automation secrets of black hat SEO

XMCP writes one of the better black hat SEO blogs. In a post last November, he laid out a ton of advice about automating black hat SEO. Personally, I don’t approve of doing black hat SEO. Still, it’s an intellectually interesting subject. What’s more, black hat SEOs create a large fraction of all websites, and certainly of all blog comments, links, and so on. So it’s interesting to track them.

Most interesting to me and probably to most readers here is the part that shows where black hat SEOs get their content: Read more

January 14, 2008

An interesting Matt Cutts interview from December

Stephen Spencer has a great interview with Matt Cutts of Google, from last month’s Pubcon. Almost all of it is SEO-related. But it also contains a few tidbits that may be interesting even if one doesn’t care about SEO, such as:

SEO highlights included: Read more

January 14, 2008

19 bullet points about the difference between enterprise and web search

Eric Lai wrote in this week’s Computerworld about “Why is enterprise search harder than Google Web search?” Highlights included: Read more
