January 31, 2007

Twist our arm, please!

Slashdot has a long, exclusive article on proposed US legislation to fight foreign internet censorship. The gist is that companies such as Yahoo and Google seem to be saying “Please, pass a law OBLIGATING us to resist censorship and other bad behavior.”

I think this is both admirable-if-true and, better yet, probably true. Clearly, US web search companies are vulnerable in theory to competition from less scrupulous rivals in other nations. But for now our search technology lead is strong enough that their main competition is with each other. If China (for example) can’t play one of them off against the other, there’s at least a chance it will be reluctant to throw the whole lot of them out.

January 31, 2007

Government-specific search fails to impress

According to Steven Arnold, FirstGov (which has been renamed USASearch.gov) is by far the most effective US government-specific search engine. But there’s something odd about it: whatever the query, it’s determined to give no more than a little over 100 results. Queries for which I’ve noted results in this quantity range include Bush (and this covers all family members), Cheney (ditto), Kennedy (ditto), Condaleeza, Scalia, Coolidge, Red Sox, big dig, Burlingame, Redmond, Pluto, ethanol, spotted owl, and topology. The only ones I’ve found so far coming out above that results range (perhaps inevitably 😉) are death (137) and taxes (177).

January 30, 2007

Worst URL pun ever

http://radar.oreilly.com/

That’s baaaad. Although I think Linda would insist it’s “wordplay” rather than a “pun.”

I wonder whether they have any posts about M*A*S*Hups.

January 30, 2007

A great new (to me) phrase – “Adversarial Information Retrieval”

I’ve just discovered a great new phrase – adversarial information retrieval. It’s not really new, since papers are now being accepted for what will be the third annual conference on the subject. But it seems to have gained currency over the past few months.

Edit: The term seems to have been coined in 2000.

I think this area is really where the bulk of the research into public search engine algorithms goes. And that’s another way of saying that web and enterprise search are very different things.

January 30, 2007

The Chinese censorship threat continues to ratchet up

Ted Samsen of Infoworld is worried that the Chinese are attempting to ratchet up internet censorship yet further. Welcome to the club, buddy. This problem is a big one, and I don’t think it’s going to be addressed without vigorous action. In particular, I suspect that what is needed may be some major efforts in white-hat spamming. Lance Cottrell of Anonymizer has clever ideas along those lines for fighting censorship in the short term, but I think a bigger effort is needed as well.

Google, by the way, is caught in a tough spot and knows it.

January 26, 2007

FAST said to be pursuing BI

Dave Kellogg thinks FAST will be ineffective and defocused because of its efforts in business intelligence. I can’t comment on whether that analysis is brilliant, self-serving, or both, because anything I’ve been told on the subject is under embargo.

Embargoes were a crucial PR tactic when Regis McKenna exploited them for the original rollout of the Macintosh in 1984. But I suspect that in many cases they’ve quite outlived their usefulness. If I wait between the time I’m briefed and the time the embargo is up to write something, my thoughts about it get fuzzy. If I write something at the time and put it on ice, it may be obsolete because of what other people write in the meantime.

More and more, if something is embargoed, I wind up not writing about it at all.

EDIT: Point #4 of my post on the mismatch between relational databases and text search is pretty relevant here.

January 23, 2007

But Google trumps most site search

Popular on Digg, for obvious reasons, is a post showing that Google is better for searching Digg than Digg’s own search engine. No shock there. If I want to search Wikipedia for information on astrowidgets, I’ll just google on the phrase wikipedia astrowidgets. That works much better than Wikipedia’s own search.

Speaking of which — if you want to search for my writing, I’m using Google web search technology too. It works like a charm.
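For anyone who wants to script the trick described above, here is a minimal sketch in Python. It only builds the query URL; it does not fetch or parse results. The function names are made up for illustration, and the second variant, which uses Google's `site:` operator with a guessed domain of `wikipedia.org`, is my assumption rather than something the post itself does.

```python
from urllib.parse import urlencode

# Public Google web search endpoint; the query goes in the "q" parameter.
GOOGLE_SEARCH = "https://www.google.com/search"


def keyword_search_url(site_name: str, terms: str) -> str:
    """Prefix the terms with the site's name, as in 'wikipedia astrowidgets'."""
    query = f"{site_name} {terms}"
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


def site_operator_url(domain: str, terms: str) -> str:
    """Alternative (an assumption, not the post's method): restrict with site:."""
    query = f"site:{domain} {terms}"
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(keyword_search_url("wikipedia", "astrowidgets"))
    print(site_operator_url("wikipedia.org", "astrowidgets"))
```

The keyword form leaves ranking entirely to Google, so pages from other sites that merely mention Wikipedia can still show up; the `site:` form trades that looseness for a hard restriction to one domain.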

January 22, 2007

41 differences between web and enterprise search

Based on a patent application, SEOmoz has discerned 65 aspects of the Google ranking algorithm.* I counted only 24 that really had much at all to do with enterprise search. This leaves 41 or so focused on spam/SEO-fighting and/or on-page linking issues that have no enterprise parallel. And for more depth, here’s a long article from another SEO site, on a specific phrase-concurrence spam-fighting technique that has no apparent applicability to trusted corpuses.
*I highly recommend this link. It is by far the best single-page overview of web search algorithmic issues I’ve ever seen.

I’ve said it before, but it bears repeating — web search and enterprise search (or search of a constrained corpus) are very different technical problems.

January 11, 2007

Text Analytics Summit — a promising idea gone bad

I hope to be proved wrong, but I think the Text Analytics Summits going forward will be a waste of time and money. The model the first two years was the same, with key features being:

Most of the attendees are vendors, and the ones I talk with almost uniformly agree they didn’t learn much, and they didn’t meet many prospects either.

If you’re a user or prospect, you can do as well inviting vendors to make sales pitches to you, and eventually calling references. If you’re a vendor, it’s a great place to socialize with your buddies from other firms, but that’s about it. If you’re press or analyst — well, it’s an easy drive for me in Boston, and I’m really focused on this subject area, so I’ve been attending. But almost no other press or analysts went, and I honestly can’t blame them for staying away.

Could this change? Yes. But in my discussions with the organizers, I’ve gotten zero reason to think that it will.

January 4, 2007

(Crosspost) New ways to read our research!

We’ve finally redesigned the Monash Information Services website. In particular, we’ve created two great new ways to read our research. First, there’s a new, Google-based integrated search engine. (And it really works well, the one glitch being that it brings back feeds and pages interchangeably. Try it out!) Also – and I really encourage you all to subscribe to this — there’s a new integrated research feed.

The reason you should care about these is in both cases the same: Our research is actually spread across multiple sites and feeds. I write about Google both in the Monash Report and on Text Technologies. I write about enterprise text management both on Text Technologies and on DBMS2. I write about computing appliances both on DBMS2 and in the Monash Report. I write about data mining in all three places. And now that there’s an integrated feed, industry history relevant to any of the other subject areas may find its way onto Software Memories. Your view of my views simply isn’t complete unless you have access to all of those sites.
