Search engines
Analysis of search technology, products, services, and vendors.
For search, extreme network neutrality must not be compromised
In a recent post on the Monash Report, I drew a distinction between two aspects of the Internet: Jeffersonet and Edisonet. Jeffersonet deals in thoughts and ideas and research and scholarship and news and politics, and in commerce too. It’s what makes people so passionate about the Internet’s democracy-enhancing nature. It’s what needs to be protected by extreme network neutrality. And it’s modest enough in its bandwidth requirements that net neutrality is completely workable. (Edisonet, by way of contrast, comprises advanced applications in entertainment, teleconferencing, etc. that probably do require new capital investment and tiered pricing schemes.)
And if there’s one application that’s at the core of Jeffersonet, it’s search. No matter how much scary posturing telecom CEOs do, and no matter how profitable or monopolistic Google becomes, telecom carriers must never be allowed to show any preference among search engines! At least, that’s the case for text-centric search engines such as the ones Google, Yahoo, and Microsoft run today. The reason is simple: the democratic part of the Internet only works so long as things can be found. And search will long be a huge part of how to find them. So search engine vendors must never be able to succeed based on a combination of good-enough results plus superior marketing and business development. They always have to be kept afraid of competition from engines that deliver better actual search results.
Orlowski is back to his old tricks
Andrew Orlowski thinks he’s figured out the Apple/Google/Oracle partnership. But he has it all wrong.
Categories: Enterprise search, Google, Humor, Search engines
SAP’s “search” strategy isn’t about search
I caught up with Dennis Moore today to talk about SAP’s search strategy. And the biggest thing I learned was that it’s not about the search. Rather, it’s about a general interface, of which search and natural language just happen to be major parts.
Dennis didn’t actually give me a lot of details, at least not ones he’s eager to see published at this time. That said, SAP has long had a bare-bones search engine, TREX. (TREX was also adapted to create the columnar relational data manager BI Accelerator.) But we didn’t talk about TREX enhancements at all, and I’m guessing there haven’t really been many. Rather, SAP’s focus seems to be on:
A. Finding business objects.
B. Helping users do things with them.
Categories: BI integration, Enterprise search, Language recognition, Natural language processing (NLP), SAP, Search engines
Has Google hit 10 petabytes yet?
I’ve been musing about how big Google’s core database might be. Figuring that out is not a trivial problem, unless they’ve published the answer somewhere that I’m not aware of. But here’s a big clue, from an announcement about their n-gram data:
We processed 1,024,908,267,229 words of running text
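The word count only becomes a storage figure once you assume a number of bytes per word. The sketch below assumes an average of about 6 bytes per word (roughly five characters plus a separator, one byte each, uncompressed); that figure is an illustrative guess, not something Google has stated, and the estimate ignores indexes, n-gram tables, and replication.

```python
# Back-of-envelope: raw size of the text behind Google's n-gram release.
# ASSUMPTION: ~6 bytes per word on average (about 5 characters plus a
# separator, 1 byte each, no compression). This is a guess, not a Google figure.
WORDS_PROCESSED = 1_024_908_267_229  # from the n-gram announcement quoted above
ASSUMED_BYTES_PER_WORD = 6


def raw_text_bytes(words: int, bytes_per_word: int) -> int:
    """Estimated uncompressed size of the running text, in bytes."""
    return words * bytes_per_word


total_bytes = raw_text_bytes(WORDS_PROCESSED, ASSUMED_BYTES_PER_WORD)
terabytes = total_bytes / 1e12  # decimal terabytes
petabytes = total_bytes / 1e15  # decimal petabytes
print(f"~{terabytes:.1f} TB of raw text (~{petabytes:.4f} PB)")
```

Under that assumption the raw text comes to roughly 6 TB; doubling or halving the bytes-per-word guess scales the result proportionally.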
Categories: Google, Search engines
InQuira’s and Mercado’s approaches to structured search
InQuira and Mercado have both broadened their marketing pitches beyond their traditional specialty of structured search for e-commerce. Even so, it’s well worth talking about those search technologies, which offer features and precision that you just don’t get from generic search engines. There’s a lot going on in these rather cool products.
In broad outline, Mercado and InQuira each combine three basic search approaches:
- Generic text indexing.
- Augmentation via an ontology.
- A rules engine that helps the site owner determine which results and responses are shown under various circumstances.
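The three layers in the list above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not how Mercado or InQuira actually implement anything: the ontology is a hypothetical term-to-related-terms map, rules are hypothetical (condition, action) pairs that see the query and an assumed user-profile dict, and "ranking" is plain sorting.

```python
from collections import defaultdict

# Hypothetical three-layer structured search, loosely following the outline above.
# None of these names come from Mercado or InQuira.


def build_index(docs):
    """Layer 1: generic text indexing (inverted index: term -> set of doc ids)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Layer 2: a toy ontology mapping a term to related terms (assumed format).
ONTOLOGY = {"laptop": {"notebook"}, "notebook": {"laptop"}}


def expand(terms, ontology):
    """Add ontology neighbours of each query term to the query."""
    expanded = set(terms)
    for term in terms:
        expanded |= ontology.get(term, set())
    return expanded


def promote(doc_id):
    """Rule action: move doc_id to the front of the results if present."""
    return lambda results: [d for d in results if d == doc_id] + [
        d for d in results if d != doc_id
    ]


# Layer 3: site-owner rules as (condition, action) pairs; conditions see the
# query and an assumed user-profile dict.
RULES = [
    (lambda query, user: user.get("segment") == "business", promote(2)),
]


def search(query, index, ontology, rules, user):
    terms = expand(query.lower().split(), ontology)
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    results = sorted(hits)  # stand-in for real relevance ranking
    for condition, action in rules:
        if condition(query, user):
            results = action(results)
    return results


docs = {1: "cheap notebook computer", 2: "laptop bag", 3: "garden hose"}
index = build_index(docs)
print(search("laptop", index, ONTOLOGY, RULES, user={}))
print(search("laptop", index, ONTOLOGY, RULES, user={"segment": "business"}))
```

With an empty profile, a query for "laptop" also picks up the "notebook" document through the ontology and returns [1, 2]; for a business-segment user, the rule moves document 2 to the front. Passing the user profile into the rules layer is one simple way to let user information influence which results are shown.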
Of the two, InQuira seems to have the more sophisticated ontology. Indeed, the not-wholly-absurd claim is that InQuira does natural-language processing (NLP). Both vendors incorporate user information in deciding which search results to show, in ways that may be harbingers of what generic search engines like Google and Yahoo will do down the road.
Categories: InQuira, Mercado, Natural language processing (NLP), Ontologies, Search engines, Structured search
Does anybody actually use Technorati?
I just did some Technorati searches, and my blog posts come up near the top of the search results for a bunch of small companies’ names and similar words — Attensity, ClearForest, Netezza, DATAllegro, Crossbeam, DMOZ, ODP, and surely many others.
But judging by my referrer logs, nobody cares. I get lots of visitors via classic search engines — largely Google, but also the others — but bubkus from Technorati.
Categories: Search engines, Specialized search
Can Hakia hack it?
Hakia purports to be a new search engine that indexes “semantically,” which I presume means on phrases or concepts or something. But I’ve run a few queries side by side on Hakia and Google, and Hakia isn’t doing well. I think it isn’t making sufficiently good use of page reputation. For an example, try “web hosting forum” and compare the top two hits on each engine.
When I queried on “Viagra,” Hakia did — as it were — outperform Google. But that’s the only case I, uh, came up with. On less snigger-worthy searches, Google seemed to do as well as or better than Hakia.
Categories: Google, Search engines
What’s interesting about the FAST venture in BI
FAST is annoying me a bit these days. It’s nothing serious, but travel schedule screw-ups, an annoying embargo, and a screw-up in the annoying embargo have all hit at once. So I’ll keep this telegraphic and move on to other subjects.
- They’re doing fast queries without using a lot of RAM.
- They’re doing the usual text search thing of indexing across multiple “databases,” only now it’s applied to, well, databases. (Not that there’s much new about that particular aspect. Actually, there seems to be a bit of a kludge, in that they export the databases to some kind of simple text files.)
- They’re doing some level of concept identification à la the text mining guys. (They don’t call it “entity extraction” because the results aren’t dumped into a database anywhere, but instead are just used on the fly.) Of course, the text mining/search convergence goes both ways.
- They bought a BI/dashboard tool and are using it both to analyze query logs and also to do normal BI/dashboard kinds of things.
- They have big references for this stuff, at least the single-web-site query aspect. Well, actually, the customer names are confidential. Oh well.
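The multi-“database” indexing described in the list, including the export to simple text files, might look roughly like this in miniature. It is a guess at the general idea, not FAST’s implementation: in-memory SQLite stands in for the source databases, each row becomes a text string rather than a file on disk, and the table names, columns, and helper functions are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sketch: flatten rows from several databases into plain-text
# "documents", then build one inverted index across all of them.


def export_rows(conn, table, db_name):
    """Yield ((db_name, table, rowid), text) for every row in `table`.

    `table` is interpolated into SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT rowid, * FROM {table}")
    columns = [d[0] for d in cur.description][1:]  # skip the rowid column
    for row in cur:
        rowid, values = row[0], row[1:]
        # Column names are kept in the text so they are searchable too.
        text = " ".join(f"{col} {val}" for col, val in zip(columns, values))
        yield (db_name, table, rowid), text


def index_databases(sources):
    """Build term -> {(db_name, table, rowid)} over (conn, table, db_name) sources."""
    index = defaultdict(set)
    for conn, table, db_name in sources:
        for key, text in export_rows(conn, table, db_name):
            for term in text.lower().split():
                index[term].add(key)
    return index


crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (name TEXT, city TEXT)")
crm.execute("INSERT INTO customers VALUES ('Acme', 'Boston')")

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (item TEXT, city TEXT)")
erp.execute("INSERT INTO orders VALUES ('widgets', 'Boston')")

index = index_databases([(crm, "customers", "crm"), (erp, "orders", "erp")])
print(sorted(index["boston"]))
```

A single query term then returns matches from both sources, each tagged with the database and table it came from. Flattening rows to text is what makes ordinary text-search machinery usable here, at the cost of losing typed columns and joins.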
And as another example of how this wasn’t the smoothest PR month for FAST, Steve Arnold somehow got the false idea that they were getting out of true text search altogether.
Categories: BI integration, Enterprise search, FAST, Search engines, Text mining
Twist our arm, please!
Slashdot has a long, exclusive article on proposed US legislation to fight foreign internet censorship. The gist is that companies such as Yahoo and Google seem to be saying “Please, pass a law OBLIGATING us to resist censorship and other bad behavior.”
I think this is both admirable-if-true and, better yet, probably true. Clearly, US web search companies are in theory vulnerable to less scrupulous competitors in other nations. But for now our search technology lead is strong enough that their main competition is with each other. If China (for example) can’t play one of them off against another, there’s at least a chance it will be reluctant to throw the whole lot of them out.
Categories: Categorization and filtering, Censorship, Google, Search engines
Government-specific search fails to impress
According to Steven Arnold, FirstGov (which has been renamed USASearch.gov) is by far the most effective US government-specific search engine. But there’s something odd about it: whatever the query, it’s determined to give no more than a little over 100 results. Queries for which I’ve noted results in this quantity range include Bush (and this covers all family members), Cheney (ditto), Kennedy (ditto), Condoleezza, Scalia, Coolidge, Red Sox, big dig, Burlingame, Redmond, Pluto, ethanol, spotted owl, and topology. The only ones I’ve found so far that come out above that range, perhaps inevitably 😉, are death (137) and taxes (177).
Categories: Convera, Search engines, Specialized search