Search engines
Analysis of search technology, products, services, and vendors.
Why the BI vendors are integrating with Google OneBox
I’m hearing the same thing from multiple BI vendors, with SAS being the most recent and freshest in my mind — customers want them to “integrate” with Google OneBox. Why Google rather than a better enterprise search technology, such as FAST’s? So far as I’ve figured out, these are the reasons, in no particular order:
- Price.
- Ease of installation (real or imagined).
- The familiar Google brand name.
- The familiar Google UI.
- Google OneBox’s ability to search relational records, reports, etc. along with more traditional record types.
The last point, I think, is the most interesting. Lots of people think text search is and/or should be the dominant UI of the future. Now, I’ve been a big fan of natural language command line interfaces ever since the days of Intellect and Lotus HAL. But judging by the market success of those products — or for that matter of voice command/control — I was in a very small minority. Maybe the even simpler search interface — words jumbled together without grammatical structure — will win out instead.
Who knows? Progress is a funny thing. Maybe the ultimate UI will be one that responds well to grunts, hand gestures, and stick-figure drawings. We could call it NeanderHAL, but that would be wrong …
Categories: BI integration, Enterprise search, FAST, Google, Natural language processing (NLP), SAS, Search engines | 1 Comment |
Is text technology mirroring business intelligence?
After even more glitches than usual with their content management system, Computerworld finally posted the second part of my series on enterprise text technology architectures. I already posted the main points of the column here several weeks ago, but of course the column includes further material. In particular, I draw an analogy between text technologies and business intelligence, inspired in part by various direct ties between the two disciplines. Dave Kellogg makes a similar point, focused on general market development.
Just how accurate the analogy turns out to be will depend in large part, I think, on whether search engines (analogous to data warehouses) wind up being the foundation of text-heavy functionality. The jury is still out on that.
Categories: BI integration, Search engines | Leave a Comment |
Mark Logic and the custom publishing business
I talked again with Mark Logic, makers of MarkLogic Server, and they continue to have an interesting story. Basically, their technology is better search/retrieval through XML. The retrieval part is where their major differentiation lies. Accordingly, their initial market focus (they’re up to 46 customers now, including lots of big names) is on custom publishing. And by the way, they’re a good partner for fact-extraction companies, at least in the case of ClearForest.
Here, as best I understand, is the story of the custom publishing business. Read more
Categories: Application areas, ClearForest/Reuters, Custom publishing, Mark Logic, Search engines, Specialized search | 2 Comments |
Business Objects’ perspective on text mining (and search)
I had a call with Business Objects, mainly about their overall EIM/ETL product line (Enterprise Information Management, a superset of Extract/Transform/Load). But I took the opportunity to ask about their deal with Attensity. (Attensity themselves posted more about the relationship, including some detailed links, here.) It actually sounds pretty real. They also mentioned that there seem to be a bunch of startups proposing search as a substitute for data warehousing, much as FAST sometimes likes to.
Categories: Attensity, BI integration, Search engines, Text mining | 1 Comment |
Principles of enterprise text technology architecture
My August Computerworld column starts where July’s left off, and suggests principles for enterprise text technology architecture. This will not run Monday, August 7, as I was originally led to believe, but rather in my usual second-Monday slot, namely August 14. Thus, I finished it a week earlier than necessary, and I apologize to those of you I inconvenienced with the unnecessary rush to meet that deadline.
The principles I came up with are:
- Deploy search widely across the enterprise.
- It’s OK for your text data to be distributed across a range of silos.
- Integrate fact extraction/text mining aggressively into your predictive analytics and dashboards.
- Having a preferred enterprise text technology tool suite is nice, but accept that there will probably be lots of departmental exceptions.
- Reinvent your customer communication (and other) processes to exploit text technologies.
- Integrate your taxonomies.
I’ll provide a link when the column is actually posted.
Categories: Enterprise search, Ontologies, Search engines, Text mining | 1 Comment |
Introduction to FAST
FAST, aka Fast Search & Transfer (www.fastsearch.com), is a pretty interesting and important company. They have 3500 enterprise customers, a rapidly growing $100 million revenue run rate, and a quarter billion dollars in the bank. Their core business is of course enterprise search, where they boast great scalability, based on a Google-like grid architecture, which they fondly think is actually more efficient than Google’s. Beyond that, they’ve verticalized search, exploiting the modularity of their product line to better serve a variety of niche markets. And they’re active in elementary fact/entity extraction as well. Oh yes – they also have forms of guided navigation, taxonomy-awareness, and probably everything else one might think of as a checkmark item for a search or search-like product.
Categories: Enterprise search, FAST, Google, Search engines | 1 Comment |
Petabyte-scale search scalability
I’ve had a couple of good talks with Andrew McKay of FAST recently. When discussing FAST’s scalability, he likes to use the word “petabytes.” I haven’t probed yet as to exactly which corpus(es) he’s referring to, but here’s a thought for comparison:
Google, if I recall correctly, caches a little over 100 KB/page (assuming, of course, that the page has at least that much text, which is not necessarily the case at all). And they were up into Carl Sagan range – i.e., “billions and billions” – before they stopped giving counts of how many pages they’d indexed.
10 billion times 100 KB is, indeed, a petabyte. So, in the roughest of approximations, the Web is a petabyte-range corpus.
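For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes exactly 10 billion pages and decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes); the constant and function names are mine, purely for illustration, and the inputs are the same rough guesses as above rather than measured figures.

```python
# Back-of-the-envelope corpus size, using the round numbers from the post.
# Assumptions (not measurements): 10 billion pages, about 100 KB cached per page,
# decimal units throughout.
PAGES_INDEXED = 10_000_000_000      # "billions and billions", taken as 10 billion
CACHED_BYTES_PER_PAGE = 100 * 1000  # roughly 100 KB per cached page
PETABYTE = 10**15                   # 1 PB in bytes (decimal definition)


def corpus_size_bytes(pages: int, bytes_per_page: int) -> int:
    """Return the total corpus size in bytes for a uniform per-page size."""
    return pages * bytes_per_page


total = corpus_size_bytes(PAGES_INDEXED, CACHED_BYTES_PER_PAGE)
print(total / PETABYTE)  # 1.0
```

Since many pages carry far less than 100 KB of text, this is better read as an upper-end estimate than as a real count.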
EDIT: Hah. I bet eBay and its 2-petabyte database is one of the examples Andrew is referring to …
Categories: FAST, Search engines | 4 Comments |
Analyst reports about enterprise search
Gartner and Forrester have high opinions of FAST. Not coincidentally, you can download both those firms’ recent search industry survey reports from almost any page of www.fastsearch.com. Of the two, Forrester’s is both better and more recent.
Summarizing brutally, the big firms’ consensus seems to be:
- FAST and Autonomy are the clear leaders.
- Endeca has great technology and is coming on strong.
- Everybody else is a niche player, at least for now.
- Convera is in deep yogurt.
Forrester is particularly harsh on Convera. Presumably this has much to do with the fact that Convera did not cooperate well with the survey process. I shall not speculate as to which way the causality runs there – but I should note that Convera was quite cooperative with my research last week.
Categories: Autonomy, Convera, Enterprise search, FAST, Search engines | Leave a Comment |
Web search and enterprise search are coming together
Web search and enterprise search are in many ways fundamentally different problems. The biggest problem in web search is screening out pages that deliberately pretend to be relevant to a search. The second biggest problem is picking out the crème de la crème from a long list of essentially good hits. In enterprise search, on the other hand, the biggest problem is finding a single document, or single fact, that is lonely at best, and if you’re unlucky doesn’t exist in the corpus at all. Document structures are also completely different, as are linking structures and almost every other input to the ranking algorithms except the raw words themselves.
Even so, the businesses and technologies of web and enterprise search are beginning to combine. Read more
Categories: Convera, Enterprise search, FAST, Search engines | 3 Comments |
Convera aka Excalibur aka ConQuest
Once upon a time, more than a decade before the founding of Autonomy, a New Mexico inventor had the idea for a generic pattern recognition tool. He implemented it on a PC add-in board that, if I recall correctly, plugged into the Apple II. This was the genesis of the company Excalibur Technologies.
Categories: Convera, Enterprise search, Ontologies, Search engines | 5 Comments |