A challenge to DMOZ bashers
Give or take a corrected typo, here’s a challenge to DMOZ bashers I just wrote in the flame war thread.
If you want to do something that is:
A. Correct
B. Credible
C. Potentially useful
just go find a specific category with terrible listings, and publicize the fact with overwhelmingly clear proof of your assessment.
If that’s not EASY for you to do … then maybe DMOZ isn’t so bad after all, eh?
In particular, I’d encourage you to post a version of the category that is clearly better than what is currently there.
Categories: Categorization and filtering, Directories, ODP and DMOZ, Social software and online media | 1 Comment |
DMOZ — yet another flame war
My latest thoughts about DMOZ and the ODP may be found in this blog comment thread.
The gist is:
- DMOZ has many problems, such as categories that are at least five years out of date.
- Newly, corruptly listed sites are NOT high on the list of problems.
- In fact, the attention paid to avoiding such corruption is a terrible drain on ODP resources.
- There are a lot of liars and/or idiots bashing DMOZ in the website owner community.
- robjones is a sarcastic jerk, but he’s our sarcastic jerk.
Or something like that. As I said, it’s a flame war …
Anyhow, I’m flying off on a two-week snorkeling trip Saturday, and should be much mellower soon.
Categories: Categorization and filtering, Directories, ODP and DMOZ, Social software and online media | 8 Comments |
The case for Inxight Awareness Server
I’ve been pretty skeptical about Inxight’s Awareness Server. My theory is that ordinary enterprise search engines can index remotely anyway, and they offer much better search functionality. Inxight’s Ian Hersey was kind enough to write in and offer two counter-arguments.
First, Ian points out that there are circumstances when, due to security and permissions, you can’t really index everything via one search engine. Specifically, he offers the government as an example. OK, I can see that in the government, with its classified and/or regulated silos. However, I have trouble thinking of many more examples. While there certainly are plenty of instances where a variety of organizations share information on a somewhat arm’s-length basis, it’s tough to think of such cases where federated text search would come into play.
Second, Ian in essence disputes my claim of inferior functionality. While implicitly conceding (as well he should!) that Inxight’s Awareness Server doesn’t do some things full-featured search engines do, he points out analytic features that may not be found in conventional search engine offerings. The big one he calls out is faceted search, which of course was the core of Intelliseek, the acquisition Awareness Server came from. Hmm. Faceted search has a checkered history, with Excite and Northern Light being perhaps the most visible among many failures. On the other hand, it’s a great idea that keeps being tried, and some versions, notably Endeca’s, have turned out well.
I guess I’ll have to reserve judgment on that part until I look at Inxight’s product and see what they do and don’t actually have.
Categories: BI integration, Business Objects and Inxight, Endeca, Enterprise search, Search engines | 1 Comment |
More on text processing in CEP
StreamBase isn’t the only complex event/stream processing (CEP) vendor doing text processing. Progress Apama is as well. Stemming, fuzzy matching, and so on seem to happen all the time. But there’s also at least one case where they flat-out do sentiment analysis. Edit: I presume this is in the investment market, as that’s where most of Progress Apama’s business is.
Categories: Investment research and trading, Progress and EasyAsk, Sentiment analysis, Text mining | Leave a Comment |
Event stream processors active in text filtering
OK. I secured permission to actually quote the details on something I’d previously dropped a small hint about — stream processing for text messages. Traditionally, that’s been the province of enterprise search companies. A decade ago, Verity had a kernel group of 6-7 engineers under Phil Nelson. They managed to produce not only a decent search engine, but a search engine “turned on its side” as well. I.e., instead of running one query against a corpus, they could run many queries each against documents as they arrived, one document at a time. Subsequently, the same idea has been implemented by most enterprise search providers, at least those that are serious about the intelligence market.
Well, the event-processing guys are active in that market too. At least StreamBase is.
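For readers who want the “turned on its side” idea in concrete form, here is a minimal Python sketch. It is not how Verity or StreamBase actually implemented anything; the `StandingQueries` class, the `filter_stream` helper, and the all-terms-must-appear matching rule are assumptions made purely for illustration. The point is only the inversion: the queries are stored up front, and each document is checked against all of them as it arrives.

```python
import re
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class StandingQueries:
    """Hypothetical store of persistent queries (illustrative only).

    Each query is a named set of terms; a document matches a query
    when it contains every one of that query's terms.
    """

    def __init__(self) -> None:
        self._queries: Dict[str, Set[str]] = {}

    def register(self, name: str, terms: Iterable[str]) -> None:
        """Add or replace a standing query."""
        self._queries[name] = {t.lower() for t in terms}

    def match(self, document: str) -> List[str]:
        """Return the sorted names of all queries this document satisfies."""
        tokens = tokenize(document)
        return sorted(
            name for name, terms in self._queries.items() if terms <= tokens
        )


def filter_stream(
    queries: StandingQueries, documents: Iterable[str]
) -> Iterator[Tuple[str, List[str]]]:
    """Check documents one at a time, yielding those that hit any query."""
    for doc in documents:
        hits = queries.match(doc)
        if hits:
            yield doc, hits
```

A real system would index the queries themselves rather than loop over every one per document, and would add stemming, fuzzy matching, and the like; this sketch deliberately skips all of that.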
Categories: Autonomy, Business Objects and Inxight, Enterprise search, Search engines, Text mining | 2 Comments |
Text analytics marketplace trends
It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I am some other times. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! – opinions would be most welcome.
*Factiva is the most significant exception. Hint, hint.
If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh – text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.
*I.e., part of the “T” in “ETL” (Extract/Transform/Load).
Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.
Job posting — TEMIS is hiring consultants
TEMIS is a French company with US headquarters, as befits a company whose strongest vertical market is pharmaceuticals. I offered to put up a couple of job postings for them. (Nice of me, given that TEMIS isn’t even a client yet!) Here goes.
Categories: Jobs and careers, TEMIS, Text mining | Comments Off on Job posting — TEMIS is hiring consultants |
Progress EasyAsk
I dropped by Progress a couple of weeks ago for back-to-back briefings on Apama and EasyAsk. EasyAsk is Larry Harris’ second try at natural language query, after the Intellect product fell by the wayside at Trinzic, the company Artificial Intelligence Corporation grew into.* After a friendly divorce from the company he founded, if my memory is correct, Larry was able to build EasyAsk very directly on top of the Intellect intellectual property.
*Other company or product names in the mix at various times include AI Corp and English Wizard. Not inappropriately, it seems that Larry has quite an affinity for synonyms …
EasyAsk is still a small business. The bulk is still in enterprise query, but new activity is concentrated on e-commerce applications. While Larry thinks that they’ve solved most of the other technical problems that have bedeviled him over the past three decades, the system still takes too long to implement.
Categories: BI integration, Language recognition, Mercado, Natural language processing (NLP), Progress and EasyAsk, Speech recognition | 1 Comment |
BOBJ Inxight insights
When a company announces an acquisition, it usually does a round of limited-content briefings, in no small part because the antitrust lawyers won’t let it do anything else. Once the deal closes, antitrust restrictions are lifted, and it does another round of briefings. These, typically, are vague and platitudinous.
Business Objects/Inxight have now reached that point. Even so, my briefing yesterday had some aspects worth writing up.
Categories: BI integration, Business Objects and Inxight, Enterprise search, Search engines | 2 Comments |
Text analytics buzzphrase of the year – “Voice of the Customer”
If there was one theme to this year’s Text Analytics Summit, it was “Voice of the Customer.” Attensity’s pre-conference press release was about a Voice of the Customer offering. Clarabridge’s sponsored user talk was about a Voice of the Customer app. SPSS’s marketing materials emphasized Voice of the Customer. Sentiment analysis and Web/blog scraping were frequently mentioned, in contexts such as “customer care,” “reputation management,” and/or “competitive intelligence.”
But above all, it was “Voice of the Customer.” I know it’s still June, but I think we have our text analytics industry buzzphrase of the year.
Categories: Attensity, Clarabridge, SPSS, Text Analytics Summit, Text mining, Voice of the Customer | 3 Comments |