December 27, 2006

Text analytics is finally being used for investment analysis

Jay Henderson of ClearForest tells me that hedge funds are one of their more interesting growth areas. It’s about time.

I think much of the reason investment firms haven’t made more use of text analytics is structural: Factiva, the (relatively speaking) mammoth joint venture of Reuters and Dow Jones, is forbidden by its parent companies from meeting investment firms’ needs. And that’s kind of a pity, as it’s probably the best-positioned firm to do so. It’s good to hear that the little guys are finally filling the gap.

Categories: ClearForest/Reuters, Investment research and trading, Text mining

3 Comments

December 27, 2006

Telling Attensity and ClearForest apart

So far as I can tell, Attensity’s strategy when the company was originally founded was rather like ClearForest’s strategy today – and vice versa. That said, here’s where they seem to stand at this time:

- Attensity wants to make text analytics very easy to integrate into business intelligence and data mining – at the moment, they’re not too focused on the differences between those two disciplines – and is trying to deliver the best possible fact extraction consistent with that charter.
- ClearForest wants to provide really great information extraction – to the limits of what can be done without excessive knowledge engineering – and is trying to integrate as well as possible with other technologies, the better to serve the customers who need what they offer.

Categories: Application areas, Attensity, ClearForest/Reuters, Custom publishing, Mark Logic, TEMIS, Text mining

November 30, 2006

Does web text mining need to be cloaked?

One semi-flagship use for text mining is to track sentiment across news articles, websites, etc. Should this be done openly, or is there a danger of being spoofed? (I doubt it; probably no more than a few of the sites would ever be motivated to do so.) But what if you’re making many hits against the same site, to the point that your traffic is unwelcome? Or maybe the site is a direct competitor. In such cases, hiding your tracks may be more relevant.

If any of this is an issue for you, you should take a look at Anonymizer’s growing enterprise offering. Apparently, there are commercial enterprises each using thousands of seats of Anonymizer’s cloaking service.

Categories: Text mining

November 17, 2006

Site and feed changes coming soon

We’re going to upgrade access to our research in various cool ways in the near future.

Right now, please bear with me in what is essentially a test post. ~~In theory, I’ve switched the feeds here over to Feedburner. Now I’m going to test if that really has happened.~~

EDIT: That didn’t work. I’m going to put things back the way they were.

Categories: About this blog

1 Comment

November 11, 2006

Text mining and search, joined at the hip

Most people in the text analytics market realize that text mining and search are somewhat related. But I don’t think they often stop to contemplate just how close the relationship is, could be, or someday probably will become. Here’s part of what I mean:

- Text mining powers search. The biggest text mining outfits in the world, possibly excepting the US intelligence community, are surely Google, Yahoo, and perhaps Microsoft.
- Search powers text mining. Restricting the corpus of documents to mine, even via a keyword search, makes tons of sense. That’s one of the good ideas in Attensity 4.
- Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape.
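
To make the "search powers text mining" point concrete, here is a minimal Python sketch of restricting a corpus with a keyword search before mining it. It is only an illustration under my own assumptions: the function names, the sample documents, and the stopword list are all made up, a simple term count stands in for real text mining, and none of this reflects how Attensity 4 or any other product actually works.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_search(docs, keyword):
    """Return only the documents that contain the keyword as a token."""
    keyword = keyword.lower()
    return [d for d in docs if keyword in tokenize(d)]

def top_terms(docs, stopwords, n=3):
    """Count non-stopword tokens across the documents; return the n most common."""
    counts = Counter(t for d in docs for t in tokenize(d) if t not in stopwords)
    return counts.most_common(n)

# Hypothetical sample corpus and stopword list, for illustration only.
corpus = [
    "The new phone battery drains fast",
    "Battery life on this phone is great",
    "Shipping was slow but support was helpful",
]
stop = {"the", "on", "this", "is", "was", "but", "new"}

# Step 1: search narrows the corpus. Step 2: mining runs on the subset only.
subset = keyword_search(corpus, "battery")
print(len(subset), top_terms(subset, stop))
```

Note that both steps share the same tokenizer, which is a toy version of the third point in the list: search and mining sit on the same underlying text processing.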

Categories: Attensity, Business Objects and Inxight, Enterprise search, FAST, Google, IBM and UIMA, Ontologies, Open source text analytics, Search engines, Text mining

3 Comments

October 22, 2006

Enterprise-specific web search: High-end web search/mining appliances?

OK. I have a vision of one way search could evolve, which I think deserves consideration on at least a “concept-car” basis. This is all speculative; I haven’t discussed it at length with the vendors who’d need to make it happen, nor checked the technical assumptions carefully myself. So I could well be wrong. Indeed, I’ve at least half-changed my mind multiple times this weekend, just in the drafting of this post. Oh yeah, I’m also mixing several subjects together here. All in all, this is not my crispest post …

Anyhow, the core idea is that large enterprises spider and index a subset of the Web, and use that for most of their employees’ web search needs. Key benefits would include:

- Filtering out spam hits. This is obviously important for search, and in some cases could help with public-web text mining as well. It should be OK to be more aggressive on spam-site filtering in an enterprise-specific index than it is in general web search.
- Filtering out malicious/undesirable downloads of various sorts. I’m thinking mainly of malware/spyware here, but of course it can also be used for netnannying porn-prevention and the like. Again, this is more easily done for the enterprise market than for the search world at large. (In any case, I think that Google could blow Websense out of the water any time they wanted to – except, of course, for the not-so-small matter of not being seen as participating in the censorship business – but that’s a separate discussion.)
- Capturing employees’ search strings. This could be useful for various purposes, including discerning their interests, and building the corporate ontology for internal web search.
- Freshness control. If there’s a site you really care about, you can make sure it’s re-indexed frequently.
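
As a rough illustration of the spam-filtering and freshness-control ideas, here is a small Python sketch of how an enterprise-specific crawler might decide what to re-index. Everything in it is hypothetical: the function name, the weekly default interval, and the example sites are my own assumptions, not a description of any vendor's appliance.

```python
DEFAULT_INTERVAL_HOURS = 168.0  # hypothetical default: re-index weekly

def sites_to_crawl(last_crawled, intervals, blocklist, now):
    """Return the sites whose re-index interval has elapsed, skipping blocked ones.

    last_crawled: dict of site -> time of last crawl, in hours
    intervals:    dict of site -> desired re-index interval, in hours
    blocklist:    set of sites excluded from the enterprise index entirely
    now:          current time, in hours
    """
    due = []
    for site, crawled_at in last_crawled.items():
        if site in blocklist:
            continue  # aggressive filtering: never index a blocked site
        interval = intervals.get(site, DEFAULT_INTERVAL_HOURS)
        if now - crawled_at >= interval:
            due.append(site)
    return sorted(due)

# Hypothetical example: one site the enterprise cares about a lot,
# one news site, one known spam site, and one site with default settings.
last_crawled = {
    "competitor.example": 0.0,
    "news.example": 0.0,
    "spam.example": 0.0,
    "misc.example": 0.0,
}
intervals = {"competitor.example": 1.0, "news.example": 6.0}
blocklist = {"spam.example"}

print(sites_to_crawl(last_crawled, intervals, blocklist, now=2.0))
```

The point of the per-site interval table is that an enterprise can afford to tune it by hand for the handful of sites it really cares about, which general web search cannot do for any one customer.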

Categories: Categorization and filtering, Convera, Enterprise search, FAST, Google, IBM and UIMA, Search engines, Spam and antispam, Specialized search, Text mining, Website filtering

1 Comment

October 7, 2006

Danny Sullivan and Yahoo on the past and future of search

Danny Sullivan argues that search interfaces haven’t changed significantly for a decade, and that this suggests that the ways people have tried to change them aren’t likely to work when people try the same things yet again. He backs his thesis up with lots of historical screenshots, some of which actually made me a bit nostalgic. In particular, he suggests that topic/cluster-based query refinement is a non-starter.

If he’s wrong, it will probably be because people today are satisfied with search only some of the time. Here, in a Business Week article, is a pretty good cut at where search so far has and hasn’t worked:

“Web searching can be frustrating for a lot of people,” says Tomi Poutanen, Yahoo’s director of product management for social search. “Search does a very good job if you are searching for something factual or doing research. It is not as good when searching for experiential knowledge—such as what is a good sushi restaurant in New York—where a person’s experience would count in having that answer.”

Categories: Search engines

October 4, 2006

KXEN is getting into text mining

Data mining challenger KXEN is getting into text mining, and they’re writing all their own stuff. Not even any Inxight filters. Weird. It will be interesting to see if they stick with that plan.

EDIT: Actually, upon reviewing an e-mail I see that their text mining features are in beta already. So I guess they stuck with the plan, at least for Release 1.

Categories: BI integration, Text mining

October 3, 2006

Two own-dogfood text-based bug-tracking applications

Last July I wrote about Google’s text-based project management system. Dave Kellogg of Mark Logic offers links to discussion of a related Google project, and adds news of his own: Mark Logic built a text-based bug-tracking system in its own MarkLogic technology.

Categories: Enterprise search, Google, Mark Logic, Search engines, Specialized search

September 22, 2006

My blogs stopped working through IE!

EDIT: Now they seem to be working again, with no action on my part and no known software updates through the whole process. Go figure. I do not know WordPress well enough to guess exactly what must have been broken and then fixed at my hosting provider to cause these effects.

As of this writing, my blogs (DBMS2, the Monash Report, Text Technologies, and Software Memories) are all working in Firefox, and the top page of each is working in IE, but the rest of the pages/links are NOT working in IE. (But www.monash.com, a non-WordPress site on the same host, IS still working through IE.) Naturally, I’m addressing this problem as fast as I can. I imagine the fix will involve some sort of a reinstall and/or theme change, which could alter the blogs’ look-and-feel, maybe not for the better (especially at first). I apologize for the inconvenience!

Categories: About this blog

1 Comment

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
