June 24, 2006

Attensity, extractive exhaustion, and the FRN

Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.

The basic story is that selective fact extraction from text is a knowledge-engineering-intensive process. You need to determine which facts to extract, and then determine how to extract those particular kinds of facts. So Attensity has a better idea: extract all facts, not just some, and dump them in a “fact relationship network” (FRN). The FRN is two relational tables, one for facts and one for relationships, suitable for copying to a Teradata machine. Attensity calls this “exhaustive extraction.”
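To make the two-table idea concrete, here is a minimal sketch using SQLite from Python. Everything specific in it is an assumption for illustration: the table names, the columns, the sample warranty-style rows, and the helper functions are hypothetical, and the talk did not spell out Attensity's actual schema.

```python
import sqlite3


def build_toy_frn():
    """Create an in-memory, two-table store loosely modeled on the FRN idea.

    The schema is hypothetical, not Attensity's actual design.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE facts (
            fact_id INTEGER PRIMARY KEY,
            doc_id  INTEGER NOT NULL,
            subject TEXT,
            verb    TEXT,
            object  TEXT
        );
        CREATE TABLE relationships (
            from_fact INTEGER REFERENCES facts(fact_id),
            to_fact   INTEGER REFERENCES facts(fact_id),
            kind      TEXT
        );
        """
    )
    # Made-up rows: every fact found is stored, with no up-front filtering.
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
        [
            (1, 100, "brake pedal", "stick", None),
            (2, 100, "dealer", "replace", "master cylinder"),
            (3, 101, "radio", "fail", None),
        ],
    )
    # One made-up link between two facts from the same document.
    conn.executemany(
        "INSERT INTO relationships VALUES (?, ?, ?)",
        [(1, 2, "followed_by")],
    )
    conn.commit()
    return conn


def facts_about(conn, subject):
    """Return (doc_id, verb, object) rows for one subject, in insertion order."""
    cursor = conn.execute(
        "SELECT doc_id, verb, object FROM facts "
        "WHERE subject = ? ORDER BY fact_id",
        (subject,),
    )
    return cursor.fetchall()
```

The point of the sketch is only that once facts sit in ordinary tables, deciding what to look for becomes a query rather than an extraction project.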

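To some extent, exhaustive extraction amounts to what in the math biz is called restating the problem: the work of deciding which facts matter doesn't go away, it just moves downstream of the extraction step.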

Still, this approach would seem to offer some nice advantages. Separating the initial extraction from later lexicography is pure goodness, for all the reasons that modularity is generally good. The same goes for separating the initial extraction from later decisions as to just what information it is you care about anyway. And generally, this approach should help in applications where somebody might say, in David’s phrase, “I don’t know what I’m looking for, but I’ll know it when I see it.”
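The modularity argument can be sketched in a few lines of Python. This is a toy under stated assumptions: the extracted facts, the lexicon contents, and the `categorize` helper are all hypothetical, and none of it reflects how Attensity actually implements the idea. It only shows that a lexicon applied after extraction can be revised without re-running the extraction.

```python
from collections import Counter

# Hypothetical output of a one-time extraction pass: (subject, verb) pairs.
EXTRACTED_FACTS = [
    ("brake pedal", "stick"),
    ("brakes", "squeal"),
    ("radio", "fail"),
    ("rotor", "warp"),
]

# Two versions of a made-up lexicon mapping subjects to categories.
LEXICON_V1 = {"brake pedal": "braking", "brakes": "braking"}
LEXICON_V2 = {**LEXICON_V1, "rotor": "braking", "radio": "electrical"}


def categorize(facts, lexicon):
    """Count facts per category, applying the lexicon after extraction."""
    counts = Counter()
    for subject, _verb in facts:
        counts[lexicon.get(subject, "uncategorized")] += 1
    return dict(counts)
```

Calling `categorize(EXTRACTED_FACTS, LEXICON_V1)` and then `categorize(EXTRACTED_FACTS, LEXICON_V2)` reuses the same extracted facts; only the later categorization step changes.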

Comments

10 Responses to “Attensity, extractive exhaustion, and the FRN”

  1. Text Technologies»Blog Archive » Introduction to ClearForest on July 23rd, 2006 7:04 am

    […] ClearForest is one of the two companies whose name comes up for fact extraction applications, probably even a little ahead of Attensity. Their flagship account is the GM deal they did with IBM, kicking off the whole warranty report mining boom. Procter & Gamble is no slouch of a customer either. They’re involved enough in anti-terrorism that, when I asked Jay if he knew who Cogito was, he said “Of course.” And apparently one of their techie founders is the guy who coined the term “text mining” in the first place. […]

  2. Text Technologies»Blog Archive » Pioneers moving on on July 26th, 2006 9:44 pm

    […] Ramana Rao is leaving Inxight, or has by now. Today I also discovered that Todd Wakefield is leaving Attensity. Such things happen in all industries, of course. • • • […]

  3. Text Technologies»Blog Archive » More on Attensity on July 27th, 2006 5:37 am

    […] They want to be positioned in the BI space, e.g. as “ETL for text/unstructured data.” They place a lot of value on their partnership with Business Objects and Teradata. And, as a key part of the exhaustive extraction/FRN story, they think that BI/data warehouse information roll-up tools are an excellent (if imperfect) substitute for hardcore semantic extraction. […]

  4. Text Technologies»Blog Archive » Clarabridge takes on Attensity on March 26th, 2007 8:36 pm

    […] The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract “facts” from documents and dump them into a relational database management system. In particular, Clarabridge and Attensity alike make the case “Our categorization is more flexible because it’s applied only after the extraction happens.” […]

  5. Text Technologies»Blog Archive » TEMIS, part 1 – overview on April 4th, 2007 3:18 pm

    […] Attensity FRN (fact-relationship network) […]

  6. Text Technologies»Blog Archive » When to use exhaustive extraction on October 5th, 2007 8:54 pm

    […] with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases exhaustive extraction should not be used. In […]

  7. Text Technologies»Blog Archive » The Clarabridge approach to text mining on October 6th, 2007 8:14 pm

    […] Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from […]

  8. Why MapReduce matters to SQL data warehousing | DBMS2 -- DataBase Management System Services on August 26th, 2008 2:12 am

    […] Data mining can involve very high-dimensional problems with super-sparse tables. And while exhaustive text extraction into flat tables works OK, getting from there to common-sense semantic hierarchies can be a bit of a kludge. […]

  9. Infology.Ru » Blog Archive » Why is MapReduce so important for data warehouses? on October 5th, 2008 2:59 am

    […] and use super-sparse tables. And although exhaustive extraction of facts from text into flat tables works well, getting from there to practical, […]

  10. Brainsturbator Presents: Facebook, the CIA, and You. | I AM NOT A RAPPER on July 19th, 2011 2:38 pm

    […] […]
