Mark Logic viewed as a different kind of text search technology vendor
I’m putting up two posts this morning on Mark Logic and its MarkLogic product family. The main one, over on DBMS2, outlines the technical architecture — focusing on MarkLogic as an XML database management system — and provides a bit of overall context. This post attempts to position MarkLogic against alternative kinds of text analytics engine.
For the most part, MarkLogic is indeed sold (and bought) for the storage, manipulation, and retrieval of text. (One long-confidential exception to this rule is scheduled to be unveiled at the June user conference.) Most applications seem to fit a custom publishing/enhanced search paradigm:
- Ingest text.
- Enhance it.
- Serve it up in chunks, typically via a sophisticated search interface.
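To make those three steps concrete, here is a minimal Python sketch of the pattern. It is not MarkLogic's API or architecture; the names `TinyCorpus`, `Chunk`, `ingest`, `enhance`, and `search` are all hypothetical, and the choice to split documents into chunks on blank lines is an assumption made only for illustration. The index is updated inside `ingest` and `enhance`, so new text and new tags are searchable as soon as the call returns.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One paragraph-sized piece of a document (hypothetical structure)."""
    doc_id: str
    position: int
    text: str
    tags: set = field(default_factory=set)


class TinyCorpus:
    """Toy in-memory store: ingest text, enhance it, serve it up in chunks."""

    def __init__(self):
        self.chunks = []
        self.index = defaultdict(set)  # term -> set of chunk numbers

    def ingest(self, doc_id, text):
        """Split a document on blank lines and index each chunk immediately."""
        chunk_numbers = []
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for position, paragraph in enumerate(paragraphs):
            chunk_no = len(self.chunks)
            self.chunks.append(Chunk(doc_id, position, paragraph))
            for term in re.findall(r"\w+", paragraph.lower()):
                self.index[term].add(chunk_no)
            chunk_numbers.append(chunk_no)
        return chunk_numbers

    def enhance(self, chunk_no, tag):
        """Attach a tag to a chunk; the tag is indexed under a 'tag:' prefix."""
        self.chunks[chunk_no].tags.add(tag)
        self.index["tag:" + tag.lower()].add(chunk_no)

    def search(self, term):
        """Return matching chunks (not whole documents), in ingestion order."""
        hits = sorted(self.index.get(term.lower(), ()))
        return [self.chunks[i] for i in hits]
```

A caller would ingest a document, tag one of its chunks, and then query either by word (`corpus.search("search")`) or by tag (`corpus.search("tag:database")`), getting back individual chunks in both cases.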
Differences vs. conventional search engines include:
- Documents are indexed on the fly, and available for query immediately upon ingestion.
- MarkLogic is a real, ACID-compliant DBMS. So everything else, such as a user tag or comment, is also available for immediate query. Mark Logic says customers are making a lot of use of this feature.
- MarkLogic has a real programming language, specifically XQuery. (Note: XQuery is a much fuller language than, say, standard SQL, with conditional logic, arithmetic, try/catch, and so on.)
- MarkLogic handles fielded information, document chunks, and whole documents in a completely integrated fashion. Truth be told, I don’t know exactly to what extent Autonomy or FAST do or don’t fall short of this standard, but it’s never seemed to be as much of a priority on their part as I’ve felt it should be.
Mark Logic also claims huge advantages in corpus administration. Scalability seems good too; there’s a national-intelligence customer with a 200-terabyte database. And they’re proud of a feature called lexicons, although it seems so obvious to me that I’ve so far failed to muster what they’d probably regard as the proper level of excitement about it. (In SQL terms, it seems to be a combination of SELECT and COUNT DISTINCT, both of which are capabilities I’d think would be in XQuery anyway.)
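For readers who want the SQL analogy spelled out, here is a small Python sketch using the standard-library `sqlite3` module. It illustrates only one reading of the comparison above (distinct values, a count of distinct values, and per-value frequencies); it says nothing about how MarkLogic actually implements or exposes lexicons. The `docs` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of documents with an author field (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs (author, body) VALUES (?, ?)",
        [
            ("Smith", "first article"),
            ("Jones", "second article"),
            ("Smith", "third article"),
            ("Lee", "fourth article"),
        ],
    )
    return conn


def distinct_values(conn):
    """The list of distinct values that occur in the field."""
    rows = conn.execute("SELECT DISTINCT author FROM docs ORDER BY author")
    return [row[0] for row in rows]


def distinct_count(conn):
    """How many distinct values occur in the field."""
    return conn.execute("SELECT COUNT(DISTINCT author) FROM docs").fetchone()[0]


def value_frequencies(conn):
    """Each distinct value paired with the number of documents containing it."""
    return conn.execute(
        "SELECT author, COUNT(*) FROM docs GROUP BY author ORDER BY author"
    ).fetchall()
```

The sort of thing this supports in a search interface is a browsable list of every author in the corpus, with a document count beside each name.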
Comments
4 Responses to “Mark Logic viewed as a different kind of text search technology vendor”
[…] A companion post over on Text Technologies takes a text search view of MarkLogic. […]
At eXcelon (the second startup that I co-founded), we tried to do an XML DBMS product. Admittedly the company marketed it very poorly. Nevertheless, it’s a hard product to sell. The problem is that Oracle and other RDBMS vendors (IBM with DB2, I think) have XML features that most users find pretty attractive. And most users are much happier buying a database management system from one of the big vendors. I wish Mark Logic all the luck in the world, though!
Dan,
Back when Object Design/eXcelon hadn’t yet been bought by Progress, Oracle’s and IBM’s XML support was rudimentary indeed. Anyhow, it all comes down to what kinds of XML management you need to do. Most XML is either lightly transformed relational data or lightly transformed text data. Accordingly, most XML management is either decently handled by an RDBMS, or else is done in a text search context.
As you know, I’ve been writing about all this over on DBMS2.
Best,
CAM
[…] the same league as application development over relational DBMS. The choices are mainly XML (e.g., MarkLogic), SQL for text integrated into RDBMS (limited by the weakness of those integrations), and something […]