September 20, 2009
Data marts in the world of text
CMS (Content Management System) and search expert Alan Pelz-Sharpe recently decried “Shadow IT”, by which he seems to mean the departmental proliferation of data stores outside the control of the IT department. In other words, he’s talking about data marts, only for documents rather than tabular data.
Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:
- Price/performance. Your main/central data manager might be too expensive to support additional large specialized databases. Or different databases and applications might have profiles different enough that they get their best price/performance from different kinds of data managers. This is particularly prevalent in the relational world, where column stores, sequentially-oriented row stores, and random I/O-oriented row stores each have compelling use cases.
- Different SLAs (Service-Level Agreements). Similarly, different applications may have very different requirements for uptime, response time, and the like. (In the relational world, think of operational data stores.)
- Different security requirements. Different subsets of the data may need different levels of security. This is particularly prevalent in the document world, where security problems are not as well-solved as in the tabular arena, and where it’s common for a search engine to index across different corpuses with radically different levels of sensitivity.
- Integrated application and user interfaces. In the relational world, there’s a pretty clean separation between data management and interface logic; most serious business intelligence tools can talk to most DBMS. The document world is quite different. Some search engines bundle, for example, various kinds of faceted or parameterized search interfaces. What’s more, in public-facing search, a major differentiator is the facilities that the product offers for skewing search results.
- Different text applications require different thesauruses or taxonomy management systems. Ideally, those should all be integrated — but the requisite technology still doesn’t exist.
Bottom line: Text data marts, much like relational data marts, are almost surely here to stay.
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
Comments
2 Responses to “Data marts in the world of text”
If you think of the smallest form of a datamart, it is oftentimes the survey response data. But that in many ways is the atomic form of disconnected data in the enterprise, and it is left unprocessed and in its own silo for days.
We ran into this tool (http://insight-magnet.com) when we wanted to be able to analyze our survey datamart quickly and economically. You load the file and the tool lets you dice and slice the data any which way you want. The best part is that it actually reads through your open-ended responses and tells you the categories of feedback you received.
They have a short tour on their website as well – http://insight-magnet.com/tour
@Nat,
Who is the “we” in your comment? I see that your URL points at the site you’re recommending.