One Of Many BI Trends — Convergence Of Structured And Unstructured Data

by Boris Evelson.

I’d like to hear what my colleagues out there think about the convergence of structured and unstructured data in business intelligence. Here are the intersections as I see them. I see two BI paradigms emerging:

  1. Structured OLAP will continue to be just that: structured, as far as the process and UI are concerned. However, to become more effective, it will need to bring unstructured data into the analysis in a way that is transparent to the end user. For example, when we create a customer segmentation analysis for a marketing campaign, in addition to structured data such as customer demographics and prior buying behavior, we’d want to bring in comments hidden in customer email and voice mail requests. In an ideal environment, the OLAP engine will automatically match these emails to a customer dimension and quantify and qualify the comments into star schema facts (number of requests) and dimensions (request types).
  2. A combination of search and lightweight query, used for ad hoc research and analysis. Here, a familiar search text box should be the main UI. However, the engine should be smart enough to a) quantify and qualify unstructured results into facts and dimensions (a so-called guided search), or b) recognize that the request is actually about data stored in a structured repository and automatically return the results in OLAP, cross-tab or tabular report format.
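To make the first paradigm a bit more concrete, here is a minimal sketch of matching emails to a customer dimension and rolling them up into a "number of requests" fact by request type. Everything in it is hypothetical: the `CUSTOMER_DIM` lookup, the keyword rules in `REQUEST_KEYWORDS`, and the sample emails are made up for illustration, and a real engine would presumably rely on proper text analytics rather than keyword matching.

```python
from collections import Counter

# Hypothetical customer dimension: email address -> customer key.
CUSTOMER_DIM = {
    "ann@example.com": 1,
    "bob@example.com": 2,
}

# Hypothetical request-type dimension, driven by naive keyword rules.
REQUEST_KEYWORDS = {
    "refund": "Refund",
    "cancel": "Cancellation",
    "upgrade": "Upgrade",
}

def classify_request(body):
    """Return the first request type whose keyword appears in the text."""
    text = body.lower()
    for keyword, request_type in REQUEST_KEYWORDS.items():
        if keyword in text:
            return request_type
    return "Other"

def build_request_facts(emails):
    """Aggregate emails into (customer_key, request_type) -> number of requests."""
    facts = Counter()
    for sender, body in emails:
        customer_key = CUSTOMER_DIM.get(sender.lower())
        if customer_key is None:
            continue  # sender could not be matched to the customer dimension
        facts[(customer_key, classify_request(body))] += 1
    return dict(facts)

# Made-up sample data: (sender, body) pairs.
emails = [
    ("ann@example.com", "I would like a refund for my last order."),
    ("ann@example.com", "Still waiting on that refund."),
    ("bob@example.com", "Can I upgrade my plan?"),
    ("eve@example.com", "Please cancel my account."),
]
facts = build_request_facts(emails)
```

The resulting `facts` dictionary is keyed by customer and request type, which is roughly the shape a fact table with two dimension keys and one measure would take. Emails from unknown senders are simply dropped here; in practice they would more likely be queued for matching.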

I also see the emergence of search DBMS as a superior architecture for Decision Support (DS) over relational databases. Traditional relational databases were designed from the ground up for structured data, so they are constantly trying to fit a square peg into a round hole: fitting unstructured data such as XML, rich media and other “blobs” (Binary Large Objects) into relational structures. As a result, relational databases are being pushed to the limits of their capabilities to store and process unstructured data. The issue is less storage and processing themselves than the inherent rigidity of relational schemas, which do not lend themselves naturally to unstructured data processing. One typically needs to know the kinds of questions that will be asked and the types of analysis that will be run in the future in order to model data for structured BI. But unstructured data BI is unpredictable by nature, which leaves no room for predefined schemas. Index DBMSs, such as the ones from FAST Search and Endeca, are nothing but indexes with data embedded in them. They require no schemas, are far more flexible, and can handle structured data just as naturally as unstructured.
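As a rough illustration of what "an index with data embedded in it" could look like, here is a toy inverted index that keeps the records themselves next to the postings and accepts any record without a declared schema. It is a sketch under my own assumptions, not a description of how FAST Search or Endeca actually work; the `TinyIndex` class and the sample records are invented for the example.

```python
from collections import defaultdict

class TinyIndex:
    """Toy inverted index that stores the records alongside the postings."""

    def __init__(self):
        self.records = []                 # the data embedded in the index
        self.postings = defaultdict(set)  # token -> ids of records containing it

    def add(self, record):
        """Index a record (any dict); no schema is declared up front."""
        record_id = len(self.records)
        self.records.append(record)
        for field, value in record.items():
            for token in str(value).lower().split():
                # Free-text lookup, regardless of which field held the token.
                self.postings[token].add(record_id)
                # Field-scoped lookup, e.g. "region:emea".
                self.postings[f"{str(field).lower()}:{token}"].add(record_id)
        return record_id

    def search(self, *terms):
        """Return records that contain every term (AND semantics)."""
        if not terms:
            return []
        ids = set.intersection(
            *(self.postings.get(term.lower(), set()) for term in terms)
        )
        return [self.records[i] for i in sorted(ids)]

index = TinyIndex()
# A structured row and a free-text message go into the same index.
index.add({"customer": "Ann", "region": "EMEA", "revenue": 1200})
index.add({"from": "ann@example.com", "body": "Delivery to EMEA was late again"})

hits = index.search("emea")              # matches both records
late_hits = index.search("emea", "late") # matches only the message
region_hits = index.search("region:emea")  # matches only the structured row
```

The point of the sketch is that the structured row and the email land in the same structure with no modeling step, and a new field simply shows up as new index terms. What it leaves out (ranking, aggregation, updates, scale) is where the real engineering effort goes.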

Comments?