I’d like to hear what my colleagues out there think about the convergence of structured and unstructured data in business intelligence. Here are the intersections as I see them. I see two types of BI paradigms emerging in the future:
Structured OLAP will continue to be just that – structured, as far as the process and UI are concerned. However, to become more effective, we will need to bring unstructured data into the analysis, in a way that is transparent to the end user. For example, as we are creating customer segmentation analysis for a marketing campaign, in addition to structured data such as customer demographics and prior buying behavior, we’d want to bring in comments hidden in customer email and voice mail requests. In an ideal environment, the OLAP engine will automatically match these emails to a customer dimension and quantify and qualify comments into star schema facts (number of requests) and dimensions (request types).
A combination of search and lightweight query used for ad hoc research and analysis. Here, a familiar search text box should be the main UI. However, the engine should be smart enough to a) quantify and qualify unstructured results into facts and dimensions (a so-called guided search), or b) recognize that the request is actually about data stored in a structured repository and automatically return the results via OLAP, in a cross-tab or tabular report format.
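To make the "quantify and qualify" idea a bit more concrete, here is a minimal sketch of how emails might be turned into star schema facts. Everything in it is my own assumption for illustration: the customer dimension, the keyword-to-request-type mapping, and the function names are hypothetical, and simple keyword matching stands in for whatever text analytics a real engine would use.

```python
from collections import Counter

# Hypothetical customer dimension: surrogate key per known email address.
CUSTOMER_DIM = {"ann@example.com": 1, "bob@example.com": 2}

# Hypothetical request-type dimension, keyed by trigger keywords.
REQUEST_TYPES = {
    "refund": "billing",
    "invoice": "billing",
    "broken": "support",
    "upgrade": "sales",
}


def classify(text):
    """Return the request type of the first keyword found in the text."""
    lowered = text.lower()
    for keyword, request_type in REQUEST_TYPES.items():
        if keyword in lowered:
            return request_type
    return "other"


def build_facts(emails):
    """Aggregate emails into (customer_key, request_type) -> number of requests."""
    facts = Counter()
    for sender, body in emails:
        customer_key = CUSTOMER_DIM.get(sender.lower())
        if customer_key is None:
            continue  # senders we cannot match are skipped in this sketch
        facts[(customer_key, classify(body))] += 1
    return facts


EMAILS = [
    ("ann@example.com", "My device arrived broken."),
    ("ann@example.com", "Please send a refund."),
    ("bob@example.com", "I would like to upgrade my plan."),
    ("ann@example.com", "Where is my invoice?"),
    ("unknown@example.com", "Hello"),
]
FACTS = build_facts(EMAILS)
```

The resulting counter plays the role of a tiny fact table: the measure is the number of requests, and the two keys point at the customer and request type dimensions.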
For years I've been predicting that relational DBMS will run out of steam when it comes to effectively managing and manipulating very large, heterogeneous (structured + unstructured) data sets for business intelligence. First, RDBMS were never designed and optimized for unstructured data (not just XML, which is structured data in my definition, but truly unstructured text pages). Second, there's just too much overhead and cost in RDBMS for handling OLTP functions. The result: search index DBMS will be king in BI and DSS in the future.
Today’s announcement that Microsoft may be buying Yahoo came several weeks early. On May 17th I would have gone on the record at the Forrester IT Forum in Nashville with the following, and I quote from my presentation paper: “DBMS/BI vendor may buy a search company, to address the trend of increasing importance of unstructured data in BI and to obtain an early leading position in the space. I know it should be Oracle or IBM, but it probably won’t, since these guys will never admit that their relational DBMS cannot do something. Microsoft is a more likely contender since they know they won’t leapfrog IBM or Oracle in relational DBMS and they could use this opportunity to stick one to Google too.”
I thought Microsoft would buy somebody like Fast Search, but I guess that was too small for them.
Remember George Costanza from a Seinfeld episode where he was pulling his hair out about “the two worlds colliding”? He was agonizing over the world of his girlfriends and the world of his friends that should never mix. In my world, process and data, separate disciplines until recently, are now “colliding”. While some of the vendors have already been toying with the convergence of both disciplines (IBM, Oracle, SAP), today’s announcement by Tibco that it will acquire Spotfire is the first transaction that will merge a pure-play middleware vendor with a pure-play BI vendor (a convergence that Forrester has been predicting for almost a year; please see our Business Intelligence Meets BPM In The Information Workplace research document). But by acquiring Spotfire, Tibco has actually achieved more than one goal.
Being efficient is no longer enough. Enterprises can no longer stay competitive just by squeezing more efficiencies from operational applications, including workflow, business process management (BPM) and business rules engines (BRE). Business intelligence applications are needed to become more effective. For example, while workflow and rules are used to efficiently process a customer credit application, business intelligence analytics are needed to effectively segment the customer population and extend the credit offer to a much more targeted customer segment for better response and cross-sell/up-sell ratios.
The actual convergence of process and data. The other slant is the natural interdependency of process and data from two angles: a) one needs data to feed and enrich the business process and process rules, and b) an event (an alert, for example) triggered by a data condition has to go into a process so that it can be followed up and acted on.
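Point b) above can be sketched in a few lines. This is only an illustration under my own assumptions: the metric names, thresholds, and the `FollowUpProcess` class are hypothetical, and a plain in-memory queue stands in for a real BPM engine.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Alert:
    metric: str
    value: float
    threshold: float


def check_condition(metric, value, threshold):
    """Return an Alert when the value exceeds the threshold, else None."""
    if value > threshold:
        return Alert(metric, value, threshold)
    return None


class FollowUpProcess:
    """Stand-in for a BPM workflow: alerts become tasks that must be closed."""

    def __init__(self):
        self.open_tasks = deque()
        self.closed = []

    def submit(self, alert):
        # The data-side event enters the process here.
        self.open_tasks.append(alert)

    def resolve_next(self, note):
        # Someone acts on the oldest open alert and records what was done.
        alert = self.open_tasks.popleft()
        self.closed.append((alert.metric, note))
        return alert


PROCESS = FollowUpProcess()
READINGS = [("late_payments_pct", 7.5, 5.0), ("returns_pct", 1.2, 3.0)]
for metric, value, threshold in READINGS:
    alert = check_condition(metric, value, threshold)
    if alert is not None:
        PROCESS.submit(alert)
```

Only the first reading breaches its threshold, so only one task lands in the process. The analytics side detects the condition, while the process side owns the follow-up.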