Posted by James Kobielus on September 12, 2008
Few enterprise data warehousing (EDW) professionals regard the key rival approach--data federation--as a best practice. Usually, the reasons for this disdain are valid: federated environments are not optimized for heavy-hitting data matching, merging, transformation, and cleansing, all of which are essential to delivering a "single version of the truth" for business intelligence (BI).
But data federation is refusing to die as an alternative to EDW--and is taking on new importance in organizations' data management strategies. Data federation is an umbrella term for a wide range of operational BI topologies that provide decentralized, on-demand alternatives to the centralized, batch-oriented architectures characteristic of traditional EDW environments.
Nevertheless, the two are complementary approaches, each with its respective pros and cons. For example, data federation is better suited to near-real-time BI requirements than the batch-oriented EDWs deployed in many organizations. In practice, data federation and EDW (aka data consolidation) are not mutually exclusive. Many real-world data federation deployments are in fact hybrid approaches that involve EDWs to varying degrees. Federation environments can coexist with, extend, virtualize, and enrich EDWs to help users pull a wide range of disparate data into their reports, queries, dashboards, and analytic applications.
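To make the hybrid idea concrete, here is a minimal sketch of an on-demand federated query, not a description of any particular product. It assumes two independent stores, a batch-loaded EDW and a live OLTP system, each simulated with an in-memory SQLite database. The `sales` table, its columns, and the helper names `make_store` and `federated_totals` are hypothetical, chosen only for illustration.

```python
import sqlite3


def make_store(rows):
    """Create an independent in-memory store holding (region, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def federated_totals(stores):
    """Query each store where it lives and merge the partial results.

    No data is copied into a central repository; only the small
    per-region subtotals travel back to the caller.
    """
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    totals = {}
    for conn in stores:
        for region, subtotal in conn.execute(query):
            totals[region] = totals.get(region, 0.0) + subtotal
    return totals


# Hypothetical data: history in the EDW, today's transactions in OLTP.
edw = make_store([("east", 100.0), ("west", 250.0)])
oltp = make_store([("east", 20.0), ("north", 5.0)])

result = federated_totals([edw, oltp])
print(result)
```

The sketch deliberately skips the matching, cleansing, and transformation work that an EDW load would perform, which is the trade-off described above.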
To determine whether an operational BI scenario requires a federated solution--in lieu of or supplementing an EDW-hubbed topology--Information and Knowledge Management (I&KM) professionals should determine whether their data management environment fits any or all of the following criteria:
- Multiplicity: Are there multiple distributed data stores within any or all of the principal data-persistence tiers, including online transaction processing (OLTP) systems, EDWs, operational data stores (ODSs), and online analytical processing (OLAP) data marts?
- Heterogeneity: Does this distributed data implement a wide range of incompatible formats, schemas, syntaxes, models, glossaries, and vocabularies, including myriad structured, semi-structured, and unstructured formats?
- Autonomy: Is this data under the control, administration, and governance of a wide range of autonomous organizations, business units, and ownership domains?
- Opacity: Are there security, privacy, and other sensitivity restrictions that prevent external visibility into this data and metadata, and/or restrict external domains' ability to load, replicate, synchronize, and use the data in their EDW, BI, and application environments?
- Inflexibility: Are there constraints of a technical, administrative, or policy nature that prevent the EDW--or other relevant data consolidation points--from expanding in capacity, taking on more near-real-time workloads, and otherwise expanding their functional or deployment role in your data management environment?
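The checklist above can be recorded as a simple assessment structure. The sketch below is only one way to capture it: the class name `FederationCriteria` and both helper functions are hypothetical, and the rule that any single match is enough to consider federation is an assumption drawn from the "any or all" wording, not a scoring method from the article.

```python
from dataclasses import dataclass, fields


@dataclass
class FederationCriteria:
    """One yes/no answer per criterion in the checklist."""
    multiplicity: bool
    heterogeneity: bool
    autonomy: bool
    opacity: bool
    inflexibility: bool


def matched_criteria(criteria):
    """Return the names of the criteria the environment fits, in list order."""
    return [f.name for f in fields(criteria) if getattr(criteria, f.name)]


def federation_worth_considering(criteria):
    """Assumed rule: fitting any one criterion justifies a closer look."""
    return bool(matched_criteria(criteria))


# Hypothetical environment: several incompatible stores, some of them opaque.
env = FederationCriteria(
    multiplicity=True,
    heterogeneity=True,
    autonomy=False,
    opacity=True,
    inflexibility=False,
)

matches = matched_criteria(env)
print(matches, federation_worth_considering(env))
```

A real assessment would weigh the criteria against each other; the boolean fields simply keep the five questions explicit and auditable.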
As decentralized service-oriented architectures (SOA) gain traction in operational BI environments, enterprise requirements for data federation--with or without an EDW in the loop--will continue to grow. Also, as EDWs begin to manage petabyte-scale data sets, batch transfer of this data will prove ever more costly and cumbersome--and federated query of this "too massive to move" data will become the only viable option.