Not Your Grandfather’s Data Warehouse

As I dug into my initial research, it dawned on me – some technology trends are having an impact on information management/data warehouse (DW) architectures, and EAs should consider them when planning out their firm’s road map. My next thought: this wasn’t completely obvious when I began. The final thought? As the EA role analyst covering emerging technology and trends, this is exactly the kind of material I need to be writing about.

Let me explain:

No. 1: Big Data expands the scope of DWs. A challenge with typical data management approaches is that they are not suited to dealing with data that is poorly structured, sparsely attributed, and high-volume. For example, today’s DW appliances boast the ability to handle up to 100 TB of volume, but the data must be transformed into a highly structured format to be useful. Big Data technology applies the power of massively parallel distributed computing to capture and sift through data gone wild – that is, data at an extreme scale of volume, velocity, and variability. Big Data technology does not deliver insight by itself, however – insights depend on analytics that come from combining the results of things like Hadoop MapReduce jobs with the manageable “small data” already in your DW.
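
To make that concrete, here is a minimal sketch of the “Big Data” half of the equation: a Hadoop Streaming mapper and reducer, written in Python, that boil raw, semi-structured log lines down to per-customer counts small enough to join with DW data. The tab-delimited layout and field positions are assumptions for illustration, not anyone’s actual schema.

```python
#!/usr/bin/env python
# mapper.py - a minimal Hadoop Streaming mapper (illustrative sketch).
# Assumes tab-delimited, semi-structured log lines with a customer ID first.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        continue  # tolerate malformed records - "data gone wild"
    customer_id = fields[0]
    print("%s\t1" % customer_id)
```

```python
#!/usr/bin/env python
# reducer.py - sums counts per customer (Hadoop delivers input sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key and current_key is not None:
        print("%s\t%d" % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

The aggregated output is small enough to load alongside the conformed dimensions already in the warehouse – and that join is where the actual insight comes from.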

Even the notion of a DW is changing when we start to think “Big” – Apache just graduated Hive from being part of Hadoop to its own project (Hive is a DW framework for Big Data). If you have any doubt, read James Kobielus’ “The Forrester Wave™: Enterprise Data Warehousing Platforms, Q1 2011.”
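
For a feel of what “DW framework for Big Data” means in practice, here is a rough sketch of querying Hive from Python. It assumes the third-party PyHive client and a reachable HiveServer2 endpoint; the host, table, and column names are made up for illustration. HiveQL reads like SQL, but Hive compiles it into MapReduce jobs over files in Hadoop.

```python
# Rough sketch: querying Hive from Python. Assumes the third-party PyHive
# client (pip install pyhive) and a HiveServer2 endpoint; the host, table,
# and column names are hypothetical.
from pyhive import hive

conn = hive.Connection(host="hadoop-gateway.example.com", port=10000)
cursor = conn.cursor()
# HiveQL looks like SQL, but it runs as MapReduce jobs over HDFS files.
cursor.execute("""
    SELECT customer_id, COUNT(*) AS page_views
    FROM web_logs
    WHERE dt = '2011-05-12'
    GROUP BY customer_id
""")
for customer_id, page_views in cursor.fetchall():
    print(customer_id, page_views)
```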

No. 2: Enterprise data virtualization technology can improve your DW architecture. Three things lead me to this conclusion: 1) as firms discover this mature technology, they realize data can be integrated without physical ETL in some cases – a light-bulb moment; 2) leading firms are evolving data virtualization point solutions into enterprise deployments that deliver broad benefits (see my colleague Noel Yuhanna’s prescient “Information Fabric: Enterprise Data Virtualization”); and 3) vendors are scrambling to add Big Data integrations to their virtualization tool kits. The upshot is that future enterprise information architectures are likely to include malleable structures combining virtual and physical stores with connections to Big Data sets.
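
As a toy illustration of that “virtual plus physical” idea, the sketch below exposes a single virtual view that joins a physical warehouse table with the output of a finished Big Data job at query time – no staging table, no physical ETL copy. All names and data are invented, and a real deployment would use a data virtualization server rather than application code.

```python
# Toy sketch of data virtualization: a virtual view that federates a physical
# DW table with Big Data results at query time. Names and data are invented.
import sqlite3

# Physical store: a (toy) warehouse customer dimension.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_customer (customer_id TEXT, segment TEXT)")
dw.executemany("INSERT INTO dim_customer VALUES (?, ?)",
               [("c1", "enterprise"), ("c2", "consumer")])

# "Big Data" side: per-customer counts from, say, a finished MapReduce job.
mapreduce_output = [("c1", 4210), ("c2", 97), ("c3", 12)]

def virtual_customer_view():
    """Resolve the join at query time; nothing is physically copied or staged."""
    segments = dict(dw.execute("SELECT customer_id, segment FROM dim_customer"))
    for customer_id, page_views in mapreduce_output:
        yield customer_id, segments.get(customer_id, "unknown"), page_views

for row in virtual_customer_view():
    print(row)  # e.g. ('c1', 'enterprise', 4210)
```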

What does this mean for EAs? I think it means that “your grandfather’s DW” architecture may not work for the future. Specifically:  

  1. You may need to put a DW architecture refresh on your work plan and start collaborating with stakeholders to sketch out a new target state that capitalizes on these trends – especially if your DW is not a brand-spanking-new appliance that already incorporates virtual data stores and Big Data.
  2. There is hope if you can’t figure out how to economically deal with an environment that has multiple DWs or BI tools – enterprise deployments of data virtualization can overcome these challenges. In my second report, I’ll provide examples of how a large telecom and a drug manufacturer dealt with just these problems.

Please let me know if you’d like to hear more about this and some additional research Forrester is doing in this area. My first report, “Big Opportunities In Big Data,” will be out mid-month, and I’ll be talking about it at IT Forum. The second, “Data Virtualization Reaches Critical Mass,” is scheduled for June. Finally, Noel Yuhanna is updating the Information-As-A-Service Forrester Wave, which covers data virtualization technology in the context of a broader strategy.

Comments

Great points to consider

Brian makes some great points about how two key technologies, Big Data and Data Virtualization, are impacting information architecture strategies and changing the role of the enterprise data warehouse as we know it.

More comments at http://data-virtualization.com/2011/05/12/big-data-meets-data-virtualiza...

Data Virtualization Best Practices

Brian,
The report "Data Virtualization Reaches Critical Mass" provides a pragmatic discussion on trends and best practices for data virtualization.

I have also observed that architects really like the best-practices framework diagram in this document. It gives them a sense of the data integration process and how data virtualization built on Lean Integration principles and self-service can accelerate each stage of the process. This is because it is not simply about federating the data in real time – it is about hiding and handling all the underlying complexity, as you have discussed in the report:

http://www.informatica.com/INFA_Resources/ds_data_services_7010.pdf

http://www.informatica.com/products_services/data_services/Pages/index.aspx

Looking forward to more on this topic from you.

Ash Parikh

New in Informatica Data Services, Data Virtualization

Hi Brian,

As you know, Informatica recently released the latest version of its data virtualization solution, Informatica Data Services version 9.1, as part of the Informatica 9.1 Platform.

http://www.informatica.com/Pages/data_virtualization_index.aspx

http://www.informatica.com/products_services/data_services/Pages/index.aspx

Just to recap, the key highlights are:

The ability to dynamically mask federated data while it is in flight, without additional processing or staging. This is just like what we were already doing with the full palette of data quality and complex ETL-like data transformations on federated data. It helps end users leverage a rich set of data transformation, data quality, and data masking capabilities in real time, without additional overhead.
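
For readers unfamiliar with the “mask in flight” idea, here is a minimal, vendor-neutral sketch of what dynamic masking of a federated row stream looks like; the record layout and masking rule are invented for illustration and are not Informatica’s implementation.

```python
# Minimal, vendor-neutral sketch of dynamic data masking in flight:
# rows are transformed as they stream through a federated query,
# with no staging copy. Record layout and masking rule are invented.

def mask_ssn(ssn):
    """Keep the last four digits; mask the rest."""
    return "***-**-" + ssn[-4:]

def masked_stream(rows):
    for name, ssn, balance in rows:
        yield name, mask_ssn(ssn), balance  # masked in flight, never staged

federated_rows = [("Alice", "123-45-6789", 1200.50)]
for row in masked_stream(federated_rows):
    print(row)  # ('Alice', '***-**-6789', 1200.5)
```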

The ability for business users (analysts) to play a bigger role in the Agile Data Integration Process and to work closely with IT users (architects and developers) using role-based tools. This helps accelerate the data integration process through self-service capabilities.

The ability to instantly reuse data services for any application – a BI tool, a composite application, or a portal – without redeploying or rebuilding the data integration logic. This is done graphically in a metadata-driven environment, increasing agility and productivity.

All this cuts down wait and waste, which is the end goal of any data integration project.
Informatica Data Services is the only data virtualization solution that is built on lean integration principles to enable and support agile BI.

Here is the latest demo and chalk talk:

http://vip.informatica.com/?elqPURLPage=7648&docid=1637&lsc=NA-Ongoing-2...

http://vip.informatica.com/?elqPURLPage=7648&docid=1643&lsc=NA-Ongoing-2...

Thanks,

Ash Parikh

Ergonomics Warehouse

It's not enough to simply compile and analyze massive quantities of information; you need to condition your data to make it usable for a wide variety of applications and architectures.

Here we go again

Reads like a cross between "who needs algebra anyway?" and "what did the Romans ever do for us?"