Brian Hopkins serves Enterprise Architecture Professionals. See the full Analyst bio.
Visit Forrester.com to learn how we make Enterprise Architecture Professionals successful every day.
Follow Brian on Twitter.
Posted by Brian Hopkins on May 5, 2011
As I dug into my initial research, it dawned on me – some technology trends are having an impact on information management/data warehouse (DW) architectures, and EAs should consider these when planning out their firm’s road map. The next thought I had – this wasn’t completely obvious when I began. The final thought? As the EA role analyst covering emerging technology and trends, this is the kind of material I need to be writing about.
Let me explain:
No. 1: Big Data expands the scope of DWs. A challenge with typical data management approaches is that they are not suited to dealing with data that is poorly structured, sparsely attributed, and high-volume. For example, today’s DW appliances boast abilities to handle up to 100 TB of volume, but the data must be transformed into a highly structured format to be useful. Big Data technology applies the power of massively parallel distributed computing to capture and sift through data gone wild – that is, data at an extreme scale of volume, velocity, and variability. Big Data technology does not deliver insight, however – insights depend on analytics that result from combining the results of things like Hadoop MapReduce jobs with manageable “small data” already in your DW.
Even the notion of a DW is changing when we start to think “Big” – Apache just graduated Hive from being part of Hadoop to its own project (Hive is a DW framework for Big Data). If you have any doubt, read James Kobielus’ “The Forrester Wave™: Enterprise Data Warehousing Platforms, Q1 2011.”
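To make the "Big Data plus small data" point concrete, here is a minimal sketch in plain Python rather than Hadoop itself. It imitates a map step and a reduce step over messy log lines, then joins the reduced counts with a small customer table of the kind that would already sit in a DW. The log format, the `RAW_EVENTS` and `DW_CUSTOMERS` data, and all function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw click-log lines: loosely structured, high volume in practice.
RAW_EVENTS = [
    "2011-05-01 cust=17 page=/home",
    "2011-05-01 cust=42 page=/pricing",
    "garbage line with no customer",
    "2011-05-02 cust=17 page=/pricing",
    "2011-05-02 cust=17 page=/signup",
]

# Hypothetical "small data" already curated in the warehouse.
DW_CUSTOMERS = {17: {"segment": "enterprise"}, 42: {"segment": "smb"}}


def map_phase(line):
    """Emit (customer_id, 1) for each line that carries a customer id."""
    for token in line.split():
        if token.startswith("cust="):
            yield int(token[len("cust="):]), 1


def reduce_phase(pairs):
    """Sum the counts per key, as a MapReduce reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def page_views_by_segment(events, customers):
    """Join the reduced output with warehouse dimensions and roll up."""
    per_customer = reduce_phase(
        pair for line in events for pair in map_phase(line)
    )
    by_segment = defaultdict(int)
    for cust_id, views in per_customer.items():
        segment = customers.get(cust_id, {}).get("segment", "unknown")
        by_segment[segment] += views
    return dict(by_segment)


if __name__ == "__main__":
    print(page_views_by_segment(RAW_EVENTS, DW_CUSTOMERS))
```

The raw counts alone say little; the segment rollup only becomes possible once the reduced output meets the curated warehouse data.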
No. 2: Enterprise data virtualization technology can improve your DW architecture. Three things lead me to this: 1) As firms discover this mature technology, they realize data can be integrated without physical ELT in some cases – light bulb moment; 2) leading firms are evolving data virtualization point solutions into enterprise deployments that deliver broad benefits (see my colleague Noel Yuhanna’s prescient “Information Fabric: Enterprise Data Virtualization”); and 3) vendors are all scrambling to add Big Data integrations into their virtualization tool kits. The upshot of these statements is that future enterprise information architectures are likely to include malleable structures combining virtual and physical stores and connections to Big Data sets.
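A toy sketch may help show what "integrated without physical movement" means. The view below copies nothing; each query reads the underlying sources at that moment and combines them on the fly. This is not how any particular vendor's product works. The in-memory SQLite table, the `CRM_RECORDS` list, and the `CustomerSpendView` class are all hypothetical stand-ins.

```python
import sqlite3


def make_orders_db():
    """Hypothetical physical store: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (2, 75.5), (1, 30.0)],
    )
    return conn


# Hypothetical second source, standing in for a CRM system or service.
CRM_RECORDS = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


class CustomerSpendView:
    """A virtual view: nothing is staged; sources are read when queried."""

    def __init__(self, orders_conn, crm_source):
        self._orders = orders_conn
        self._crm = crm_source  # callable returning the current CRM records

    def rows(self):
        # Aggregate in the source system, then combine with CRM names.
        totals = dict(
            self._orders.execute(
                "SELECT customer_id, SUM(amount) FROM orders "
                "GROUP BY customer_id"
            )
        )
        return [
            {"name": c["name"], "total": totals.get(c["id"], 0.0)}
            for c in self._crm()
        ]


if __name__ == "__main__":
    conn = make_orders_db()
    view = CustomerSpendView(conn, lambda: CRM_RECORDS)
    print(view.rows())
```

Because the view holds no copy of the data, a new order inserted into the source shows up in the very next call to `rows()`, with no load job in between.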
What does this mean for EAs? I think it means that “your grandfather’s DW” architecture may not work for the future. Specifically:
Please let me know if you’d like to hear more about this and some additional research Forrester is doing in this area. My first report, “Big Opportunities In Big Data,” will be out mid-month, and I’ll be talking about it at IT Forum. The second, “Data Virtualization Reaches Critical Mass,” is scheduled for June. Finally, Noel Yuhanna is updating the Information-As-A-Service Forrester Wave that covers data virtualization technology in the context of a broader strategy.
Comments
Great points to consider
Brian makes some great points about how two key technologies, Big Data and Data Virtualization, are impacting information architecture strategies and changing the role of the enterprise data warehouse as we know it.
More comments at http://data-virtualization.com/2011/05/12/big-data-meets-data-virtualiza...
Data Virtualization Best Practices
Brian,
The report "Data Virtualization Reaches Critical Mass" provides a pragmatic discussion on trends and best practices for data virtualization.
I have also observed that architects really like the best practices framework diagram in this document. It gives them a sense of the data integration process and how data virtualization built on Lean Integration principles and self-service can accelerate each stage in the process. This is because it is not simply about federating the data in real time - it is about hiding and handling all the underlying complexity, as you have discussed in the report:
http://www.informatica.com/INFA_Resources/ds_data_services_7010.pdf
http://www.informatica.com/products_services/data_services/Pages/index.aspx
Looking forward to more on this topic from you.
Ash Parikh
New in Informatica Data Services, Data Virtualization
Hi Brian,
As you know, Informatica recently released the latest version of its data virtualization solution, Informatica Data Services version 9.1, as part of the Informatica 9.1 Platform.
http://www.informatica.com/Pages/data_virtualization_index.aspx
http://www.informatica.com/products_services/data_services/Pages/index.aspx
Just to recap, the key highlights are:
The ability to dynamically mask federated data as it is in flight, without processing or staging. This is just like what we were already doing with the full palette of data quality and complex ETL-like data transformations on federated data. This is helping end users leverage a rich set of data transformations, data quality, and data masking capabilities in real time, without additional overhead.
The ability for business users (analysts) to play a bigger role in the Agile Data Integration Process, and work closely with IT users (architects and developers), using role-based tools. This is helping in accelerating the data integration process, with self-service capabilities.
The ability to instantly reuse data services for any application, whether it is a BI tool or composite application or portal, without redeploying or rebuilding the data integration logic. This is done graphically in a metadata-driven environment, increasing agility and productivity.
All this cuts down wait and waste, which is the end goal of any data integration project.
Informatica Data Services is the only data virtualization solution that is built on lean integration principles to enable and support agile BI.
Here is the latest demo and chalk talk:
http://vip.informatica.com/?elqPURLPage=7648&docid=1637&lsc=NA-Ongoing-2...
http://vip.informatica.com/?elqPURLPage=7648&docid=1643&lsc=NA-Ongoing-2...
Thanks,
Ash Parikh
Ergonomics Warehouse
It's not enough to simply compile and analyze massive quantities of information; you need to condition your data to make it usable for a wide variety of applications and architectures.
Here we go again
Reads like a cross between "who needs algebra anyway" and "what did the Romans ever do for us".