Database Virtualization Could Induce I&KM Vertigo

By James Kobielus

Databases are evolving faster than ever. Long regarded as an essential but slightly boring centerpiece of the enterprise information management infrastructure, the database is taking on a more fluid, adaptive architecture to keep pace with an online world that is being virtualized at every level.

In many ways, the database as we know it is disappearing into a virtualization fabric of its own. In this emerging paradigm, data will not physically reside anywhere in particular. Instead, it will be transparently persisted, in a growing range of physical and logical formats, to an abstract, seamless grid of interconnected memory and disk resources, and delivered with subsecond latency to consuming applications. Forrester's ongoing research into the growing market for in-memory distributed-caching middleware shows that this trend is accelerating.

As database revolutions pick up speed, information and knowledge management (I&KM) professionals are likely to get a bit dizzy trying to keep their perspective and sort through competing approaches. When and where should you implement in-memory vs. on-disk data persistence? When should you go with row-based vs. column-oriented vs. inverted-index vs. other physical storage models? When does it make sense to implement any of the competing vendor-specific OLAP variants, old and new, for logical modeling (MOLAP, ROLAP, HOLAP, DOLAP, D'oh!LAP, SchmoLAP, etc.)? When should you federate your databases behind an on-demand semantic virtualization middleware layer, and when should you consolidate it all in an enterprise data warehouse? When should you buy into one vendor's analytic-database religion (be it columnar or whatever), and when should you remain strictly storage-layer-agnostic?
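
To make the row-vs.-column trade-off concrete, here is a minimal Python sketch, not tied to any vendor's engine, showing why an analytic aggregate favors a columnar layout: it scans only the one attribute it needs, while a row store fetches each record whole. All names and data here are illustrative.

```python
# Minimal sketch contrasting row-oriented and column-oriented layouts.
# Illustrative only -- real engines add compression, indexing, and paging.

rows = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 210.25},
]

# Row store: each record is stored and fetched as a unit -- a good fit
# for transactional lookups that need the whole record.
def total_amount_row_store(table):
    return sum(rec["amount"] for rec in table)

# Column store: each attribute is stored contiguously -- an analytic
# aggregate touches only the single column it needs.
columns = {
    "order_id": [1, 2, 3],
    "region": ["EMEA", "APAC", "EMEA"],
    "amount": [120.0, 75.5, 210.25],
}

def total_amount_column_store(cols):
    return sum(cols["amount"])

print(total_amount_row_store(rows))        # 405.75
print(total_amount_column_store(columns))  # 405.75
```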

One of the chief trends driving database virtualization is users' need for a more robust middleware fabric in support of real-time BI. To guarantee subsecond latency for BI, the infrastructure must incorporate a policy-driven, latency-agile, distributed-caching memory grid from end to end. However, the convergence of real-time business-intelligence approaches onto a unified, in-memory, distributed-caching infrastructure may take more than a decade to come to fruition because of the immaturity of the technology; the lack of multivendor standards; and the spotty, fragmented implementation of its enabling technologies among today's BI/DW vendors.
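
The caching idea behind that memory grid can be sketched in a few lines. The following is a hedged illustration, not any vendor's API: an in-process dict stands in for a distributed memory grid, and a hypothetical max_age_s freshness policy stands in for the latency policies such a fabric would enforce.

```python
import time

# Cache-aside sketch: serve hot BI results from memory, fall back to the
# on-disk warehouse on a miss or when the entry is stale. All names here
# (GridCache, max_age_s, slow_warehouse_query) are hypothetical.

class GridCache:
    def __init__(self, max_age_s=1.0):
        self.max_age_s = max_age_s  # per-entry freshness policy
        self._store = {}            # key -> (value, timestamp)

    def get(self, key, loader):
        entry = self._store.get(key)
        if entry is not None:
            value, stamp = entry
            if time.time() - stamp < self.max_age_s:
                return value          # served from memory, subsecond
        value = loader(key)           # miss or stale: hit the warehouse
        self._store[key] = (value, time.time())
        return value

def slow_warehouse_query(key):
    time.sleep(0.5)  # stand-in for an on-disk warehouse scan
    return {"metric": key, "value": 42}

cache = GridCache(max_age_s=5.0)
print(cache.get("daily_revenue", slow_warehouse_query))  # slow first call
print(cache.get("daily_revenue", slow_warehouse_query))  # fast, from cache
```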

Nevertheless, all signs point to this trend's inevitability -- most notably, Microsoft's recent announcement that it is developing its own information-fabric platform, code-named "Project Velocity," to beef up its real-time analytic and transactional computing capabilities. Bear in mind that no BI/DW vendor has clearly spelled out its approach for supporting the full range of physical and logical data-persistence models across its real-time information fabric. But it is quite clear that the industry is moving toward a new paradigm in which the optimal data-persistence model will be provisioned automatically to each node based on its deployment role (EDW, ODS, staging, data mart, and so on), and in which data will be written to whatever blend of virtualized memory and disk best suits applications' real-time requirements.
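
What role-driven provisioning might look like can be sketched as a simple policy table. The deployment roles come from the paragraph above; the mapping itself is a hypothetical illustration, not any vendor's actual policy engine.

```python
# Hypothetical role-to-persistence policy: each node's deployment role
# selects a default physical layout and storage medium. Illustrative only.

ROLE_PERSISTENCE = {
    "edw":       {"layout": "columnar", "medium": "disk"},
    "ods":       {"layout": "row",      "medium": "memory+disk"},
    "staging":   {"layout": "row",      "medium": "disk"},
    "data_mart": {"layout": "columnar", "medium": "memory"},
}

def provision(node_role):
    """Return the persistence profile for a node, defaulting to row-on-disk."""
    return ROLE_PERSISTENCE.get(node_role, {"layout": "row", "medium": "disk"})

print(provision("data_mart"))  # {'layout': 'columnar', 'medium': 'memory'}
```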

It would be ridiculous to imagine this evolution will take place overnight. Even if solution vendors suddenly converged on a common information-fabric framework -- which is highly doubtful -- I&KM managers have too much invested in their enterprises' current data environments to justify migrating them to a virtualized architecture anytime soon.

Comments

re: Database Virtualization Could Induce I&KM Vertigo

Concise overview of current trends towards "information fabrics" and the physical and semantic alternatives that we are encountering and need to deal with. -- Mills

re: Database Virtualization Could Induce I&KM Vertigo

Hi James, your Network World article titled "Real time drives database virtualization" is insightful and thought-provoking, and the trends you outline are plainly visible. It helped me develop a perspective, which I have shared at http://sharevm.wordpress.com/2009/02/03/database-virtualization/

The real issue is that there is a lot of hype surrounding virtualization today, and database virtualization, especially for BI, seems relegated to remain a niche play. It is unlikely to hit the commoditization curve, i.e., the widespread adoption, that hardware virtualization is on today and that desktop and application virtualization can reach over the next 3-5 years.

xkoto has built a SQL mediation layer above commodity DBMSs for the mid-market, and IBM has been licensing their technology since 2006. Oracle's acquisition of Tangosol and Microsoft's Project Velocity are following HP's NeoView usage of distributed caches for solving large BI queries. Strictly speaking, these are not virtualization-related innovations; they are simply replacing shared-memory clustering with distributed, shared-nothing caches that provide location transparency for data. Tandem was doing this all along in the mid-to-late eighties. It seems to be old wine in a new bottle. What groundbreaking innovations do you foresee in this area? Why are you personally excited about it?

re: Database Virtualization Could Induce I&KM Vertigo

Paul: Interesting post. Database virtualization has certainly been around in various forms for a while. It seems to be coming to the fore now in cloud computing, in which relational, dimensional, file-based, in-memory, inverted-index, and other storage/persistence models will coexist. I'm excited for that reason, and also because the trend is crying out for someone to bring architectural coherence to all these developments. Noel Yuhanna and I will present on database virtualization at Forrester's IT Forum in May, and we're co-authoring a paper on Data Warehouse Virtualization in Q2. -- Jim