The Forrester Blog For Information & Knowledge Management Professionals

Data warehousing

June 09, 2009

BI Mashup Maturity Model? Oxymoron? Au Contraire Mon Frère!

By James Kobielus

In one of my recent tweets, I commented that Forrester has developed a maturity model for enterprise adoption of mashup-style, self-service development of business intelligence (BI) applications. Indeed, we have, and it will appear in my forthcoming Forrester report, “Mighty Mashups: Do-It-Yourself Business Intelligence for the New Economy.”

Another tweeter--an astute but, sadly, non-Forrester BI analyst--scoffed that “BI mashup maturity model” is an oxymoron. Respectfully, I must disagree. Enterprises are adopting self-service BI approaches for many reasons--principally, to cut costs in a tight economy, to unclog the development backlog, and to speed delivery of actionable, targeted intelligence to decision makers. Also, companies are providing users with BI tools to do interactive, deeply dimensional exploration of information pulled from enterprise data warehouses (EDW), marts, cubes, transactional applications, and other systems. Furthermore, organizations everywhere have adopted browser-oriented BI environments that leverage the full interactivity and collaboration of Web 2.0.

Sitting at the convergence of those trends is BI mashup, which Forrester sees as the new paradigm for truly pervasive decision-support systems. What throws off some people is the term “mashup,” which sometimes gets pigeonholed as simply referring to using, say, Google Maps to display geocoded performance metrics and sundry Internet-sourced data in a browser-based dashboard. Yes, BI mashup encompasses that approach to presenting and integrating diverse data, but its application is much broader.

Just as important, BI mashup is not bleeding-edge. Rather, it leverages in-memory BI clients, semantic virtualization layers, data federation middleware, automated data discovery, and other next-generation BI tools and platforms.

No one vendor or user has yet put together an end-to-end BI environment that is entirely focused on mashup-style self-service development. However, Forrester sees the BI industry converging toward a mashup-oriented architecture over the coming two to three years. With that in mind, we sketched out a BI mashup maturity model that encompasses the following four levels (the first three of which are represented in case studies in the upcoming report):

  • Level 1: Lightweight presentation mashup against transactional applications: This basic maturity level is for companies that have no prior BI or EDW; have little in-house BI expertise; and are comfortable allowing casual users to use their browsers to customize parameterized reports built on data from packaged business applications.
  • Level 2: Deep presentation mashup against EDW: This level is for organizations that do have prior BI and centralized EDWs, but have an understaffed BI development group and/or have power users and data modelers who urgently need the ability to mash up and explore historical and current data within sophisticated BI workspaces.
  • Level 3: Full BI mashup in federated environment: This level is for organizations that have decentralized, dynamic data management environments and have the expertise to design reusable, composite data services to seamlessly mash up internal and external information.
  • Level 4: Full collaborative mashup with IT governance: This level is for organizations that want to encourage subject-matter experts and operational users to collaborate on analytics created through mashup, but that are also concerned that all mashups be controlled, governed, and monitored in accordance with enterprise policies and best practices.

As I said, it will take a few years before we see a substantial number of enterprise case studies that implement the pinnacle of collaborative mashup with tight governance. Nevertheless, when you follow the evolution of next-generation solution portfolios from leading BI vendors such as SAP, IBM, Microsoft, and others, it’s clear that self-service user-centric mashup, to varying degrees, is a core theme.

BI mashup has such a strong business case that we’re confident it’s more than simply a “down economy” theme. It will almost certainly grow in importance for information and knowledge management professionals as the economy improves.

May 26, 2009

Database Religions Dissolve Into The Big Billowing Virtual Data Cloud

By James Kobielus

Virtualization is a venerable old computing concept that has achieved new life in recent years.

Virtualization brings to life a new world of more flexible service provisioning while cleverly emulating the old world that is being replaced. Virtualization refers to any approach that abstracts the external interface from the internal implementation of some service, functionality, or other resource.

The promise of virtualization is that, no matter how scattered and diverse, all pooled resources behave as if they were a single unified resource, both for usage and administration. In a sense, this is the practical magic that Arthur C. Clarke identified with advanced technology. The external interface may conceal various facts about the implementations of the underlying resources. The virtualized resources may:

•    run on diverse operating and application platforms;
•    have been deployed on nodes in diverse locations;
•    have been aggregated across diverse hosting platforms (or partitioned within a single hosting platform, either through virtual machine software, separate CPUs, or separate blade servers); and
•    have been provisioned dynamically in response to a client request.

When Noel Yuhanna and I presented on enterprise database virtualization last week at Forrester IT Forum, we took pains to point out that it is not a radically new paradigm. In fact, database administrators (DBAs) have been doing virtualization for a long time without realizing it. We’re all familiar with such database virtualization approaches as policy-based server clustering, massively parallel processing (MPP) database grids, and enterprise information integration. In these environments, you can identify the virtualization layer as “single system image,” “semantic abstraction,” or some other approach.

What all these approaches share is that they make two or more repositories behave as if they were a single database for unified access, query, reporting, predictive analytics, and other applications. If you wish, I could drill down further into the layers of database virtualization--data virtualization, transaction virtualization, and platform virtualization--but that would be too much for a mere blog post.
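
To make the “single database” illusion concrete, here is a minimal sketch, assuming Python’s built-in sqlite3 module as a stand-in for real virtualization middleware. Two physically separate databases (the hypothetical `east` and `west`) are attached to one connection, and a view presents them to the requester as one `all_sales` table. Real database virtualization layers do far more (cross-platform access, caching, query optimization), so treat this purely as an illustration.

```python
import sqlite3

def open_virtual_db() -> sqlite3.Connection:
    """Attach two separate in-memory databases to one connection and expose
    a view that makes them look like a single 'all_sales' table."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a new, independent in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS east")
    conn.execute("ATTACH DATABASE ':memory:' AS west")
    for schema in ("east", "west"):
        conn.execute(f"CREATE TABLE {schema}.sales (sku TEXT, amount REAL)")
    conn.executemany("INSERT INTO east.sales VALUES (?, ?)",
                     [("A1", 100.0), ("B2", 50.0)])
    conn.executemany("INSERT INTO west.sales VALUES (?, ?)",
                     [("A1", 25.0), ("C3", 75.0)])
    # A TEMP view is used because SQLite lets it reference tables in any
    # attached database; requesters query it without knowing where rows live.
    conn.execute("""
        CREATE TEMP VIEW all_sales AS
            SELECT 'east' AS region, sku, amount FROM east.sales
            UNION ALL
            SELECT 'west' AS region, sku, amount FROM west.sales
    """)
    return conn
```

A query such as `SELECT sku, SUM(amount) FROM all_sales GROUP BY sku ORDER BY sku` then returns `[('A1', 125.0), ('B2', 50.0), ('C3', 75.0)]`, aggregating across both repositories as if they were one.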

One twist that I didn’t have time to explore in depth last week is the notion that the traditional hub-and-spoke enterprise data warehousing (EDW) architecture is itself a form of database virtualization. The hub-and-spoke model transforms analytic data to a common “spoke-side” semantic access model, such as star schema or columnar. As such, this approach abstracts from the data models (usually 3NF relational) implemented at the EDW hub tier, the staging tier (perhaps file-based), and OLTP sources (perhaps hierarchical, XML, or what have you).
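
As a toy illustration of that spoke-side transformation, here is a minimal sketch, assuming hypothetical normalized hub tables for customers, cities, and orders. It flattens customer and city into one dimension with surrogate keys and emits fact rows keyed to that dimension, which is the basic shape of a star schema. Production ETL would also deal with slowly changing dimensions, late-arriving data, and much more.

```python
from collections import defaultdict

# Hypothetical 3NF-style hub tables: customers reference cities by id.
CUSTOMERS = {101: {"name": "Acme", "city_id": 1},
             102: {"name": "Globex", "city_id": 2}}
CITIES = {1: {"city": "Boston", "country": "US"},
          2: {"city": "Lyon", "country": "FR"}}
ORDERS = [{"order_id": 1, "customer_id": 101, "amount": 250.0},
          {"order_id": 2, "customer_id": 102, "amount": 100.0},
          {"order_id": 3, "customer_id": 101, "amount": 50.0}]

def to_star(customers, cities, orders):
    """Denormalize customer + city into one dimension with surrogate keys,
    then emit fact rows that point at the dimension by surrogate key."""
    dim_customer, key_for = [], {}
    for surrogate, (cust_id, cust) in enumerate(sorted(customers.items()), start=1):
        city = cities[cust["city_id"]]
        dim_customer.append({"customer_key": surrogate, "name": cust["name"],
                             "city": city["city"], "country": city["country"]})
        key_for[cust_id] = surrogate
    fact_sales = [{"customer_key": key_for[o["customer_id"]], "amount": o["amount"]}
                  for o in orders]
    return dim_customer, fact_sales

def revenue_by(dim_customer, fact_sales, attribute):
    """Typical star-schema query: join facts to the dimension, group by an attribute."""
    lookup = {row["customer_key"]: row for row in dim_customer}
    totals = defaultdict(float)
    for fact in fact_sales:
        totals[lookup[fact["customer_key"]][attribute]] += fact["amount"]
    return dict(totals)
```

With `dim, fact = to_star(CUSTOMERS, CITIES, ORDERS)`, the call `revenue_by(dim, fact, "country")` returns `{"US": 300.0, "FR": 100.0}`; the consuming application never touches the normalized hub tables.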

When you realize that each data-persistence approach has its optimal deployment sphere, you’re thinking database virtualization. At that point, you start to realize that the various database religions--relational is supreme, columnar is king, and so forth--are not absolute truths. They’re simply sectarian texts in a tradition of longer vintage: the evolution of truly all-encompassing data virtualization clouds.

Yes, I’m using “cloud” in this context because it best describes this new paradigm. Cloud-based virtualization is beginning to seep into analytic infrastructures. To support flexible mixed-workload analytics, the EDW, over the coming five to 10 years, will evolve into a virtualized, cloud-based, and supremely scalable distributed platform.

What are the outlines of this new paradigm? The virtualized EDW will allow data to be transparently persisted in diverse physical and logical formats to an abstract, seamless grid of interconnected memory and disk resources and to be delivered with sub-second delay to consuming applications. EDW application service levels will be ensured through an end-to-end, policy-driven, latency-agile, distributed-caching and dynamic query-optimization memory grid, within an information-as-a-service (IaaS) environment. Analytic applications will migrate to the EDW platform and leverage its full parallel-processing, partitioning, scalability, and optimization functionality. At the same time, DBAs will need to make sure that cloud-based DW offerings meet their organizations’ most stringent security, performance, availability, and other service-level requirements.

I won’t opine here and now on how much enterprise data will be persisted in public clouds vs. private environments that incorporate many of the same platform virtualization technologies. I’ll save that discussion for the upcoming Forrester reports that Noel and I are developing on virtualization of transactional and analytic databases, respectively.

Expect those in Q3 or thereabouts. Thanks to everybody who attended our preso last week in Vegas!

May 16, 2009

Information Post-Discovery - Latest BI Trend

By Boris Evelson

I just came back from an exciting week in Orlando, FL, shuttling between SAP SAPPHIRE and IBM Cognos Forum conferences. Thank you, my friends at SAP and IBM for putting the two conferences right next to each other (time- and location-wise), and for saving me an extra trip!

Both conferences showed new and exciting products, and both vendors are making great progress toward my vision of “next generation BI”: automated, pervasive, unified, and limitless. I track about 20 different trends under these four categories, but there’s a particular one that is especially catching my attention these days. It went largely under the radar at both conferences, and I was struggling with how to put it into words, until my good friend and peer, Mark Albala, of http://www.info-sight-partners.com, put it in excellent terms for me in an email earlier today: it’s all about “pre-discovery” vs. “post-discovery” of data.

We can debate endlessly the pros and cons of traditional row-oriented RDBMS vs. newer DBMS architectures specifically designed for BI and OLAP (such as columnar: Sybase IQ, Vertica, ParAccel; inverted index: Microsoft FAST Search, Endeca, Attivio; tokenized: illuminate; and in-memory analytical DBMS: TIBCO Spotfire, QlikTech), and I had lots of fun doing that on the DM Radio show last Thursday! One thing, however, remains undisputed: traditional RDBMS and OLAP architectures require pre-discovery of data, aka data integration and data modeling. No matter how much flexibility and richness we think we have built into our relational or multidimensional data models, they are still only as good as our initial design. If we did not anticipate the types of questions that would be asked of our application in the future, no fixed relational or multidimensional data model will be able to help us.

But the world moves way too fast for us. For example, the methodology behind economic capital calculation in the financial services industry, under Basel II requirements, may change on a weekly, sometimes even daily, basis due to regulatory and competitive pressures. Traditional data models and BI tools cannot keep up with such rapidly changing requirements. As a result, one of our recent surveys found that more than half of the respondents did not have most of the information they were looking for in their BI applications, and close to two-thirds relied on IT for new BI requests.

What’s the answer? There are many, but one partial answer is post-discovery, rather than pre-discovery, of data. For example, an inverted index DBMS from Attivio, a tokenized data store from illuminate, or in-memory models from TIBCO Spotfire and QlikTech just need you to index or load data "as is," without really requiring any modeling up front. And because all of these technologies can indeed cross-reference every attribute with every other attribute (it's an index!), a virtual data model is created on the fly simply by virtue of asking a question. Gone are the days of having to analyze your requirements, document your requests, and work with IT to make it happen, a process that often takes weeks or months.
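
To illustrate why an index lends itself to post-discovery, here is a minimal sketch, assuming a toy in-memory inverted index; it is not how Attivio, illuminate, Spotfire, or QlikTech work internally. Records are loaded "as is," with no declared schema and with attributes that vary from record to record, and every attribute/value pair is indexed. Any combination of attributes can then be queried or cross-referenced without anyone having modeled that question up front. The sample trading records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records loaded "as is": no schema, and attributes vary by record.
RECORDS = [
    {"product": "bond", "desk": "London", "rating": "AA"},
    {"product": "swap", "desk": "London", "counterparty": "BankX"},
    {"product": "bond", "desk": "NYC", "rating": "BBB"},
    {"product": "bond", "desk": "London", "rating": "BBB"},
]

def build_index(records):
    """Map every (attribute, value) pair to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, record in enumerate(records):
        for attribute, value in record.items():
            index[(attribute, value)].add(row_id)
    return index

def matching_rows(index, n_rows, **criteria):
    """Answer an ad hoc question by intersecting the posting sets of each criterion."""
    rows = set(range(n_rows))
    for attribute, value in criteria.items():
        # .get() avoids inserting empty entries into the defaultdict.
        rows &= index.get((attribute, value), set())
    return rows

def facet(index, rows, attribute):
    """Cross-reference any attribute against the current selection:
    count how often each of its values occurs among `rows`."""
    counts = {}
    for (attr, value), postings in index.items():
        if attr == attribute and postings & rows:
            counts[value] = len(postings & rows)
    return counts
```

For example, `matching_rows(index, len(RECORDS), product="bond", desk="London")` returns `{0, 3}`, and `facet(index, {0, 3}, "rating")` returns `{"AA": 1, "BBB": 1}`. The "virtual data model" here is simply whatever intersection the question asks for.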

Sounds good? It does, but this is obviously not a BI panacea. Yet. I do not think we will see a mass conversion to these analytical engines that allow for post-discovery of information and data models in the short term. Why?

  • Many challenges still remain with these technologies, such as lack of operational BI (you typically need to reload the entire model or rebuild the entire index to make updates), administration (partitioning and modular backup and restore are not easy tasks with index DBMSs today), and other mission-critical production features of large enterprise DBMSs.
  • All these new technologies still rely on someone else doing all of the up-front legwork to integrate, reconcile, and cleanse the data. There is no magical workaround for the hard work of planning, designing, and implementing data quality and master data management processes and applications.

How’s all this related to SAP and IBM, you ask? Simple.

  • SAP is heavily promoting its guided search engine, Explorer (previously known as Polestar). Like Endeca and Microsoft FAST Search, it is index-based, allowing BI users to discover answers to previously unplanned questions. Unlike Attivio, though, Explorer still requires an underlying data model, either in the form of a Business Objects Universe or SAP BW, but because under the covers it uses the SAP TREX index engine, the right technology is there, and it’s a huge step in the right direction for SAP.
  • IBM is also leading the market here. While its Cognos GoSearch product allows for some guided exploration too, it’s the acquisition of Exeros a couple of weeks ago that gives IBM unique capabilities (with some competition from Composite Software and Sypherlink) to post-discover data and data relationships, and to continuously update data models and data content based on the newly discovered information. I can't wait till IBM comes up with a fully automated way to update the data model, Cognos Framework Manager, and Cognos reports and dashboards based on a source system change!

This is yet more proof (and I’ll never get tired of saying this) that the BI market is as vibrant, exciting, and far from commoditization as ever!

April 20, 2009

Oracle’s Sun Acquisition Accelerates Push Into Data Warehousing Appliances

By James Kobielus

Last fall, Oracle CEO Larry Ellison announced that his company was getting into the hardware business, but I think he misspoke. At that time, he was referring to the new HP Oracle Database Machine with Exadata Storage, a high-end data warehousing (DW) appliance that incorporated hardware from his partner, as well as intelligent storage software technology from that partner--and even had the partner’s name first in the product name. If that was the criterion for “getting into the hardware business”--i.e., running on someone else’s hardware--then every software vendor on earth is in the hardware business, by my reckoning.

But today’s Oracle announcement is the real deal. Oracle is acquiring longtime partner Sun Microsystems, putting the software powerhouse fully into the hardware business--and hitting the DW industry like an earthquake. I’ll let my Forrester colleagues blog on the other implications of this deal--for the open source, Java, middleware, SOA, and other markets that Sun is in--and give you a few quick thoughts on the deal’s implications for the DW market.

For starters, this deal will give Oracle the ability to engineer a completely integrated DW appliance composed of all Oracle components, including hardware and software. Now Oracle will be able to take on Teradata and IBM--both of which have long offered their own integrated solutions--more aggressively with high-performance DW offerings. Just as important, Oracle will be able to leverage Sun’s manufacturing scale economies to bring its all-Oracle DW appliances below the $25K-per-terabyte threshold needed for penetration into the midmarket.

Also, Oracle will now have another widely adopted transactional database, the open-source MySQL, that it can--and should--consider tweaking and packaging on a DW appliance. To the extent that Oracle gives customers a choice of DBMSs on a DW appliance platform, it can gain a differentiator that Teradata, IBM, Microsoft, Sybase, and Netezza lack (you have to go to a startup such as Dataupia for multi-DBMS choice on an appliance). Many information managers prefer to stick with their existing DBMSs when building a DW, and prefer to implement that DW on an appliance to take advantage of its out-of-box balanced configuration of CPU, memory, storage, and I/O.

Furthermore, Oracle is acquiring a hardware and operating system vendor that has long been one of the primary platforms on which its own DW/DBMSs, middleware, and tools have been deployed. This acquisition can only be welcome news for joint Oracle-Sun DW customers who have worried about Sun’s solvency for some time now and began to sweat serious bullets when IBM failed to emerge as a white knight. For many Sun customers, an Oracle-powered DW platform will now look like a safer bet than ever.

Of course, there are clear risks in this pending acquisition.

First, a combined Oracle/Sun sows uncertainty among the DW appliance vendors--such as Greenplum and ParAccel--who have partnered with Sun and now find themselves in earnest “co-opetition” with full-competitor (and then some) Oracle.

Second, Oracle’s other DW appliance hardware partners--including HP, IBM, and EMC/Dell--must be concerned that Oracle will now shift focus away from their respective appliance products in favor of those it builds with its own Sun hardware group.

And finally, Oracle’s acquisition of Sun--and possible future development of a MySQL DW appliance--may discourage customers from considering third-party DW appliances that build on MySQL, such as those from Kickfire. If that happens, and a market for non-Oracle-branded MySQL DW appliances never takes root, Oracle will be denying its MySQL customers the choice that Oracle Database customers already enjoy. Currently, Oracle Optimized Warehouse customers can deploy that enterprise DBMS as a DW on their choice of Sun, HP, IBM, and EMC/Dell platforms.

Let’s hope that Oracle makes the most of its pending Sun acquisition. Ellison either misspoke last fall, or was speaking prophecy. Like most DW vendors, Oracle’s destiny is to grow ever more hardware-dependent for its long-term scalability, performance, and optimization story.

April 01, 2009

Inmon’s Vitriolic Slap At “Virtual Data Warehousing” Does Not Withstand Scrutiny

By James Kobielus

In a recent article, Bill Inmon incinerates a straw man concept that he refers to as “virtual data warehousing (DW).” For those unfamiliar with Inmon, he is generally considered the founder of DW as a data management discipline, has been at it since the 1970s, and has more published books and articles to his name than most mortals. So he can clearly be considered an authority on the topic of DW.

But methinks Mr. Inmon doth protest too much on this “virtual DW” bugaboo, however defined (we’ll get to that in a moment). Also, he attacks this concocted notion with such emotional vehemence that it’s clear he considers it a threat to the centralized EDW paradigm upon which he has built his career and reputation.

For starters, his definition of this concept is oddly vague and questionably narrow: “a virtual data warehouse occurs when a query runs around to a lot of databases and does a distributed query.” Essentially, Inmon defines “virtual DW” as the ability to a) farm out a query to be serviced in parallel by two or more distributed databases, b) aggregate and join results from those databases, and c) deliver a unified result set to the requester.

That’s an important query pattern, but not the only one that should be supported under (pick your quasi-synonym) data federation, data virtualization, or enterprise information integration (EII) architectures. Inmon’s definition excludes the many federated queries that may only hit on a single database, with no joins and results aggregation, and with the EII fabric handling the necessary on-demand transformation from that source’s schema to an abstract semantic model.

Per my data federation report from last fall, Forrester has a broader perspective on the topic than does Mr. Inmon. Data federation is any on-demand approach that queries information objects from one or more sources; applies various integration functions to the results; maps the results to a source-agnostic semantic-abstraction model; and delivers the results to requesters. Nothing in the scoping of data federation necessarily requires the multi-source aggregation and joining that Inmon puts at the heart of “virtual DW.”
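
Here is a minimal sketch of that broader definition, assuming two hypothetical sources (a CRM list of dicts and an ERP list of tuples) with different native schemas. Each query runs on demand against whichever sources the requester names, maps every result to a source-agnostic `Customer` model, and applies one deliberately naive integration function: de-duplicating on `customer_id`, with the first source listed winning. The single-source case needs no joins or aggregation at all, yet it is still federation under this definition.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Customer:
    """Source-agnostic semantic model that requesters see."""
    customer_id: str
    name: str
    region: str

# Hypothetical sources, each with its own native record shape.
CRM_ROWS = [
    {"cust_no": 17, "full_name": "Acme Corp", "territory": "EMEA"},
    {"cust_no": 42, "full_name": "Globex", "territory": "NA"},
]
ERP_ROWS = [
    ("C-0042", "GLOBEX INC", "na"),
    ("C-0099", "INITECH", "na"),
]

def map_crm(row: dict) -> Customer:
    """Map a CRM record onto the common model."""
    return Customer(f"C-{row['cust_no']:04d}", row["full_name"], row["territory"])

def map_erp(row: tuple) -> Customer:
    """Map an ERP record onto the common model, normalizing case."""
    cid, name, region = row
    return Customer(cid, name.title(), region.upper())

SOURCES: dict[str, tuple[Iterable, Callable[..., Customer]]] = {
    "crm": (CRM_ROWS, map_crm),
    "erp": (ERP_ROWS, map_erp),
}

def federated_query(source_names: list[str],
                    predicate: Callable[[Customer], bool]) -> list[Customer]:
    """Query the named sources on demand, map each hit to Customer,
    de-duplicate on customer_id (first source listed wins), and
    return one unified result set sorted by customer_id."""
    merged: dict[str, Customer] = {}
    for name in source_names:
        rows, mapper = SOURCES[name]
        for row in rows:
            record = mapper(row)
            if predicate(record):
                merged.setdefault(record.customer_id, record)
    return sorted(merged.values(), key=lambda c: c.customer_id)
```

Calling `federated_query(["crm", "erp"], lambda c: c.region == "NA")` returns Globex (the CRM version, since CRM is listed first) and Initech, while `federated_query(["erp"], lambda c: True)` hits a single source and still delivers results in the common model. A real EII layer would push predicates down to the sources rather than filter after mapping.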

Putting Inmon’s narrow scoping of “virtual DW” behind us for the moment, let’s consider his chief objections to this approach. First, it requires the “analyst to integrate data” (as if that’s something analysts are ill-suited for or regard as some inordinate burden). Second, it consumes resources, experiences suboptimal performance, and “shuffles a lot of data around the system that otherwise would not need to be moved” (as if centralized DWs don’t consume resources, experience performance bottlenecks, and move data). Third, it is “limited to the [historical] data found in the [source] databases.” Fourth, it suffers from “no reconcilability of data...[hence] no single version of the truth for the corporation.”

It’s a fairly straightforward matter to dispatch these objections:

First, data integration--through ETL, EII, and other approaches--is a core job function for DW professionals, not some alien function outside their core competency.

Second, data federation is often the optimal approach for low-latency BI (just check out the case studies in my data federation and really urgent analytics reports). Federated environments can be tuned to provide top-notch performance and minimize source-system impacts when “shuffling” data around in a decentralized fabric.

Third, the source databases in a federation environment often include DWs, which, per their core function, usually manage a considerable amount of historical data. Once again, see my data federation report with discussion of case studies for a) Federation of Local DWs via Centralized EII Infrastructure and b) Federation of Dispersed EDW and ODS Data Into Siloed BI Environments.

Fourth, data federation is not incompatible with data reconciliation. In fact, federation environments can be architected for a single version of the truth, data governance, and master data management. However, it can indeed be tricky to manage data quality in federated environments (see Rob Karel’s coverage of MDM and DQ for a deep dive on that issue).

My basic objection to Inmon’s line of discussion is that he treats data federation as mutually exclusive with the enterprise DW (EDW), when in fact they are highly complementary approaches, not just in theory but in real-world deployments. Yes, data federation can be deployed as an alternative to traditional EDWs, providing direct interactive access to online transactional processing (OLTP) data stores. However, data federation can also coexist with, extend, virtualize, and enrich EDWs, as well as other data-persistence nodes such as operational data stores (ODS) and online analytical processing (OLAP) data marts. The case studies in the cited reports bear that out.

Inmon’s arguments are worth consideration. The centralized EDW model he touts is useful for illuminating some traditional best practices. But by no means can it do justice to the stubbornly heterogeneous, distributed, mixed-latency BI and DW requirements of most enterprises.

March 22, 2009

After So Many Years Of Ballyhoo, Semantic Web Still Searching For Killer App

By James Kobielus

Cynics might call Semantic Web a solution looking for a problem. And they might have a point.

Semantic Web refers to a long-running World Wide Web Consortium (W3C) initiative that is working toward an ambitious--some might say hopelessly Utopian--goal. At heart, it is a vision for how the World Wide Web should evolve to realize its full interoperability potential.

People vary widely in how they interpret the scope of the Semantic Web initiative. The tech industries are swarming with a wide range of projects, products, and tools that implement different variants of this vision. What vision is that? In the broadest sense, Semantic Web refers to an all-encompassing metadata, description, and policy layer that can, potentially, support universal, automatic, comprehensive end-to-end interoperability across every macro or micro entity--including data, components, services, and applications--on every conceivable level.

Whew!!! If that’s not the working definition of “pie in the sky” or “boil the ocean” (pick your metaphor), I don’t know what is. In fact, I’m hard-pressed to refer to Semantic Web as a definable market or solution segment. However, it’s not entirely vacuous.

For starters, organizations can implement W3C-developed semantic description standards--such as Resource Description Framework (RDF) and Web Ontology Language (OWL)--to make the meaning of content unambiguously comprehensible to services, applications, bots, and other automated components. Second, there is a reasonably robust market for “ontology” tools to support RDF/OWL-based modeling of application semantics. Finally, there is some incremental adoption of these tools and concepts in established IT segments, such as:

  • Enterprise content management (ECM): Semantic approaches can support more powerful discovery, indexing, search, classification, commentary, and navigation across heterogeneous stores of unstructured and semi-structured content. Semantic search--driven by concepts, not mere text strings--is regarded by some as a primary Semantic Web application. Indeed, many Semantic Web vendors are primarily implementing the technology in search engines that leverage ontology-based concepts to improve search accuracy and reduce spurious hits.

  • Enterprise information integration (EII): Semantic approaches enable consolidated viewing, query, and update of structured data that has been retrieved from diverse sources. Indeed, most commercial EII environments present an abstract semantic layer that mediates access to heterogeneous data, such as enterprise resource planning and customer relationship management applications, converging it all to a common presentation-side schema. A handful of those EII vendors have begun to support Semantic Web standards, primarily through third-party software plug-ins.

  • Enterprise service bus (ESB): Semantic approaches can facilitate multilayered application, process, and service interoperability across disparate environments. To date, there has been little production implementation of Semantic Web standards in the ESB arena, though some vendors have adopted semantics, ontologies, and RDF to describe the conceptual models implemented by application endpoints, agents, and intermediary nodes within ESB-like middleware approaches such as event stream processing.
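
To show what concept-driven (rather than string-driven) search means in practice, here is a minimal sketch, assuming plain Python tuples as stand-ins for RDF subject-predicate-object triples; a real deployment would use an RDF store and a query language such as SPARQL. The `ex:` names are hypothetical, while `rdfs:subClassOf` is the standard RDF Schema property. A search for documents about `ex:FinancialRisk` also finds documents tagged only with narrower concepts, which a plain text match on "FinancialRisk" would miss.

```python
Triple = tuple[str, str, str]

# Hypothetical mini-ontology plus document annotations, as (subject, predicate, object).
TRIPLES: set[Triple] = {
    ("ex:CreditRisk", "rdfs:subClassOf", "ex:FinancialRisk"),
    ("ex:MarketRisk", "rdfs:subClassOf", "ex:FinancialRisk"),
    ("ex:doc1", "ex:about", "ex:CreditRisk"),
    ("ex:doc2", "ex:about", "ex:MarketRisk"),
    ("ex:doc3", "ex:about", "ex:SupplyChain"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def subclasses(triples, cls):
    """Return cls plus every class below it via rdfs:subClassOf (transitively)."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, _, _ in match(triples, p="rdfs:subClassOf", o=current):
            if sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def documents_about(triples, concept):
    """Concept-based search: documents tagged with the concept or any narrower one."""
    concepts = subclasses(triples, concept)
    return sorted(s for s, _, o in match(triples, p="ex:about") if o in concepts)
```

Here `documents_about(TRIPLES, "ex:FinancialRisk")` returns `["ex:doc1", "ex:doc2"]` even though neither document is tagged with that exact concept.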

But Semantic Web approaches are still on the periphery of these markets. More than 10 years after its inception, Semantic Web still has no clear killer app, and it’s not clear if or when that app will emerge.

March 20, 2009

Lean Information Management Strategies for Lean Times

By James Kobielus

When the going gets tough, the tough get lean, focused, and flexible. To help organizations survive the bad times and thrive in all climates, their information management initiatives must remain agile and adaptable.

If you feel your information management strategy is anything but lean, you’re not alone. Many organizations struggle to gain control over information infrastructures that have become too bloated, rigid, and slow to realign with new business drivers.

Lean information management practices are essential for corporate survival. They are far more than belt-tightening exercises. They also help you build analytic muscle for excelling in any business environment. Here are some basic pointers for keeping your information management strategy lean:

  • Trim your information infrastructure of excess cost. Lean means you should cut excessive, budget-busting overhead from your information management environment. Careful cuts are best, because they optimize your existing operations without gutting the core information, analytics, and applications that underpin your core competencies. Silo, server, database, and application consolidation should be your principal approaches. Also, you should re-evaluate vendor-sourcing strategies and renegotiate licenses on more favorable terms. And you should investigate lower-cost alternatives, such as software-as-a-service, to address business intelligence, business performance solutions, enterprise data warehousing, master data management, enterprise content management, and other information management requirements.
  • Fit information initiatives to key business imperatives. Lean also means you fit, focus, and fully align your information management initiatives to mission-critical business imperatives. Strategic alignment ensures that you leverage information assets across diverse application domains and business processes, rather than allow that intelligence to languish underutilized in silos. To sustain this approach, you should establish an information management framework, such as a Business Intelligence Solution Center, that enables ongoing collaboration between business and IT stakeholders. You should engage all key business and technical groups in information management planning discussions.
  • Flex information architectures to changing circumstances. Finally, lean means maintaining an approach that is flexible and adaptable, able to shift course as your needs and environment change. In yoga terms, lean is all about building, toning, and stretching analytical muscle to keep it from tearing when you need to transition rapidly from one strategic alignment to the next. You need the flexibility to swing between centralized information management infrastructures and decentralized or federated environments. For end-to-end data management environments, Forrester has developed an architecture decision support tool that helps information managers to determine which of several topologies is best suited to their needs: centralized enterprise data warehouse, hub-and-spoke, independent data marts, data federation, and information-as-a-service.

Considered as a comprehensive strategy, these lean practices are true bloat-busters and recession-beaters. They allow organizations to deliver practical insights that address all pain points, even--especially!!!--within strict budgets.

March 18, 2009

Are There BI Implications In The Rumored IBM/Sun Merger? You Betcha!

By Boris Evelson

I have always predicted that Open Source BI has to reach critical mass before it becomes a viable alternative to large enterprise BI platforms. All the individual components (a mixture of Open Source BI projects and commercial vendor wrappers around them) are slowly but surely catching up to their bigger closed-source BI brothers. Talend and Kettle (a Pentaho-led project) offer data integration components like ETL; Mondrian and Palo (SourceForge projects) have OLAP servers; BIRT (an Eclipse project), Actuate, Jaspersoft, and Pentaho have impressive reporting components; Infobright innovates with a columnar DBMS well suited for BI; and productized offerings from consulting companies, such as SpagoBI from European-based Engineering Ingegneria Informatica, offer some Open Source BI component integration.

However, even large closed source BI vendors that have acquired multiple BI components over the years still struggle with full, seamless component integration. So what chance do Open Source BI projects and vendors, with independent leadership structures and often varying priorities, have of integrating highly critical BI components such as metadata, data access layers, GUI, common prompting/sorting/ranking/filtering approaches, and drill-throughs from one product to another? Today, close to none. However, consolidating such products and technologies under one roof could indeed create a much-needed critical mass and give these individual components a chance to grow into large-enterprise-quality BI solutions.

Who are the potential consolidators? Red Hat, with JBoss and MetaMatrix as critical BI integration components, would make sense. And/or Sun, with its GlassFish app server, NetBeans integration components, and MySQL (a small but growing option for DW platforms), would probably make an even better acquirer. Now, the recent rumor that IBM may be in M&A talks with Sun is throwing a wrench into my well-oiled engine of prediction logic. This would be an interesting twist with lots of implications for IBM, such as reconciling its WebSphere line of products with Sun’s GlassFish and NetBeans, and reconciling its InfoSphere line of products with Sun’s MySQL.

If, and only if, IBM decides that such a two-pronged product strategy (open and closed source) makes sense for it, then I can theoretically see IBM becoming the consolidator for Open Source BI products. It could then leverage its resources and its subject matter expertise from Cognos to build up and position both open source and closed source BI offerings targeted at specific client bases. But the challenges of developing, marketing, positioning, and selling two families of highly overlapping and competing BI products would be huge!

If such a future is not in IBM’s plans, then my hopes and bets for a bright Open Source BI future are on Red Hat.

February 10, 2009

What, If Anything, Is A "Niche Vendor," Where Enterprise Data Warehousing Is Concerned?

By James Kobielus

Now that we've published my Forrester Wave for Enterprise Data Warehousing (EDW) Platforms, you'd think I could breathe easier. Far from it. No matter how carefully one words a report, there is always the potential for misunderstanding. I'm already seeing some of that surrounding the notion of what, exactly, constitutes an EDW "niche vendor."

For starters, that term--"niche vendor"--is not in my vocabulary, and not in my Wave. In the Wave, I used the standard Forrester methodology, which, based on transparent criteria and evaluation scores, distinguishes among "Leaders," "Strong Performers," "Contenders," and "Risky Bets." Rest assured that all seven of the vendors I evaluated--Teradata, Oracle, IBM, Microsoft, SAP, Netezza, and Sybase--are either "Leaders" or "Strong Performers."

We have no formalized definition of "niche vendors" in the Wave. Instead, all of the vendors in my Wave should be understood as "enterprise" data warehousing platform providers. The qualifier "enterprise" indicates that they are all addressing a wide range of enterprise information and knowledge management (I&KM) requirements for data warehousing. However, some of them are better positioned at this time to target a broader addressable market than are others, as evidenced by the details of their current offerings, strategies, and market presence. The vendors that are addressing the widest range of EDW marketplace requirements and opportunities scored higher in the Wave.

I think the crux of the misunderstanding lies in my acknowledgement that there are in fact "niche" segments of the EDW platforms market, and that some vendors have differentiated themselves well in those niches without, necessarily, being locked into them permanently. I refer to "niche markets," "niche solutions," and "niche deployments," but never "niche vendors." I do use "niche player" at one point, but that's to reflect a vendor's strategy, not its destiny.

To reflect that nuanced understanding, I placed the following qualifying language at the start of the "Strong Performers" section:

  • "Strong Performers have proven themselves in particular niches, primarily among large enterprises but also in a growing range of midmarket deployments. These vendors' substantial, loyal, and longtime customer bases suggest plenty of opportunity for well-differentiated niche products in the multifaceted and innovative EDW platforms market. I&KM professionals can rest assured that these and other substantial EDW platform vendors have the staying power, resources, and vision to weather the ups and downs in today’s turbulent IT market."

What exactly, then, is an EDW "niche solution"? Actually, before I answer that, let's discuss what's not a niche solution. Essentially, any solution portfolio that is well-suited to addressing the broadest range of EDW requirements--and in fact has production customers to demonstrate a vendor's success at doing just that--is the polar opposite of a niche solution.

In order to be "well-suited" in this regard, an EDW solution portfolio should have the comprehensive functionality, flexibility, scalability, and affordability to qualify for short-listing by I&KM professionals with the broadest range of requirements. More than that, the vendors should demonstrate considerable success in selling their solutions into the full range of customer size classes, verticals, and geographies.

It's one thing to state, in the abstract, that one's EDW solution portfolio has universal appeal, but quite another to demonstrate that a critical mass of real customers across all segments has found it appealing enough to put their money down and standardize on it. A vendor's pricing, licensing, packaging, sales, marketing, distribution, support, and professional services are critically important for it to achieve this degree of universal--or at least widespread--adoption. Also, sometimes, what holds a vendor back from broad appeal is a marketplace perception issue that may be several years out of date, but is still a tangible competitive handicap.

One way of interpreting the Wave is that the higher-scoring vendors have the least "niche-y" solutions on the market. Of course, a niche may be a large one, as measured by the number of actual or potential customers, but it's still a potential competitive handicap if a vendor is having difficulty breaking out of it--or doesn't realize it hampers its growth prospects. And a niche may be a matter of a vendor's sales strategy--e.g., selling its DW appliance primarily as an OLAP data-mart accelerator--that has paid off in sales momentum but is becoming a confining pigeonhole. Or the niche may be an architectural specialty--such as a columnar database--that has great strengths for particular EDW-node deployment roles but may be suboptimal for other roles.

Sometimes, vendors position their niche approach as the future of the market as a whole, and as the answer to every EDW requirement that every user might have. And, sometimes, the market disagrees, as expressed through customer demand, or the lack thereof, leaving vendors mystified as to why they're not becoming the pre-eminent market leader.

And sometimes, an emerging niche (i.e., vendor growth-potential-limiting constraint) may not be apparent to the vendors that, heretofore, have assumed that it constitutes the entire EDW market. One such emerging niche is for EDW solutions that have not yet attained petabyte-scale in production customer deployments, in demo environments, or in the lab. In fact, that niche includes the majority of today’s EDW solutions, and the vast majority of I&KM requirements. Some vendors (read the Wave to see who) have moved beyond that sub-petabyte niche, or are just now traversing that threshold, or are soon likely to. Interestingly, most vendors in the EDW Platforms Wave offer a credible case that they'll soon attain full petabyte-scalability, but only a few had actual customer deployments showing that they're already there.

But none of this is to be read as vendor destiny. The Wave also scores each vendor's corporate and product directions, and its momentum in selling into customer-size, vertical, and geographic segments outside its installed base. All of this should be understood as a vendor's attempt to break out of whatever niche(s) its solutions may be concentrated in.

And, indeed, that's a key take-away from the Forrester Wave for EDW Platforms. All seven of these vendors are rapidly evolving out of the various niches in which their solutions have been deployed. That includes petabyte-scalability. Consequently, you shouldn’t assume--simply because a vendor didn’t demonstrate "well-beyond" one-petabyte scalability for the purpose of gaining a "5" on that Wave criterion in Q1 09--that the vendor won’t be able to demonstrate that capability for you, in their lab, next week.

The EDW market is evolving extraordinarily fast. Clearly, we’ll need to update the EDW Platforms Wave in the coming year or so to keep pace.

February 06, 2009

The Forrester Wave™: Enterprise Data Warehousing (EDW) Platforms Q1 2009: The Key Takeaway

By James Kobielus

Today we published the first Forrester Wave™ specifically focused on Enterprise Data Warehousing (EDW) Platforms. The final published report is now available on Forrester’s website to clients. Information and knowledge management (I&KM) professionals will find it a timely and actionable study of the leading EDW platform vendors: Teradata, Oracle, IBM, Microsoft, SAP, Sybase, and Netezza. I urge you to download and read it, and then engage me, the author-analyst, in inquiries and advisories to help you apply it to your EDW initiatives.

The key takeaway from this Wave is that scalability, flexibility, and affordability are the dominant requirements in today’s budget-stressed EDW platforms market. I&KM professionals are under the gun, trying to keep EDW and business intelligence (BI) costs under tight control while preserving the flexibility to grow and repurpose these investments to support an ever-changing array of decision-support requirements. Hence, an EDW platform--to score well in the Wave--should address the following high-bar requirements:

  • Extremely scalable: The EDW platform should be scalable to support petabytes of usable data; thousand-plus distributed compute/storage nodes; tens of thousands of concurrent users and queries; many terabytes of daily or continuous data loads; and expanding mixed workloads of reporting, query, OLAP, in-database analytics, real-time analytics, ETL, data cleansing, and other transactions. It should support this extreme scalability through scale-out, shared-nothing MPP, optimized appliances, optimized storage, dynamic query optimization, and mixed workload management technologies.
  • Extremely flexible: The EDW platform should be flexible to support diverse applications, including business intelligence, online analytical processing, data mining, predictive analytics, text analytics, closed-loop business process management, and complex event processing; and various deployment roles, including multi-domain data hubs, subject-specific data marts, operational data stores, master data management hubs, staging nodes, analytic data marts, multi-temperature hierarchical storage management and archiving, and source and/or target repositories in data federation environments. It should support this extreme flexibility by being fluid, adaptive, and virtualized: enabling data to be transparently persisted, in diverse physical and logical formats, to an abstract, seamless grid of interconnected memory and disk resources, and delivered with subsecond delay to consuming applications; and ensuring application service levels through an end-to-end, policy-driven, latency-agile, distributed-caching and dynamic query-optimization memory grid.
  • Extremely affordable: The EDW platform should be affordable for all customer segments and use cases. It should support this extreme affordability through flexible packaging/pricing, including licensed software, modular appliances, and “pay as you go” subscription-based SaaS/cloud offerings.

EDW platform vendors that can’t address these key requirements--now or in their enhancement road maps over the coming 2-3 years--will not survive in this very competitive arena.

As noted above and in my blog post last week, scalability, performance, and optimization are perhaps the most important criteria in today’s EDW market. And, of course, they are quite difficult to nail down into a single yardstick that does justice to different vendors’ approaches. Nevertheless, I believe this Wave accomplishes that. I have boiled down “scalability, performance, and optimization” (SPO) into a single criterion that defines five profiles (from 5 = most scalable to 1 = least scalable), focusing on the degree of parallelism in the underlying architecture.

For each of the vendors in this Wave, I got a deep dive on their SPO architecture, but I didn’t stop there. I asked each vendor for reference customers, and conducted a structured interview with each. I asked each for a list and description of their largest production customer deployments. And I asked each for published benchmarks, plus all the supporting information on the test environments, scenarios, and criteria. In other words, I applied the standard Forrester Wave methodology.

Essentially, the customer deployment and benchmark data corroborated whether a vendor in fact earned the particular SPO score associated with their architectural approach. Clearly, there were plenty of gray areas. Also, quite clearly, vendors had plenty of comments on the definitions of the SPO scales, and on where they fell on this spectrum. And, of course, many pointed out that being scored, say, a “2” rather than a “4” or “5” didn’t necessarily mean they were slower, less efficient, or incapable of processing various EDW and BI workloads. It also didn’t mean that they couldn’t, in practice and in customer deployments, push the scalability and speed envelope that one would associate with their architecture. Architecture isn’t destiny, but it definitely sets SPO constraints, which is the whole point of the scoring on this criterion in this Wave.

All the vendor feedback was excellent and helped me tweak and tune the scale to fit the EDW market’s current and emerging state of the art. With that said, here are the final SPO scales in this Wave:

  • 5 = scale out through shared-nothing massively parallel processing (MPP), up to 100-1000+ storage/compute nodes in a single-tier grid of compute/storage nodes, and well beyond 1000s of terabytes (TBs) of online, usable production data across distributed deployment
  • 4 = scale out in the storage tier to 100-1000+ nodes and/or up to around 1000 TBs of online, usable production data, but lacking support for single-tier-grid shared-nothing MPP and/or lacking the ability to scale out to 100-1000+ nodes in the compute tier
  • 3 = scale-out through shared-nothing MPP and/or clustering, up to 2-100 storage and/or compute nodes and up to 100s of TBs of online, usable production data across distributed deployment
  • 2 = scale-up through symmetric multiprocessing (SMP), and up to 10s of TBs of online, usable production data, and scale-out in a clustered deployment of 2-99 compute nodes
  • 1 = scale-up through SMP and up to 10s of TBs of online, usable production data on a single-node deployment
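The five profiles above work as a rough decision rule keyed to architecture. Purely as an illustration, here is a hypothetical Python sketch of one simplified reading of that rule. The `ArchitectureProfile` fields, the numeric cut-offs (for example, treating "well beyond 1000s of TBs" as more than 1,000 TB), and the order of the checks are assumptions made for this sketch, not Forrester's actual scoring procedure.

```python
from dataclasses import dataclass


@dataclass
class ArchitectureProfile:
    """Hypothetical summary of an EDW platform's demonstrated architecture."""
    shared_nothing_mpp: bool  # scale-out via shared-nothing MPP?
    single_tier_grid: bool    # compute and storage nodes in one grid tier?
    storage_nodes: int        # largest storage-tier node count
    compute_nodes: int        # largest compute-tier node count
    usable_tb: float          # online, usable production data, in TB


def spo_score(p: ArchitectureProfile) -> int:
    """Return a 1-5 SPO profile under a simplified, assumed reading of the scale."""
    # 5: single-tier shared-nothing MPP grid of 100+ nodes, well beyond 1,000 TB
    if (p.shared_nothing_mpp and p.single_tier_grid
            and p.storage_nodes >= 100 and p.compute_nodes >= 100
            and p.usable_tb > 1000):
        return 5
    # 4: storage tier reaches 100+ nodes and/or data reaches ~1,000 TB,
    #    but the profile falls short of a 5 (checked above)
    if p.storage_nodes >= 100 or p.usable_tb >= 1000:
        return 4
    # 3: shared-nothing MPP and/or clustering across 2-100 nodes, 100s of TB
    nodes = max(p.storage_nodes, p.compute_nodes)
    if nodes >= 2 and (p.shared_nothing_mpp or p.usable_tb >= 100):
        return 3
    # 2: SMP scale-up plus a clustered deployment of 2-99 compute nodes, 10s of TB
    if p.compute_nodes >= 2:
        return 2
    # 1: SMP scale-up on a single node, 10s of TB
    return 1


if __name__ == "__main__":
    # Illustrative, made-up profiles (not actual vendor data)
    examples = {
        "single SMP server": ArchitectureProfile(False, False, 1, 1, 20),
        "clustered SMP": ArchitectureProfile(False, False, 4, 4, 40),
        "mid-size MPP": ArchitectureProfile(True, True, 64, 64, 300),
        "two-tier scale-out": ArchitectureProfile(True, False, 400, 40, 900),
        "petabyte MPP grid": ArchitectureProfile(True, True, 1200, 1200, 5000),
    }
    for name, profile in examples.items():
        print(f"{name}: {spo_score(profile)}")
```

A rule like this captures only the architectural starting point. In the Wave, that starting point still had to be corroborated by reference customers, production deployments, and benchmarks, which no simple function can stand in for.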

To see how the vendors ranked, you’ll need to read the Wave. Or engage me in an inquiry or advisory. Or, preferably, both.
