Canonical Information Modeling - A Best Practice For SOA?

I recently attended the second annual “Canonical Model Management Forum” at the Washington Plaza Hotel in Washington, DC (see here for my post about last year’s first meeting, including Forrester’s definition of canonical modeling). Enterprise and information architects from a number of government agencies, as well as several major banks, insurance companies, retailers, credit-card operators, and other private-sector firms, attended the meeting. There was one vendor sponsor (DigitalML, the vendor of IgniteXML). Attendees gave a number of presentations about their environments, what had motivated them to establish a canonical model, how that work had turned out, and the important lessons learned.

Last year I also had some recent Forrester survey results to share. We have not yet rerun that survey, but we are on the verge of doing so, so I’ll post some key results once the new data is available.

Last year’s post is still the place to go to get the general overview about why to do canonical modeling, the main use cases, some areas of controversy (still raging), and a list of best practices I heard attendees agree upon.

What’s New In 2011?

Based both on what I heard at this meeting and on other recent interviews:

  • Canonical modeling is becoming more common and is delivering more value. I see this not only in the growth of DigitalML’s customer base but also in the increasing number of organizations I’ve interviewed that are implementing canonical information models as part of their data integration, application integration, B2B integration, or data services layer implementation, using a wide range of tools. So I predict our survey data will show this too – I wish I had it now!

The main motivation for canonical models is still to increase reuse of shared services, while also making them easier and faster to consume. One large oil company I interviewed recently has been measuring this since going live in 2009 and found that 40% of new requests for access to data can now be satisfied by an existing data service. At this year’s CMM Forum, Novartis presented results of its canonical modeling efforts since early 2009 showing reuse ranging from 20% on early projects up to 100% on later projects in domains for study management, drug delivery, and customer master.

  • Tool support is even more important to make the effort scalable and cost-effective. Creating the initial model is not that hard compared with the effort to spawn and manage the many physical XSDs or other artifacts derived from it. Without tool support, managing change across a population of schemas linked to a canonical model, which is in turn linked to multiple constantly evolving industry standards, requires significant, error-prone manual effort. Tool features that support mapping the dependencies between these various models while linking to a common vocabulary, taking subsets to limit change impact, dividing models into multiple federated domains, and interfacing with other modeling tools and metadata/services repositories are the ones cited most often by customers as giving them significant help in sustaining their information architecture and canonical modeling programs.
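To make the subsetting idea concrete, here is a minimal Python sketch (all entity and element names are hypothetical, and real tools do far more) of the kind of generation step these products automate: deriving project-specific XSD subsets from a single canonical element dictionary, so that a change made once to a canonical definition flows into every derived schema on regeneration.

```python
# Hypothetical sketch: derive project-specific XSD subsets from one
# canonical element dictionary. Changing a type in CANONICAL_CUSTOMER
# and regenerating updates every derived schema consistently.

CANONICAL_CUSTOMER = {            # illustrative canonical "Customer" entity
    "CustomerId":  "xs:string",
    "LegalName":   "xs:string",
    "DateOfBirth": "xs:date",
    "RiskScore":   "xs:decimal",
}

def generate_subset_xsd(entity_name, canonical, subset):
    """Emit an XSD complexType containing only the requested subset of
    canonical elements, failing fast if a project requests an element
    the canonical model does not define."""
    unknown = set(subset) - set(canonical)
    if unknown:
        raise ValueError(f"not in canonical model: {sorted(unknown)}")
    lines = [
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">',
        f'  <xs:complexType name="{entity_name}">',
        "    <xs:sequence>",
    ]
    for name in subset:  # preserve the requested element order
        lines.append(f'      <xs:element name="{name}" type="{canonical[name]}"/>')
    lines += ["    </xs:sequence>", "  </xs:complexType>", "</xs:schema>"]
    return "\n".join(lines)

# A billing project consumes only two of the four canonical elements:
billing_xsd = generate_subset_xsd("Customer", CANONICAL_CUSTOMER,
                                  ["CustomerId", "LegalName"])
print(billing_xsd)
```

The point of the sketch is the dependency direction: derived schemas never edit element definitions locally, so impact analysis reduces to "which subsets reference the changed element."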
  • Knowledgeable architects view canonical modeling as a best practice for SOA. While Doug Stacey of Allstate was presenting Allstate’s approach to implementing a federated canonical model, he made the statement that his company was able to obtain support for its information architecture program and canonical model from its Chief Architect by getting agreement that “If you’re going to do SOA, you’ve got to do something to control this [the data model].” The person sitting next to me, an architect from a large financial services company in the mortgage business, remarked under her breath, “who doesn’t?” Many heads around the room were nodding. The only area where any apparent differences emerged was how far down into the domains one can reasonably push governance and control versus how light a hand one may apply in order to stay pragmatic and focus more on the connections between domains.
  • Federated models are becoming increasingly common. I have seen this pattern emerge in several recent examples, both at the forum (such as Allstate) and elsewhere. Federation is still largely a practice of large companies, but it seems to be the only way to make SOA or a canonical model scale to work over such a large and diverse enterprise (including in the federal space). The last time I did research on federated SOA, the federation capabilities of SOA registry/repositories were relatively immature, such that most of those doing federation had to stick with using the same repository everywhere to make it possible (federating across multiple instances). Now it appears to be possible to federate across a limited number of connection types among the leading registry/repositories (HP, IBM, and SAG CentraSite), making a more heterogeneous approach feasible.
  • A sizable proportion of canonical models are also supporting data access layers. When I gave my presentation at the CMM Forum, an update on the state of canonical modeling, I asked for a show of hands of how many of those present (perhaps 35 architects) were using their canonical model in conjunction with a service-oriented data access layer, and about 40% held up their hands. The Novartis presentation showed a specific example — a “virtual data layer offering data in Common Information Model.” At Forrester we’re currently updating our research on the data services market, and a number of the companies we’re interviewing about their implementations have made canonical models a key part of their strategy for data services.
  • “Big Data” lies on the horizon, offering great promise for more insights. Combining “Big Data” (e.g., Hadoop, or large warehouses – such as 150 billion rows – on Netezza appliances) with a data services layer that has a canonical model is still a rare and emergent practice, but I expect it to start growing slowly as a design pattern that will appeal to organizations using SOA that also have “Big Data” resources available. Most analytics use cases for “Big Data” will still access those resources directly, through APIs or frameworks, but by also “publishing” key analytical insights up through the data services layer, a much broader population can then take advantage of these assets than would otherwise be able to do so. Given that Customer is one of the most common entities found in data access layers (striving for the elusive “single view of the customer”), “Big Data” containing information about customer behavior via web, mobile, location-based, or smart-grid applications appears the most attractive for early exploitation.

But one might ask – what’s the model of the “Big Data”? The underlying data may be structured, semistructured, or unstructured (think Twitter streams), but once analytics extract insights from the data, a structure emerges that complements the canonical model. However, whereas the canonical model expresses a need to govern and standardize, the model of “Big Data” is often dynamic by nature, so don’t try to standardize it, except where particular insights such as phone calling behavior are reasonably stable over the life of a system. Insights from the Twitter stream will never have that kind of stability – think “trending topics.”
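As an illustrative sketch (the event fields and record names are invented for the example), the distinction might look like this: the raw events stay schema-light, while only the stable extracted insight, here calling behavior per customer, is shaped into a canonical, governable record for the data services layer.

```python
# Hypothetical sketch: raw "Big Data" records remain semistructured,
# but the stable analytical insight extracted from them is published
# as a small record that fits a governed canonical model.

from collections import defaultdict

raw_call_events = [  # semistructured source records with varying fields
    {"cust": "C1", "sec": 120, "cell": "A"},
    {"cust": "C1", "sec": 300},               # some fields simply absent
    {"cust": "C2", "sec": 60, "cell": "B"},
]

def calling_behavior_insights(events):
    """Aggregate raw events into canonical CustomerCallingBehavior
    records; only the stable, governable fields survive the boundary."""
    totals = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for e in events:
        t = totals[e["cust"]]
        t["calls"] += 1
        t["seconds"] += e["sec"]
    return [
        {"CustomerId": cid,
         "TotalCalls": t["calls"],
         "TotalCallSeconds": t["seconds"]}
        for cid, t in sorted(totals.items())
    ]

for record in calling_behavior_insights(raw_call_events):
    print(record)
```

The design choice mirrors the argument above: govern the insight record, not the raw stream, because the stream's shape changes faster than any standardization process can follow.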

  • MDM golden masters are now more often becoming sources for data services. MDM initiatives are often painful and long, but when they finally begin to deliver results, these make excellent sources for data services and therefore play a key role in your canonical model. I’ve seen multiple examples of this recently; a few architects referred to this reuse of MDM data (golden masters that had been staged from original sources) in an almost offhand manner, as just one of several sources they were aggregating. There’s much more to MDM than just standardizing the model, but if your organization has gone to the trouble of doing the kind of quality scrubbing, de-duping, and information merging that MDM requires, it makes another great place to start in looking for sources for your canonical model, along with whatever industry standards for data exist in your industry context.
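A minimal Python sketch of that aggregation pattern (all records and field names are hypothetical): a data service merges an MDM golden record with a fresher operational source into one canonical customer view, letting the scrubbed golden master win wherever the two overlap.

```python
# Hypothetical sketch: a data service aggregates an MDM golden record
# with an operational source into one canonical Customer view; the
# cleansed golden master takes precedence on overlapping fields.

GOLDEN_MASTER = {   # de-duped, quality-scrubbed MDM record
    "CustomerId": "C42",
    "LegalName": "Acme Holdings Ltd",
    "Address": "1 Main St",
}
OPERATIONAL = {     # fresher but unscrubbed operational data
    "CustomerId": "C42",
    "LegalName": "ACME HLDGS",       # messy variant, should be overridden
    "LastOrderDate": "2011-10-30",   # only the operational system has this
}

def canonical_customer_view(golden, operational):
    """Merge sources into the canonical shape: start from the
    operational record, then overlay the golden master so its
    cleansed values take precedence."""
    view = dict(operational)
    view.update(golden)
    return view

print(canonical_customer_view(GOLDEN_MASTER, OPERATIONAL))
```

A real service would pull from several sources and apply field-level survivorship rules, but the precedence idea is the same: the MDM investment pays off by supplying the trusted values.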

There’s much more I could say, but in the interest of time, I’ll stop there. The combination of canonical modeling, data services, and “Big Data” is generating a lot of activity and opportunities for innovation, so watch this space for more tidbits as they emerge. And if you’re a Forrester client and have a canonical modeling initiative you’re considering kicking off, please submit an inquiry to inquiry@forrester.com on this topic, and I’d be happy to discuss the details of your program. Be sure to send some basic info about what you’re doing as part of the inquiry so I can prepare to give you maximum value.

Comments

Would love to see PUBLISHED

Would love to see PUBLISHED resource on the pros/cons of all tools in the marketplace as to how they assist in managing canonical models over the long haul...

More research on canonical modeling tools

I'd like that too, James, but at the moment we don't have as many resources as we'd like to fully cover this area. However, I do have some new data and other fresh primary research that I hope to have time to publish in the near future.

Canonical Model and semantics

Is Forrester looking into the area of canonical models with semantics? Is there any tool support out there?

Re: Canonical model & semantics

We are continuing to do a modest level of research in this area but, in matching resources to demand, we are not able to justify doing more than we have so far, including:
1) Occasional blogging on the topic, as in this post
2) Inquiry support - I've handled half a dozen inquiries on this topic in the last few months
3) Ongoing field research - seeking out new solutions, investigating known solutions, and interviewing practitioners about how they are doing canonical modeling and model management.

For example, when I was at IBM IOD early last week, I dug around with various IBMers in the Information Management unit to verify that IBM still does not have a solution in this area. They have some adjacent elements: InfoSphere Data Architect for logical modeling, a metadata repository, a glossary for individual attributes, and industry models in key areas like insurance. But for the core of what firms are seeking, creating and managing canonical models, there is still no solution.

For more info on what tool support we've already found, see this and earlier blog posts from me on the topic. I'd love to have enough resources on my team to do a published Market Overview of this area, but have not been able to make that happen, yet.

Tools in the semantic modeling/canonical modeling space

One of the commenters asked for information about tools. While I am aware of the sensitivity of "advertising" in such a forum, I do want to mention the DXSI (Data eXtend Semantic Integrator) tool from my employer, Progress Software. It is referenced by IBM in one of its integration Redbooks. I won't make product statements here, but I would encourage any interested parties to check it out.

Regards

Chris

Tools for canonical modeling

Hey, Chris, it's good to hear from you! (Full disclosure: back in the dawn of time, Chris and I both worked for LBMS, although at slightly different times.)

I agree that Progress' DXSI is one of the good tools out there for this purpose. Another is DigitalML's IgniteXML.

Those two tools are the ones I see most often being used for this specific purpose, each with somewhat different features/functions and industry footprints. That is, I don't run into many people who considered both and chose one over the other; rather, folks tend to find a solution using one or the other and go down that road without much consideration of alternatives.

The other main practice I've seen for canonical modeling is that folks doing data virtualization often use the tools built into their DV environment to build and manage a canonical model for the "access layer" they are creating in that environment. Here the "usual suspects" are most often Composite Software, Informatica, and Denodo; that is, I've interviewed a number of their customers who were using a canonical model approach.

Progress also has some customers who have done data virtualization, although Progress is not currently marketing a product specifically for that purpose, as far as I know. Those Progress customers who have done this are typically using DXSI in combination with the Progress ESB, with Java servlets running in containers provided by the ESB, each servlet providing a particular data service.

One of the issues that comes up most often around tooling for canonical modeling is that none of these tools is a complete solution, top to bottom, so folks typically combine them with other tools such as CA ERwin, IBM InfoSphere Data Architect, or Embarcadero, transferring metadata between them. This often requires some customization and integration work, usually done in partnership among the customer, their vendors, and SIs.