Canonical Information Models Play Important Role In API Layers, Increasing Service Reuse

I attended the third annual Canonical Model Management Forum, May 14-15, 2012, hosted by DigitalML at the hip Washington Plaza Hotel again this year. I saw even more signs than last year that canonical models are key to API layers that many firms are building to promote integration. But first, let me share some data from Forrester’s last survey on canonical modeling adoption that I presented at the forum:

As our data shows, this practice is becoming increasingly widespread in firms publishing services that deliver information. Some of the trends I highlighted in last year’s forum post are even more evident this year, such as canonical models’ increasing use in data access layers, as I depicted with this slide:

About half of those attending this year’s forum (from large banks, insurance companies, government agencies, retailers, and others with big integration requirements) indicated that they have (or are building) layers like these and that their canonical models play a role as depicted above. REST is emerging as a preferred alternative to SOAP for web APIs, along with JSON as a data-delivery format (preferred to XML), which makes support for REST and JSON high-priority wish-list items for tools supporting canonical model management.

Heard At The Forum

Here are some (paraphrased) quotes from presentations firms gave about canonical modeling. First, from a large bank:

“We have a canonical model so we can more quickly integrate the systems of acquired banks.”

“To create it, we first tried importing a data-at-rest relational model from ERwin but found that it produced overly complex XML Schemas for our data-in-motion canonical model. The XSDs were massive in both number and size, and they were either inefficient or too complex for developers to use in practice.”

“Once we broke up our model into smaller pieces designed more for data in motion, we were able to support a large number of developers with just a few modelers using powerful tools. Business analysts, service architects, and integration developers work together to deliver services that put data in motion.”

Next, a large insurance company:

“Our catalyst for change was bringing in a new policy administration system for Group. We needed to rationalize 197 interfaces across 45 ‘supplier’ systems. We wanted one place to get customer data that would be easy to plug in and use.”

“We support both request/response and publish/subscribe interaction models through our message bus, with a standard message format defined from our canonical model.”

“This approach significantly sped up our project to deliver these interfaces, by at least 30%.”

“We use our canonical model to generate services for both private (internal) and public interfaces (exposed to other companies). We store metadata defining these interfaces in a registry/repository, where developers can search for services and see the data they provide.”

“We met with the business weekly or even bi-weekly when we were working on the canonical model to resolve questions about the data and understand how to map between our databases and the data structures in ACORD.”

“We work in parallel with three Agile development teams of five to six people each, so the model keeps evolving. That means we have to do impact analysis before each change.”

Then, a large retailer ($25 billion in revenue) with a significant eCommerce business:

“We have more than 800 applications, and we built more than 150 integrations using our canonical model. They run over an ESB built to support 10 million transactions per hour.”

“We are building a new order management system while retiring older ones. We have made this canonical pattern the default for integration. Our model is based on ARTS (Association for Retail Technology Standards) and OAGIS.”

“We started our model with fewer than 10 nouns, including sales order, purchase order, product, customer, etc. Now we’re up to about 20 nouns supporting 70+ canonical interfaces. Each canonical interface provides a ‘façade’ of the canonical noun.”

“Be sure to listen to developer feedback. When you produce something out of the model, is it easy for them to consume? Can they find and understand the schemas? Can they easily create Java classes (or whatever they need) from the schemas?”

“We apply Lean Thinking to our approach to canonical modeling. We model bottom-up, and we model ‘just in time’ so we’re not too far ahead of the use cases. We factor in many inputs to arrive at our view of the common model. But we’re very careful to minimize what goes in the common model, to keep it as simple and lean as possible while still being comprehensive enough to be broadly useful. We also minimize handoffs and maximize flow in our process.”

“Now that we’ve been using this approach in production for about a year, we often see a common pattern as we build more interfaces: two to three different applications access the same canonical interface, which in turn integrates with two to three ‘source’ systems underneath to provide the service. This drives significant reuse and faster integration.”
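
To make the pattern in that last quote concrete, here is a minimal Python sketch of one canonical interface sitting on top of several source systems. It is an illustration only, not the retailer's implementation: the CanonicalCustomer noun, the two adapter functions, and the rule that the first source to supply an attribute wins are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CanonicalCustomer:
    """Hypothetical canonical noun; every attribute except the key is optional."""
    customer_id: str
    name: Optional[str] = None
    email: Optional[str] = None
    loyalty_tier: Optional[str] = None


# A source adapter maps a customer id to a partial record that already uses
# canonical attribute names. Real adapters would call the source systems.
SourceAdapter = Callable[[str], Dict[str, str]]


def crm_adapter(customer_id: str) -> Dict[str, str]:
    # Stand-in for a hypothetical CRM system.
    return {"name": "Pat Lee", "email": "pat@example.com"}


def loyalty_adapter(customer_id: str) -> Dict[str, str]:
    # Stand-in for a hypothetical loyalty system.
    return {"loyalty_tier": "gold"}


class CustomerCanonicalInterface:
    """One canonical interface shared by every consuming application."""

    def __init__(self, adapters: List[SourceAdapter]) -> None:
        self._adapters = adapters

    def get(self, customer_id: str) -> CanonicalCustomer:
        merged: Dict[str, str] = {}
        for adapter in self._adapters:
            for key, value in adapter(customer_id).items():
                # Assumed merge rule: the first source to supply a value wins.
                merged.setdefault(key, value)
        return CanonicalCustomer(customer_id=customer_id, **merged)
```

Two or three applications would each call CustomerCanonicalInterface.get rather than integrating with the source systems directly, which is where the reuse in the quote comes from.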

The Future Of Canonical Models

I spent time discussing the future of canonical modeling with the attendees, both during my own session and in later one-on-one or group conversations. My main takeaway? We need to think differently about canonicals when we apply them in the API/façade pattern scenario. The underlying canonical “noun” or “entity” will still have its broader, more reusable structure, with as many attributes as possible defined as optional.

Each API we provide to developers to access services then represents a façade on that underlying resource, with only as much information as is strictly necessary to meet the needs of the consuming application. APIs should be designed to optimize the application – I’ve heard them called “application-oriented services,” or AOS, as opposed to SOA. There’s a natural tension between this goal and the goal of reuse, canonicalization, and rationalization, but with the right layering, the two can be balanced. The key is building the canonical “noun” or “entity” layer in a way that eases the development, delivery, and maintenance of the façade APIs above it. This approach also provides opportunities for additional performance optimization using in-memory caching on the back end. Bottom line: APIs are just another case of “data in motion,” but canonical modeling practices must adapt to optimize for the API use case.
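
As a rough illustration of this layering, the Python sketch below keeps a broad canonical entity with mostly optional attributes and exposes a narrow, application-specific façade on top of it, with an in-memory cache in front of the back-end lookup. Everything named here (CanonicalProduct, load_product, facade, mobile_catalog_api, the field list, and the sample values) is hypothetical and chosen only for this example.

```python
from dataclasses import asdict, dataclass
from functools import lru_cache
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class CanonicalProduct:
    """Hypothetical canonical entity: broad and reusable, attributes mostly optional."""
    sku: str
    name: Optional[str] = None
    description: Optional[str] = None
    list_price: Optional[float] = None
    supplier_id: Optional[str] = None
    weight_kg: Optional[float] = None


@lru_cache(maxsize=1024)
def load_product(sku: str) -> CanonicalProduct:
    # Stand-in for the back-end lookup; lru_cache keeps recent results in memory.
    return CanonicalProduct(sku=sku, name="Trail Shoe", list_price=89.0,
                            supplier_id="S-17")


def facade(entity: CanonicalProduct, fields: Tuple[str, ...]) -> Dict[str, Any]:
    """Project the canonical entity onto only the fields one application needs."""
    data = asdict(entity)
    return {field: data[field] for field in fields if data[field] is not None}


# Assumed needs of one consuming application, here a mobile catalog screen.
MOBILE_CATALOG_FIELDS = ("sku", "name", "list_price")


def mobile_catalog_api(sku: str) -> Dict[str, Any]:
    return facade(load_product(sku), MOBILE_CATALOG_FIELDS)
```

Each additional façade is then just another field list over the same canonical entity, so the application-specific APIs stay thin while the canonical layer carries the reusable structure.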

What do you think? I'd love to hear more about your experiences delivering APIs as a platform and your views on the relevance of canonical models to that use case.