Ah, the good old days. The world used to be simple. ETL vendors provided data integration functionality, DBMS vendors data warehouse platforms and BI vendors concentrated on reporting, analysis and data visualization. And they all lived happily ever after without stepping on each others’ toes and benefiting from lucrative partnerships. Alas, the modern world of BI and data integration is infinitely more complex with multiple, often overlapping offerings from data integration and BI vendors. I see the following three major segments in the market of preparing data for BI:

  1. Fully functional and highly scalable ETL platforms that are used for integrating analytical data as well as moving, synchronizing and replicating operational, transactional data. This is still the realm of tech professionals who use ETL products from Informatica, AbInitio, IBM, Oracle, Microsoft and others.
  2. An emerging market of data preparation technologies that specialize mostly in integrating data for BI use cases and mostly run by business users. Notable vendors in the space include Alteryx, Paxata, Trifecta, Datawatch, Birst, and a few others.
  3. Data preparation features built right into BI platforms. Most leading BI vendors today provide such capabilities to a varying degree.

So how does a BI buyer looking for data integration capabilities decide between the three, frequently overlapping options? Here’s how I see the difference between simple data preparation, integration, wrangling (a new hot buzzword), blending (also a popular new term used by a few BI vendors) and full blown ETL. IMHO enterprise grade ETL must have all of the following capabilities:

  • Multi step transformations as in a) merge data set A with data set B into a data set C, b) aggregate data set C, c) merge C with D, etc. There also has to be a GUI that lets the data transformation developer sequence the steps in a certain order. Most of the data preparation features that come with BI platforms are limited to a single step / process
  • Not just column, but row based transformations such as pivoting data and/or deduplicating/merging rows. Most of the data preparation features that come with BI platforms are limited to column level (Column A + Column B = Column C), not row level operations.
  • Scripting language that extends SQL based transformation with IF/THEN/ELSE conditions, loops, etc. Most of the data preparation features that come with BI platforms are limited to SQL based operations
  • Out of the box connectivity to complex enterprise data sources. While all BI tools can import and query data from SQL databases and files (like spreadsheets) connetivity to enterprise ERP and CRM applications require special adapters.
  • Centralized policies and processes to ensure integrity, consistency and repeatability. Most of the BI platform data preparation capabilities are limited to one off use cases.

I recommend using these capabilities as a checklist to decide whether a BI tool will suffice or whether you need a more comprehsive data integration platform. Did I miss anything? Drop me a not if I did