Are Data Preparation Tools Changing Data Governance?

Michele Goetz

First there was Hadoop. Then there were data scientists. Then came Agile BI on big data. Drum roll, please . . . bum, bum, bum, bum . . .

Now we have data preparation!

If you are as passionate about data quality and governance as I am, then the 5+-year wait for a scalable capability to take on data trust is amazingly validating. The era of "good enough" when it comes to big data is giving way to an understanding that analysts got away with "good enough" only through a significant amount of manual data wrangling. As an analyst, it must have felt like your parents saying you can't see your friends and play outside until you've cleaned your room (and if it's anything like my kids' rooms, that's a tall order).

There is no denying that analysts are the first to benefit from data preparation tools such as Alteryx, Paxata, and Trifacta. It's a matter of time to value for insight. What is still unrecognized in the broader data management and governance strategy is that these early forays are laying the foundation for data citizenry and the cultural shift toward a truly data-driven organization.

Today's data reality is that consumers of data are like any other consumers: they want to shop for what they need. This data consumer journey begins with looking in their own spreadsheets, databases, and warehouses. When they can't find what they want there, data consumers turn to external sources such as partners, third parties, and the Web. Data preparation tools help them define the value of that data and, ultimately, decide whether they will procure it and possibly pay for it. The other outcome of this data-shopping experience is that they are taking on the risk and accountability for the value of the data as it is introduced into analysis, decision-making, and automation.


BI and data integration professionals face a multitude of overlapping data preparation options

Boris Evelson

Ah, the good old days. The world used to be simple. ETL vendors provided data integration functionality, DBMS vendors provided data warehouse platforms, and BI vendors concentrated on reporting, analysis, and data visualization. And they all lived happily ever after, benefiting from lucrative partnerships without stepping on each other's toes. Alas, the modern world of BI and data integration is infinitely more complex, with multiple, often overlapping offerings from data integration and BI vendors. I see the following three major segments in the market for preparing data for BI:

  1. Fully functional and highly scalable ETL platforms that are used for integrating analytical data as well as moving, synchronizing, and replicating operational, transactional data. This is still the realm of tech professionals who use ETL products from Informatica, Ab Initio, IBM, Oracle, Microsoft, and others.
  2. An emerging market of data preparation technologies that specialize mostly in integrating data for BI use cases and are mostly run by business users. Notable vendors in the space include Alteryx, Paxata, Trifacta, Datawatch, Birst, and a few others.
  3. Data preparation features built right into BI platforms. Most leading BI vendors today provide such capabilities to a varying degree.

3 Ways Data Preparation Tools Help You Get Ahead Of Big Data

Michele Goetz

The business has an insatiable appetite for data and insights. Even in the age of big data, the number one issue for business stakeholders and analysts is getting access to the data. Once access is achieved, the next step is "wrangling" the data into a usable data set for analysis. The term "wrangling" itself creates a nervous twitch, unless you enjoy the rodeo. But the goal of the business isn't to be an adrenaline junkie. The goal is to get insight that helps it smartly navigate increasingly complex business landscapes and customer interactions. Those who get this have introduced a softer term, "blending": another term dreamed up by data vendor marketers to avoid the dreaded conversation about data integration and data governance.

The reality is that you can't market message your way out of the fundamental problem: big data is creating data swamps even in the best-intentioned efforts. (This is the reality of big data's first principle of schema-less data.) Data governance for big data is primarily relegated to cataloging data and its lineage, which serves the data management team but creates a new kind of nightmare for analysts and data scientists: working with a card catalog that will rival that of the Library of Congress. Dropping in a self-service business intelligence tool or advanced analytic solution doesn't solve the problem of familiarizing the analyst with the data. Analysts will still spend up to 80% of their time just trying to create the data set from which to draw insights.
