Informatica World wrapped up in San Francisco last week, where almost 3,000 customers and partners gathered in the Moscone West conference center for four days packed with executive keynotes and customer and partner presentations. Based on my time there, it’s clear that:
Informatica is pivoting to cater to a business audience. They recognize that business stakeholders and their requirements have gained greater influence over technology purchasing decisions, and they are responding accordingly. Heralding what they call the age of data 3.0, they now want to leverage their leadership position in data management to build industry solutions on top of their data integration, data quality, and data management tools. MDM solutions like MDM-Customer 360, MDM-Product 360, and MDM-Supplier 360 take aim at delivering mission-critical insights to the business user. Their expanded partnership with Tableau will also continue to expose them to business audiences.
Promising new executives have their work cut out for them. Informatica has a 20-year track record of success in data management, but they are heading in a new direction that is largely uncharted territory for them. Lou Attanasio is the newly minted chief sales officer who will need to transform an organization accustomed to speaking with IT into one that appeals to a business audience. That will require a new sales model, training, and specialized sales talent that can speak to the client in terms of business value while also covering the technology at the right altitude. Jim Davis, who joined earlier this year as CMO from SAS, is leading the charge in positioning Informatica as not just a data management tool but a platform that embraces cloud, mobile, social, big data, IoT, and security.
Last week, I participated in a roundtable during a conference in Paris organized by the French branch of DAMA, the international data management organization. During the question-and-answer part of the conference, it became clear that most of the audience was confusing data management with data governance (DG). This is a challenge my Forrester colleague Michele Goetz identified early in the DG tooling space. Because data quality and master data management embed governance features, many view them as data governance tooling. But the reality is that they remain data management tooling — their goal is to improve data quality by executing rules. This tooling confusion is only a consequence of how much the word governance is misused and misunderstood, and that leads to struggling data governance efforts.
So what is “governance”? Governance is the collaboration, organization, and metrics facilitating a decision path between at least two conflicting objectives. Governance is finding the acceptable balance between the interests of two parties. For example, IT governance is needed when you would like to support all possible business projects but you have limited budget, skills, or resources available. Governance is needed when objectives are different for different stakeholders, and the outcome of governance is that they do not get the same priority. If everyone has the same objective, then this is data management.
You’ve heard it before but we said it again – this time in our recent webinar. There's a new kid in town: the chief data officer. Why the new role? Because of an increasing awareness of the value of data and the painful recognition of an inability to take advantage of the opportunities that it provides — due to technology, business, or basic cultural barriers. That was the topic of our webinar presented to a full house a few days ago; we discussed our recent report, Top Performers Appoint Chief Data Officers. Fortunately for those who weren’t there, the presentation – Chief Data Officers Cross The Chasm – is available (to clients) for download.
As the title suggests, chief data officers are no longer just for the early adopters – those enthusiasts and visionaries on the forefront of new technology trends. With 45% of global companies having appointed a chief data officer (not to be confused with a chief digital officer, as we specifically asked about “data”) and another 16% planning to make an appointment in the next 12 months – according to Forrester's Business Technographics surveys – the role of the chief data officer really has moved into the mainstream.
However, many companies remain unsure whether they need a CDO. Many of those in our audience fell into that category. We asked the audience two questions to gauge their interest and their actions to improve their data maturity:
Are you making organizational changes specifically to improve your data capabilities?
Gene Leganza and I just published a report on the role of the Chief Data Officer that we’re hearing so much about these days – Top Performers Appoint Chief Data Officers. To introduce the report, we sat down with our press team at Forrester to talk about the findings, and the implications for our clients.
Forrester PR: There's a ton of fantastic data in the report around the CDO. If you had to call out the most surprising finding, what would top your list?
Gene: No question it's the high correlation between high-performing companies and those with CDOs. Jennifer and I both feel that strong data capabilities are critical for organizations today and that the data agenda is quite complex and in need of strong leadership. That all means that it's quite logical to expect a correlation between strong data leadership and company performance - but given the relative newness of the CDO role it was surprising to see firm performance so closely linked to the role.
Of course, you can't infer cause and effect from correlation – the data could mean that execs in high-performing companies think having a CDO role is a good idea as much as it could mean CDOs are materially contributing to high performance. Either way, that single statistic should make one take a serious look at the role in organizations without clear data leadership.
It’s not news that business user self-service for access to information and analytics is hot. What might not be as obvious is the overhaul of information-related roles that is happening now as a result. What’s driving this? The hunger for data (big, fast, and otherwise) to feed insights, very popular data visualization tools, and new but rapidly spreading technology that puts sophisticated data exploration and manipulation tools in the hands of business users.
One impact is that classic tech management functions such as data modeling and data integration are moving into business-side roles. I can’t help but be reminded of Bill Murray’s apocalyptic vision from “Ghostbusters:” “Dogs and cats, living together… mass hysteria!” Is this the end of rational, orderly data management as we know it? Haven’t central tech management organizations always seen business-side tech decision-making (and purchasing, and implementation) as “rogue” behavior that needed to be governed out of existence? If organizations have trouble now keeping data for analytics at the right level of quality in data warehouses, won’t all this introduction of new data sources and data lakes and whatnot just make things worse?
Well, my answers are “no,” “yes,” and “no” in that order. The big changes that are afoot are not the end of order and even though “business empowerment” translates to “rogue IT” in some circles, data lakes/hubs and the infusion of 3rd party data have actually been delivering on their promise of faster, better business insights for the organizations doing it right.
By now you have at least seen the cute little elephant logo or you may have spent serious time with the basic components of Hadoop like HDFS, MapReduce, Hive, Pig and most recently YARN. But do you have a handle on Kafka, Rhino, Sentry, Impala, Oozie, Spark, Storm, Tez… Giraph? Do you need a Zookeeper? Apache has one of those too! For example, the latest version of Hortonworks Data Platform has over 20 Apache packages and reflects the chaos of the open source ecosystem. Cloudera, MapR, Pivotal, Microsoft and IBM all have their own products and open source additions while supporting various combinations of the Apache projects.
After hearing the confusion between Spark and Hadoop one too many times, I was inspired to write a report, The Hadoop Ecosystem Overview, Q4 2014. For those whose day jobs don’t include constantly tracking Hadoop's evolution, I dove in and worked with Hadoop vendors and trusted consultants to create a framework. We divided the complex Hadoop ecosystem into a core set of tools that all work closely with data stored in the Hadoop Distributed File System (HDFS) and an extended group of components that leverage but do not require it.
In the past, enterprise architects could afford to think big picture, and that meant treating Hadoop as a single package of tools. Not anymore – you need to understand the details to keep up in the age of the customer. Use our framework to help, but please read the report if you can, as it includes a lot more detail.
When it comes to data technology, are you lost in translation? What's the difference between data federation, virtualization, and data or information-as-a-service? Are columnar databases also relational? Does one use the same or different tools for BAM (business activity monitoring) and for CEP (complex event processing)? These questions are just the tip of the iceberg amid the plethora of terms and definitions in the rich and complex world of enterprise data and information. Enterprise application developers and data and information architects already manage multiple challenges on a daily basis, and the last thing they need is a misunderstanding of the various data technology component definitions.
The tide is turning on privacy. Since the earliest days of the World Wide Web, there has been an increasing sense that the Internet would effectively kill privacy – and in the wake of the NSA PRISM program revelations, that sentiment was stronger than ever. However, by using Forrester’s Technographics 360 methodology, which blends multiple qualitative and quantitative data sources, we found that attitudes on privacy are evolving: Consumers are beginning to shift from a state of apathy and resignation to caution and empowerment.
In our recently published report, we integrate Forrester's Consumer Technographics® survey data, ConsumerVoices Market Research Online Community qualitative insight, and social listening data to provide a holistic view of the changes in consumer perceptions and expectations of data privacy. In the past year, individuals have 1) become much more aware of the ways in which organizations collect, use, and share personal data and 2) started to change their online behavior in response:
No self-respecting EA professional would enter into planning discussions with business or tech management execs without a solid grasp of the technologies available to the enterprise, right? But what about the data available to the enterprise? Given the shift toward data-driven decision-making and the clear advantages of advanced analytics capabilities, architecture professionals should be coming to the planning table with not only an understanding of enterprise data but also a working knowledge of the available third-party data that could have significant impact on their organization's approach to customer engagement or B2B partner strategy.
Data discussions can't simply be about internal information flow, master data, and business glossaries anymore. Enterprise architects, business architects, and information architects working with business execs on tech-enabled strategies need to bring third-party data know-how to their brainstorming and planning discussions. As the data economy is still in its relatively early stages and, more to the point, as organizational responsibilities for sourcing, managing, and governing third-party data are still in their formative states, it behooves architects to take the lead in understanding the data economy in some detail. By doing so, architects can help their organizations find innovative approaches to data and analytics that have direct business impact by improving the customer experience, making the partner ecosystem more effective, or finding new revenue from data-driven products.
Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data. Forrester believes that Hadoop will become must-have infrastructure for large enterprises. If you have lots of data, there is a sweet spot for Hadoop in your organization. Here are five reasons firms should adopt Hadoop today:
Build a data lake with the Hadoop Distributed File System (HDFS). Firms leave potentially valuable data on the cutting-room floor. A core component of Hadoop is its distributed file system, which can store huge files, and huge numbers of them, scaling linearly across three, 10, or 1,000 commodity nodes. Firms can use Hadoop data lakes to break down data silos across the enterprise and commingle data from CRM, ERP, clickstreams, system logs, mobile GPS, and just about any other structured or unstructured data that might contain previously undiscovered insights. Why limit yourself to wading in multiple kiddie pools when you can dive for treasure chests at the bottom of the data lake?
Enjoy cheap, quick processing with MapReduce. You’ve poured all of your data into the lake — now you have to process it. Hadoop MapReduce is a distributed data processing framework that brings the processing to the data, analyzing it in a highly parallel fashion. Instead of serially reading data from files, MapReduce pushes the processing out to the individual Hadoop nodes where the data resides. The result: Large amounts of data can be processed in parallel in minutes or hours rather than in days. Now you know why Hadoop’s origins stem from monstrous data processing use cases at Google and Yahoo.
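For readers who haven't written a MapReduce job, the programming model itself is simple to sketch. The following is a minimal, single-machine illustration in plain Python of the map, shuffle, and reduce phases (a hypothetical word count, not Hadoop's actual Java API); real Hadoop distributes these phases across the cluster nodes where the data blocks live.

```python
# A minimal sketch of the MapReduce programming model in plain Python.
# Real Hadoop MapReduce runs map tasks on the nodes holding the data
# blocks and shuffles intermediate pairs across the network; this
# local version only illustrates the map -> shuffle -> reduce flow.
from collections import defaultdict

def map_phase(record):
    # Map: emit (key, value) pairs -- here, one pair per word.
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for each key.
    return key, sum(values)

records = ["big data big insights", "data lake"]
pairs = [p for r in records for p in map_phase(r)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {"big": 2, "data": 2, "insights": 1, "lake": 1}
```

Because the map and reduce functions are independent per record and per key, the framework can run thousands of them in parallel, which is where the minutes-instead-of-days speedup comes from.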