Open data is critical for delivering contextual value to customers in digital ecosystems. For instance, The Weather Channel and OpenWeatherMap collect weather-related data points from millions of data sources, including the wingtips of aircraft. They could share these data points with car insurance companies, allowing insurers to expand their customer journey activities: for example, alerting customers in real time to an approaching hailstorm so that car owners have a chance to move their cars to safety. Success requires making logical connections between isolated data fields to generate meaningful business intelligence.
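To make the "logical connection between isolated data fields" concrete, here is a minimal sketch. It assumes a hail alert can be reduced to a center point plus a radius, and that the insurer already knows where each policyholder's car is parked. The names HailAlert, Policyholder, and customers_to_alert are hypothetical, and the coordinates are illustrative only, not data from any weather provider.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class HailAlert:
    """Hypothetical simplification of a weather alert: a center point and a radius."""
    lat: float
    lon: float
    radius_km: float


@dataclass
class Policyholder:
    """Hypothetical insurer record linking a customer to a parked car's location."""
    customer_id: str
    car_lat: float
    car_lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def customers_to_alert(alert: HailAlert, policyholders: list[Policyholder]) -> list[str]:
    """Join weather data (alert area) with insurer data (car locations)."""
    return [
        p.customer_id
        for p in policyholders
        if haversine_km(alert.lat, alert.lon, p.car_lat, p.car_lon) <= alert.radius_km
    ]


alert = HailAlert(lat=39.74, lon=-104.99, radius_km=25.0)
policyholders = [
    Policyholder("A-100", 39.75, -105.00),  # about 1.4 km from the alert center
    Policyholder("B-200", 38.83, -104.82),  # about 100 km away
]
print(customers_to_alert(alert, policyholders))  # → ['A-100']
```

A real deployment would use the provider's actual alert polygons and a spatial index rather than a linear scan. The point here is only that value comes from joining two data sets that neither party holds alone.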
Trust is also critical to delivering value in digital ecosystems. One of the key questions for big data is who owns the data. Is it the division that collects the data, the business as a whole, or the customer whose data is collected? Forrester believes that for data analytics to reach its full potential and gain end-user acceptance, users themselves must remain the ultimate owners of their own data.
The development of control mechanisms that allow end users to control their data is a major task for CIOs. One possible approach is a dashboard portal that lets end users specify which businesses can use which data sets and for what purpose. Private.me is trying to develop such a mechanism: individuals' information is distributed to servers run by non-profit organizations. Data anonymization is another approach that many businesses are working on, although anonymization has limits as a means of ensuring true privacy.
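The dashboard idea boils down to a per-user permission record keyed by business, data set, and purpose. The sketch below is a hypothetical illustration of that data structure, with default-deny semantics. It is not Private.me's design, and the class and method names (ConsentRegistry, grant, revoke, is_allowed) are assumptions made for this example.

```python
from collections import defaultdict


class ConsentRegistry:
    """Hypothetical per-user record of which business may use which data set
    for which purpose. Anything not explicitly granted is denied."""

    def __init__(self) -> None:
        # user_id -> set of (business, dataset, purpose) tuples the user has approved
        self._grants: defaultdict[str, set[tuple[str, str, str]]] = defaultdict(set)

    def grant(self, user_id: str, business: str, dataset: str, purpose: str) -> None:
        self._grants[user_id].add((business, dataset, purpose))

    def revoke(self, user_id: str, business: str, dataset: str, purpose: str) -> None:
        # discard() is a no-op if the grant was never given
        self._grants[user_id].discard((business, dataset, purpose))

    def is_allowed(self, user_id: str, business: str, dataset: str, purpose: str) -> bool:
        # .get() avoids creating an empty entry for unknown users
        return (business, dataset, purpose) in self._grants.get(user_id, set())


registry = ConsentRegistry()
registry.grant("alice", "AcmeInsurance", "car_location", "hail_alerts")
print(registry.is_allowed("alice", "AcmeInsurance", "car_location", "hail_alerts"))  # → True
print(registry.is_allowed("alice", "AcmeInsurance", "car_location", "marketing"))    # → False
registry.revoke("alice", "AcmeInsurance", "car_location", "hail_alerts")
print(registry.is_allowed("alice", "AcmeInsurance", "car_location", "hail_alerts"))  # → False
```

Keying on purpose as well as business and data set is what lets a user approve hail alerts while refusing marketing use of the same location data.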
The business has an insatiable appetite for data and insights. Even in the age of big data, the number one issue for business stakeholders and analysts is getting access to the data. Once access is achieved, the next step is "wrangling" the data into a usable data set for analysis. The term "wrangling" itself causes a nervous twitch, unless you enjoy the rodeo. But the goal of the business isn't to be an adrenaline junkie. The goal is to get insight that helps it navigate smartly through increasingly complex business landscapes and customer interactions. Those who get this have introduced a softer term, "blending," which is just another term dreamed up by data vendor marketers to avoid the dreaded conversation about data integration and data governance.
The reality is that you can't market-message your way out of the fundamental problem: big data is creating data swamps, even in the best-intentioned efforts. (This is the consequence of big data's first principle: schema-less data.) Data governance for big data is primarily relegated to cataloging data and its lineage. That serves the data management team but creates a new kind of nightmare for analysts and data scientists: working with a card catalog that rivals the Library of Congress. Dropping in a self-service business intelligence tool or advanced analytics solution doesn't solve the problem of familiarizing the analyst with the data. Analysts will still spend up to 80% of their time just trying to create the data set from which to draw insights.
I’m ramping up to attend Strata in San Jose, February 18, 19, and 20. Here is some info to help everyone who wants to connect and share thoughts. I’m looking forward to great sessions and a lot of thought leadership.
I’ll be setting aside some time for 1:1 meetings (Booked Full)
[Updated on 2/17] - I have set up some blocks of time to meet with people at Strata. Please follow the link below to schedule with me on a first-come, first-served basis.
[Update] - I was booked out within 2 hours... didn't expect that! I may open up my calendar for more meetings, but I need to get a better bead on the sessions I want to attend first. Try to catch me at breakfast; I'll tweet when I'm there.
I’ll be posting my thoughts and locations on Twitter
The best way to connect with me at Strata is to follow me on Twitter @practicingea.
You can post @ me or DM me. I’ll be posting my location and you can drop by for ad hoc conversations as well.
I’m very interested in your point of view: from data-driven to insights-driven
I am concluding very quickly that “big data” as we have viewed it for the last five years is not enough. I see firms using words like “real-time,” “right-time,” or “fast data” to suggest that the need is much bigger than big data: it’s about connecting data to action in a continuous learning loop.
The battle of trying to apply traditional waterfall software development life-cycle (SDLC) methodology and project management to Business Intelligence (BI) has already been fought, and largely lost. These approaches and best practices, which apply to most other enterprise applications, work well in some cases, such as very well-defined and stable BI capabilities like tax or regulatory reporting. Mission-critical, enterprise-grade BI apps can also have a reasonably long shelf life of a year or more. But these best practices do not work for the majority of BI strategies, where requirements change much faster than traditional approaches can support. By the time a traditional BI application development team rolls out what it thought was a well-designed BI application, it's too late. As a result, BI pros need to move beyond earlier-generation BI support organizations to:
Focus on business outcomes, not just technologies. Earlier-generation BI programs lacked an "outputs first" mentality. Those projects employed bottom-up approaches that focused on the program and technology first, leaving clients without the proper outputs that they needed to manage the business. Organizations should use a top-down approach that defines key performance indicators, metrics, and measures that align with the business strategy. They must first stop and determine the population of information required to manage the business and then address technology and data needs.
When you hear the term fast data, your first thought is probably the velocity of the data. That's not unusual in the realm of big data, where velocity is one of the V's everyone talked about. However, fast data is more than a data characteristic; it is about how quickly you can get and use insight.
While working with Noel Yuhanna on an upcoming report on how to develop your data management roadmap, we found that speed was a recurring theme. Clients consistently call out speed as what holds them back. How they interpret what speed means is the crux of the issue.
Technology management thinks about how quickly data is provisioned. The solution is a faster engine, so in-memory platforms like SAP HANA become the tool of choice. This is the wrong way to think about it. Simply serving up data with faster integration and a high-performance platform is what we have always done: better box, better integration software, better data warehouse. Why use the same solution that, in a year or two, will run into the same wall?
The other side of the equation is that sending data out faster ignores what business stakeholders and analytics teams want. To the business, speed encompasses self-service data acquisition, faster deployment of data services, and faster changes. The reason: they need to act on the data and insights.
The right strategy is to create a vision that orients toward business outcomes. Today we live in a world where it is no longer about being first to market; we have to be first to value. First to value with our customers, and first to value with our business capabilities. The speed at which insights are gained, and ultimately how they are put to use, is your data management strategy.
Last year I published a reasonably well-received research document on Hadoop infrastructure, “Building the Foundations for Customer Insight: Hadoop Infrastructure Architecture”. Now, less than a year later, it’s looking obsolete. That is not so much because it was wrong for traditional Hadoop (and yes, it does seem funny to use a word like “traditional” to describe a technology that is itself still rapidly evolving and has been in mainstream use for only a handful of years), but because the universe of analytics technology and tools has been evolving at light speed.
If your analytics are anchored by Hadoop and its underlying MapReduce processing, then the mainstream architecture described in the document (clusters of servers, each with its own compute and storage) may still be appropriate. On the other hand, you may be like many enterprises that are adding other analysis tools, such as NoSQL databases, SQL on Hadoop (Impala, Stinger, Vertica), and particularly Spark, an in-memory analytics technology that is well suited to real-time and streaming data. If so, you may need to begin reassessing the supporting infrastructure so that it can continue to support Hadoop while also catering to the differing access patterns of other tool sets. This need to rethink the underlying analytics plumbing was brought home by a recent HP demonstration of a reference architecture for analytics, publicly referred to as the HP Big Data Reference Architecture.
At the China Hadoop Summit 2015 in Beijing this past weekend, I talked with various big data players, including large consumers of big data such as China Unicom, Baidu.com, JD.com, and Ctrip.com; Hadoop platform solution providers Hortonworks, RedHadoop, BeagleData, and Transwarp; infrastructure software vendors like Sequotia.com; and Agile BI software vendors like Yonghong Tech.
The summit was well attended (organizers planned for 1,000 attendees, and double that number came), and the presentations and conversations made it clear that big data ecosystems are making substantial progress. Here are some of my key takeaways:
Telcos are focusing on optimizing internal operations with big data. Take China Unicom, one of China’s three major telcos, for example. China Unicom has completed a comprehensive business scenario analysis of related data across each segment of its internal business operations, including business and operations support systems, Internet data centers, and networks (fixed, mobile, and broadband). It has built a Hadoop-based big data platform that processes trillions of mobile access records every day within the mobile network, providing practical guidelines and progress monitoring for the construction of base stations.
To compete in today's global economy, businesses and governments need agility and the ability to adapt quickly to change. But what about internal adoption when rolling out enterprise-grade Business Intelligence (BI) applications? BI change is ongoing, and often many things change concurrently. One element that too often takes a back seat is the impact of change on the organization's people. Prosci, an independent research company focused on organizational change management (OCM), has developed benchmarks that propose five areas in which change management needs to do better. They all involve the people side of change: better engage the sponsor; begin organizational change management early in the change process; get employees engaged in change activities; secure sufficient personnel resources; and better communicate with employees. Because BI is not a single application, and often not even a single platform, we recommend adding a sixth area: visibility into BI usage and performance management of BI itself, aka BI on BI. Forrester recommends keeping these six areas top of mind as your organization prepares for any kind of change.
Some strategic business events, such as mergers, are high-risk initiatives involving major changes over two or more years; others, such as restructuring, must be implemented within six months. In the case of BI, some changes might need to happen within a few weeks or even days. Every change will ultimately either achieve or fail to achieve a business objective. There are seven major categories of business and organizational change:
If you think you can do big data in-house, get ready for a lot of disappointment. Suppose the data you want to analyze is terabytes in size and comes from multiple sources (streaming in from customers, devices, or sensors), and the insights you need are more complex than basic trending. Then you are probably looking for a data scientist or two. You probably also have an open job requisition for a Hadoop expert and have hit the limit of what your capital budget will let you buy to house all this data and insight. As a result, you are likely taking a hard look at cloud-based options to fill your short-term needs.
I’ve been talking to a number of users and providers of bare-metal cloud services. The high-profile use cases are interesting individually, and they are also starting to reveal common threads. These service providers let customers provision and use dedicated physical servers with semantics very similar to those of the common VM-based IaaS cloud: servers that can be instantiated at will in the cloud, provisioned with a variety of OS images, connected to storage, and used to run applications. For customers, the differentiation lies in the behavior of the resulting images:
Deterministic performance – Your workload is running on a dedicated resource, so there is no question of any “noisy neighbor” problem, or even of sharing resources with otherwise well-behaved neighbors.
Extreme low latency – Like it or not, VMs, even lightweight ones, impose some level of additional latency compared to bare-metal OS images. Where this latency is a factor, bare-metal clouds offer a differentiated alternative.
Raw performance – Under the right conditions, a single bare-metal server can process more work than a collection of VMs, even when their nominal aggregate performance is similar. Benchmarking is always tricky, but several of the bare-metal cloud vendors can show some impressive comparative benchmarks to prospective customers.