Analytics are the steering wheel that humanity uses to drive the world — or at least that portion of the planet over which we have some influence. Without the sensors, the correlators, the aggregators, the visualizers, the solvers, and the rest of what analytic applications depend on, we would be only a passenger, not a copilot, on this, our only home.
If you’ve spent any time around advanced mathematics and analytics, you’re bound to run into the phrase “global optimization.” Despite the name, this has little to do with optimizing the globe we live on; instead, it refers to techniques for finding the best overall solution to a problem under various constraints, rather than settling for a merely local optimum. Nevertheless, I love the phrase’s evocative ring, in that it suggests the Gaia Hypothesis, a controversial conjecture that the Earth is a sort of super-organism. Specifically, it models the Earth as a closed, self-regulating, virtuous feedback loop of organic and inorganic processes that, considered holistically, maintains life-sustaining homeostasis. This hypothesis suggests that the planet as a whole is continuously optimizing the conditions for our ongoing existence — and that the biosphere may perish, just like any organism, if it falls into a vicious feedback loop of its own undoing.
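To make the term concrete, here is a minimal sketch of global optimization in Python (not from the original text): a multistart search that refines many random starting points by greedy descent and keeps the best result, so it can escape the local optima that trap a single descent. The objective function and every parameter value here are illustrative assumptions, not any particular solver's method.

```python
import random

def objective(x):
    # Illustrative multimodal function: a local minimum near x = +2,
    # and the global minimum near x = -2 (the linear term breaks the tie).
    return (x * x - 4.0) ** 2 + x

def multistart_minimize(f, lo, hi, starts=50, iters=200, seed=0):
    """Crude global optimization: refine many random starts by greedy
    descent with a shrinking step, and keep the best result found."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(starts):
        x = rng.uniform(lo, hi)           # random restart
        step = (hi - lo) / 10.0
        for _ in range(iters):
            for cand in (x - step, x + step):
                if lo <= cand <= hi and f(cand) < f(x):
                    x = cand              # accept only improvements
            step *= 0.9                   # shrink the search step
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

best_x, best_f = multistart_minimize(objective, -3.0, 3.0)
```

A single descent started on the wrong side would stall in the local minimum near x = +2; the restarts are what make the search "global."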
Big data is an ecosystem in which open source approaches have the greatest momentum: the most widespread adoption and the most feverish innovation. Open source platforms are expanding their footprint in advanced analytics.
As the enterprise Hadoop market continues to mature and many companies deploy their clusters for the most demanding analytical challenges, data scientists will begin to migrate toward this new, open source-centric platform. At the same time, enterprise adoption of the open source R language will grow in 2012 and beyond, and we’ll see greater industry convergence between Hadoop and R, especially as analytics tool vendors integrate both technologies tightly into their offerings. We will also see increasing adoption of open source data integration tools, such as those commercialized by Talend and others, and of open source BI tools from Pentaho, Jaspersoft, and others.
This is happening for the following reasons:
Open source initiatives are transforming all platforms and tools. Open source infrastructure, platforms, tools, and applications — such as Linux, Apache, Eclipse, Python, Mozilla, and Android — have gained widespread adoption in many sectors of the IT world, due to advantages such as no-cost licensing, extensibility, and vibrant communities.
Open source communities are where the fresh action is. Open source communities have fostered innovative new approaches and ecosystems, increasingly getting a jump on the incumbent providers of proprietary, closed source — albeit feature-rich and robust — offerings in advanced analytics, data warehousing, and integration tools.
One of the predictions I made at the start of this year was that real-world experiments will become the new development paradigm for next best action in multichannel customer relationship management (CRM). If we consider that multichannel CRM applications are driving big data initiatives, it’s clear that real-world experiments are infusing data management and advanced analytics development best practices more broadly. Increasingly, my big data customer engagements are focusing on CRM next best action, with a keen customer interest in life-cycle management of the analytic applications needed for real-world experiments in marketing campaign and customer experience optimization.
This year and beyond, we will see enterprises place greater emphasis on real-world experiments as a fundamental best practice to be cultivated and enforced within their data science centers of excellence. In a next best action program, real-world experiments involve iterative changes to the analytics, rules, orchestrations, and other process and decision logic embedded in operational applications. You should monitor the performance of these iterations to gauge which collections of business logic deliver the intended outcomes, such as improved customer retention or reduced fulfillment time on high-priority orders.
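The monitoring step above can be sketched as a simple champion/challenger comparison. This is a hedged illustration, not anything from the original text: the conversion counts are hypothetical, and the two-proportion z-test stands in for whatever evaluation method a real next-best-action platform applies.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two observed conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: champion decision logic vs. a challenger variant,
# each shown to 10,000 customers in a production application.
z = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
significant = abs(z) > 1.96   # roughly 95% confidence, two-sided
```

If the difference clears the significance threshold, the challenger's business logic is promoted; if not, the iteration continues with another variant.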
The key use case of next best action infrastructure — aka decision automation — is to allow companies to rapidly engage in real-world experiments in production applications and, if they’re bold, in their operational business model as a whole. In a CRM context, you can implement different predictive propensity models in different channels, at different interaction points, using different call-center scripts and message contents, with different customer segments, and with other variables.
Is big data just more marketecture? Or does the term refer to a set of approaches that are converging toward a common architecture that might evolve into a well-defined data analytics market segment?
That’s a huge question, and I won’t waste your time waving my hands with grandiose speculation. Let me get a bit more specific: When, if ever, will data scientists and others be able to lay their hands on truly integrated tools that speed development of the full range of big data applications on the full range of big data platforms?
Perhaps that question is also a bit overbroad. Here’s even greater specificity: When will one-stop-shop data analytics tool vendors emerge to field integrated development environments (IDEs) for all or most of the following advanced analytics capabilities at the heart of big data?
Of course, that’s not enough. No big data application would be complete without the panoply of data architecture, data integration, data governance, master data management, metadata management, business rules management, business process management, online analytical processing, dashboarding, advanced visualization, and other key infrastructure components. Development and deployment of all of these must also be supported within the nirvana-grade big data IDE I’m envisioning.
And I’d be remiss if I didn’t mention that the über-IDE should work with whatever big data platform — enterprise data warehouse, Hadoop, NoSQL, etc. — that you may have now or are likely to adopt. And it should support collaboration, model governance, and automation features that facilitate the work of teams of data scientists, not just individual big data developers.
Social media are the intelligence powering modern marketing. Not only is the Twittersphere dominated by marketers keen on the promotional power of social channels, but it seems everybody in the marketing profession is obsessed with this new world of ubiquitous chitchat.
Everybody comments on social media analytics, so what I’m saying here isn’t news to most of you. But I recently stopped to ponder what’s truly disruptive about social media’s role in the modern economy. And then it hit me. From the dawn of marketing, we’ve always hunted and gathered customer intelligence, using massive amounts of sweat equity to bag the beast. Before social media emerged, market research was almost always labor-intensive. No matter who you were — enterprise, agency, consultant, analyst, etc. — you had to put your nose to the proverbial research grindstone. You conducted panels, surveys, focus groups, interviews, field studies, usability testing, case studies, literature searches, and the like. Most of the intelligence-gathering burden was on you, with the subject of your studies — the customer — either putting in less effort or not having to lift a finger at all.
Next best action is the proving ground for advanced analytics and big data; it’s also the infrastructure that provides analytics- and rule-driven guidance across one or more customer-facing touchpoints. You can find next best action at the heart of multichannel customer relationship management (CRM) initiatives everywhere. It’s even present in a growing range of back-office business processes such as order fulfillment and supply chain management.
Next best action will continue to develop as an overarching business technology initiative for many companies in the coming year. The market is emerging and is becoming aware of itself as a substantial new niche, in much the same way that the Hadoop market flowered in the past year.
Here are some of the highlights that Forrester anticipates in the next best action arena in 2012:
The next-best-action market will continue to coalesce around core solution capabilities. Traditionally, next best action has been a capability embedded in your customer service, marketing, and other CRM applications. That remains the heart of the next-best-action solution market. However, the past several years have seen the development of a niche for next-best-action standalone infrastructure that you may deploy in conjunction with various CRM and back-office applications. In 2012, we will see more vendors converge on the next-best-action arena from various backgrounds, including predictive analytics, business process management (BPM), business rules management (BRM), complex event processing (CEP), decision automation, recommendation engine, and social graph analysis. Many established vendors will repackage and reposition their offerings in these segments under the banner of next best action in order to address hot new solution areas, including multichannel offer targeting, marketing campaign automation, and customer experience optimization.
Advanced analytics was the hot new frontier of business intelligence (BI) in 2011. The growing vogue for “data science” stemmed in part from many users’ desire to take their BI investments to the next level of sophistication, applying multivariate statistical analysis, data mining, and predictive modeling in powerful new applications for customer relationship management (CRM) and back-office business processes.
Business investments in advanced analytics will continue to deepen in the coming year. Here are some of the highlights that Forrester anticipates in this vibrant field in 2012:
Open-source platforms will expand their footprint in advanced analytics. As the enterprise Hadoop market continues to mature and many companies deploy their clusters for the most demanding analytical challenges, data scientists will begin to migrate toward this new, open source-centric platform. At the same time, enterprise adoption of the open-source R language will grow in 2012, and we’ll see greater industry convergence between Hadoop and R, especially as analytics tool vendors integrate both technologies tightly into their offerings.
Big data was inescapable in 2011. Without a doubt, it was the paramount banner story in data management, advanced analytics, and business intelligence (BI). The hype has been relentless, but it’s been driven by substantial innovations on many fronts.
The big data mania will intensify even further in the coming year. Here are some of the highlights that Forrester foresees in this exciting space in 2012:
Enterprise Hadoop deployments will expand at a rapid clip. Many enterprises have spent the past year or two kicking the tires of Hadoop, the emerging open source approach for scaling data analytics into the stratosphere of petabyte volumes, real-time velocities, and polystructured varieties. The market for enterprise-grade Hadoop solutions has grown by leaps and bounds and now includes several dozen vendors. Users all over the world and in most industries have invested aggressively in the technology and stand poised to bring their Hadoop clusters online in the coming year. The size of the in-deployment clusters will almost certainly grow at least tenfold in 2012 as companies roll new data sources, new analytic challenges, and new business applications into their Hadoop initiatives.
Every true scientist must also be a type of data scientist, although not all self-proclaimed data scientists are in fact true scientists.
True science is nothing without observational data. Without a fine-grained ability to sift, sort, structure, categorize, analyze, and present data, the scientist can’t bring coherence to their inquiry into the factual substrate of reality. Just as critical, a scientist who hasn’t drilled down into the heart of their data can’t effectively present or defend their findings.
Fundamentally, science is a collaborative activity of building and testing interpretive frameworks through controlled observation. At the heart of any science are the “controls” that help you isolate the key explanatory factors from those with little or no impact on the dependent variables of greatest interest. All branches of science rely on logical controls, such as adhering to the core scientific methods of hypothesis, measurement, and verification, as vetted through community controls such as peer review, refereed journals, and the like. Some branches of science, such as chemistry, rely largely on experimental controls. Some, such as astronomy, rely on the controls embedded in powerful instrumentation like space telescopes. Still others, such as the social sciences, may use experimental methods but rely principally on field observation and on statistical methods for finding correlations in complex behavioral data.
Data science has historically had to content itself with mere samples. Few data scientists have had the luxury of being able to amass petabytes of data on every relevant variable of every entity in the population under study.
The big data revolution is making that constraint a thing of the past. Think of this new paradigm as “whole-population analytics,” rather than simply the ability to pivot, drill, and crunch into larger data sets. Over time, as the world evolves toward massively parallel approaches such as Hadoop, we will be able to do true 360-degree analysis. For example, as more of the world’s population takes to social networking and conducts more of its lives in public online forums, we will all have comprehensive, current, and detailed market intelligence on every demographic available as if it were a public resource. As the prices of storage, processing, and bandwidth continue their inexorable decline, data scientists will be able to keep the entire population of all relevant polystructured information under their algorithmic microscopes, rather than having to rely on minimal samples, subsets, or other slivers.
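To make the contrast with sampling concrete, here is a small hedged sketch (the population and its distribution are invented for illustration): a statistic computed over the entire population is exact, while a small-sample estimate of the same statistic inevitably carries sampling error.

```python
import random

random.seed(1)

# Hypothetical "whole population": annual spend for one million customers.
population = [random.gauss(100.0, 30.0) for _ in range(1_000_000)]

# Whole-population analytics: the exact figure, with no sampling error at all.
true_mean = sum(population) / len(population)

# Traditional practice: estimate the same figure from a 500-customer sample.
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)
sampling_error = abs(sample_mean - true_mean)
```

With a standard error of roughly 30 / sqrt(500), about 1.3, the sample estimate typically misses the true figure by a dollar or so; the whole-population computation misses by nothing.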
Clearly, the big data revolution is fostering a powerful new type of data science. Having more comprehensive data sets at our disposal will enable more fine-grained long-tail analysis, microsegmentation, next best action, customer experience optimization, and digital marketing applications. It is speeding answers to any business question that requires detailed, interactive, multidimensional statistical analysis; aggregation, correlation, and analysis of historical and current data; modeling and simulation, what-if analysis, and forecasting of alternative future states; and semantic exploration of unstructured data, streaming information, and multimedia.