Data science has historically had to content itself with mere samples. Few data scientists have had the luxury of being able to amass petabytes of data on every relevant variable of every entity in the population under study.
The big data revolution is making that constraint a thing of the past. Think of this new paradigm as "whole-population analytics," rather than simply the ability to pivot, drill, and crunch into larger data sets. Over time, as the world evolves toward massively parallel approaches such as Hadoop, we will be able to do true 360-degree analysis. For example, as more of the world's population takes to social networking and conducts more of its life in public online forums, we will all have access to comprehensive, current, and detailed market intelligence on every demographic, almost as if it were a public resource. As the price of storage, processing, and bandwidth continues its inexorable decline, data scientists will be able to keep the entire population of all relevant polystructured information under their algorithmic microscopes, rather than having to rely on minimal samples, subsets, or other slivers.
Clearly, the big data revolution is fostering a powerful new type of data science. Having more comprehensive data sets at our disposal will enable more fine-grained long-tail analysis, microsegmentation, next best action, customer experience optimization, and digital marketing applications. It is speeding answers to any business question that requires detailed, interactive, multidimensional statistical analysis; aggregation, correlation, and analysis of historical and current data; modeling and simulation, what-if analysis, and forecasting of alternative future states; and semantic exploration of unstructured data, streaming information, and multimedia.