Data scientists are a curious breed. The term encompasses a wide range of specialties, all of which rely on statistical algorithms and interactive exploration tools to uncover nonobvious patterns in observational data.
Who belongs in this category? Clearly, the “quants” are fundamental. Anybody who builds multivariate statistical models, regardless of the tool they use, might call themselves a data scientist. Likewise, data mining specialists who look for hidden patterns in historical data sets (structured, unstructured, or some blend of diverse data types) may certainly use the term. Furthermore, a predictive modeler or any analyst who builds fact-based what-if simulations is a data scientist par excellence. We should also include anybody who specializes in constraint-based optimization, natural language processing, behavioral analytics, operations research, semantic analysis, sentiment analysis, or social network analysis.
But these jobs are only one-half of the data-science equation. The “suits” are also fundamental. Any business domain specialist who works with any of the tools and approaches listed above may consider him- or herself a data scientist. In fact, if one and the same person is a black belt in SAS, SPSS, R, or other statistical tools, and also an expert in marketing, customer service, finance, supply chain, or other business specialties, they are a data scientist par excellence.
Both of these skill sets are fundamental to high-quality data science. Lacking statistical expertise, you can’t tell which algorithms and approaches are the most appropriate foundation for your statistical models. Lacking business domain expertise, you can’t identify the most valid variables and the most appropriate data sets to build your models around.