Data Scientist: What Skills Does It Require?

Data scientists are a curious breed. The term encompasses a wide range of specialties, all of which rely on statistical algorithms and interactive exploration tools to uncover nonobvious patterns in observational data.

Who belongs in this category? Clearly, the “quants” are fundamental. Anybody who builds multivariate statistical models, regardless of the tool they use, might call themselves a data scientist. Likewise, data mining specialists who look for hidden patterns in historical data sets — structured, unstructured, or some blend of diverse data types — may certainly use the term. Furthermore, a predictive modeler or any analyst who builds fact-based what-if simulations is a data scientist par excellence. We should also include anybody who specializes in constraint-based optimization, natural language processing, behavioral analytics, operations research, semantic analysis, sentiment analysis, and social network analysis.

But these jobs are only one-half of the data-science equation. The “suits” are also fundamental. Any business domain specialist who works with any of the tools and approaches listed above may consider him- or herself a data scientist. In fact, if one and the same person is a black belt in SAS, SPSS, R, or other statistical tools, and also an expert in marketing, customer service, finance, supply chain, or other business specialties, they are a data scientist par excellence.

Both of these skill sets are fundamental to high-quality data science. Lacking statistical expertise, you can’t tell which algorithms and approaches are the most appropriate foundation for your statistical models. Lacking business domain expertise, you can’t identify the most valid variables and the most appropriate data sets to build your models around.

In establishing a data science center of excellence in your organization, you must institute forums, processes, training, tools, and other initiatives that bring people with these diverse skills together to collaborate on common projects. You must also encourage people from each camp to cross-train in the other’s area. Business analysts must learn more sophisticated statistical techniques than their schooling instilled in them and more sophisticated tools than their spreadsheets. Statistical analysts must attach themselves to business groups or functions and learn how to apply their quantitative smarts to real operational problems.

Is the garden-variety spreadsheet jockey a data scientist? Yes, to the extent that they build statistical models and use the tool to find nonobvious patterns in structured data, they are engaging in a form of data science. But if this exploration is not their primary job function, they are merely dabbling, not specializing.

Is BI report-building or OLAP cube-development data science? No. Those endeavors, although important, revolve around obvious data patterns — obvious in the sense that an organization has chosen to embed them in repeatable views and access patterns.

Data science is all about asking questions. You engage in it whenever you interactively and iteratively search for deep, hidden patterns.

Comments

Very nice article. I think a data scientist develops business application skills through more and more real-life problem solving. It's very important to have significant business context along with analytical skills in order to convert a business problem into an analytics problem and, later, an analytics solution into a business solution.
I would also stress visualization and change management skills in a data scientist. It's very important to simplify analytics findings, tell a business value story to the key stakeholders, and build buy-in for implementing the solution in the business. This gives the data scientist the end-to-end ability to understand the business need, frame the analytics problem, figure out the right data to use, solve the problem to generate insights, and implement the solution to realize top-line and bottom-line impact.

Merging science into business and business into scientists

Clearly, the new emphasis on the role of data scientists will facilitate the effort to bring science into business. Scientific rigor applied to data and to analytics yields more valuable business decisions.

But it also goes the other way. As you point out, bringing a scientist into a business requires some effort to help them learn the business.

I remember a fun anecdote about the arrival of the professor who became our most skilled data scientist. He left the university to become an employee of the business and, upon arriving, asked what algorithm we used to assign clients to salesreps, expecting a discussion about optimization tools. He was bemused to realize that "give the most important clients to the best salesrep" was the answer. From that point on, he made a point of truly learning our business, and the resulting scientific contributions were outstanding.

The prof learned the lesson of "gut feel" optimization

Kevin:

"Optimization" is a meatball word that means whatever decision procedure makes sense, under constraints, to the decision maker. Gut feel of an undefined nature (e.g., "give the most important clients to the best salesrep") is an example of that. The scientist could (and probably did) try to refine the gut feel by building a quantitative model of that decision. But it would be a fruitless exercise if the sales administrator who did the deciding had no use for that quantitative model.

Jim