Data Scientist: Which Adjacent Roles Are Central?

Data scientists don’t work in isolation. As with any scientists, they rely on a wide range of people in adjacent roles to help them do their jobs as effectively as possible.

Think about science generally. Throughout the development of modern science, roles have grown steadily more specialized. But today’s professional science establishment is a relatively recent phenomenon. Back in the Middle Ages, and even well into the modern era, scientists often had to be jacks of all trades in order to carry on their investigations. Until the 19th century, there were few professional scientists, research universities, or commercial labs. There were no eager, underpaid graduate students to press into service. Until the 20th century, most professional scientists had to build and maintain their own laboratories, invent and calibrate their own instruments, painstakingly record their own observations, and concoct and promote their own theories.

Today’s professional scientists, of whom data scientists are a key category, have it much easier. Whether they work with particle accelerators or linear regression models, scientists know they don’t need to be their own chief cooks and bottle washers. They can make science their day job and rely on a host of others for all of the necessary supporting tools and infrastructure. We find the following broad division of labor in all of today’s scientific disciplines, including data science:

  • Investigation. To be a true scientist, your core job must be to investigate reality to whatever depth is necessary and from all relevant angles. Data scientists conduct their investigations with statistical algorithms and interactive exploration tools that help them uncover nonobvious patterns in observational data. Actually, you can regard today’s business-oriented data science as a branch of the behavioral sciences, because most such initiatives focus on investigating factors that drive such human behaviors as customer churn, purchasing, and recommending.
  • Instrumentation. A true scientist uses instrumentation suited to the phenomena that they’re observing, modeling, testing, and measuring. Without statistical modeling, predictive analysis, and other tools, data scientists would not have the pattern-finding instrumentation on which they rely. Likewise, the underlying platform components — including data warehousing, visualization, integration, and governance tools — are key pieces of the instrumentation that data scientists need for exploring deep data. Somebody has to provide all of these tools of the data scientist’s trade, hence the exploding ecosystem of “big data” solution providers.
  • Institution. And a true scientist needs to make a steady living focusing on their investigations. The institutions that employ them may be public or private sector, nonprofit or commercial. The institutions that help them communicate and collaborate with other scientists may be professional associations, journals, or other forums. Right now in data science, we see a huge push toward open source models of collaboration. This is most obvious in the area of open source platform/tool-focused communities such as Apache Hadoop and R, but it’s the trend in all collective areas of human investigation.

Increasingly, today’s data scientists realize they must stand on the giant shoulders of social networks and other online forums to pool their collective brainpower.

Comments

For-Profit Science

My focus is on data science within business enterprises. Although a few businesses sustain pure research, most find that to be a rare luxury. Instead, what drives the funding for scientific positions is the income from products and services that use the insights of descriptive & predictive behavioral models. In other words, when theory moves into practice, the business can monetize the insights.

It's a delicate balance: the data scientist wants to publish results and collaborate with peers, yet the employer must gain enough competitive advantage from a new model to justify the money spent on the science that led to it. Patents notwithstanding, this often comes down to how well the data scientist works with the broader business teams that socialize the model and deploy it as a repeatable and sustainable set of business tools.

Clearly, "data science" in business is mostly applied

Kevin:

Theoretical breakthroughs are nowhere in the business-sponsored data scientist's performance goals. For that reason, it's highly doubtful that anybody will ever win, say, a Nobel Prize for some great discovery they make doing data science for a business. To the extent that, like Penzias & Wilson (http://en.wikipedia.org/wiki/Cosmic_microwave_background_radiation), they stumble on some profound new scientific discovery in their work at a corporate lab, that's cool, but it doesn't pay the bills.

Jim