Merging Data Science and Business

An interview with Tom Fawcett and Foster Provost  |  November 2nd, 2017

Business leaders cannot afford to ignore their organization’s data—rather, that data should be used to make informed decisions. In this post, Principal Data Scientist Tom Fawcett and Professor of Data Science Foster Provost discuss how businesses can make the most of their analytical teams. Tom and Foster are the authors of Data Science for Business.

What aspect of data science do you feel business folks most miss/misunderstand?

Tom: I’ll list four:

  1. Data science techniques need data. (You’d be surprised how often this is forgotten.)
  2. Evaluation should be one of the first things you think about, rather than the last. If you don’t know how to evaluate a solution (i.e., how to tell whether one solution is better than another), you’re not quite ready yet.
  3. Like visualization, clustering is good for developing ideas about data, but not necessarily for solving business problems directly.
  4. Deep learning is getting a lot of attention in data science right now. But it’s very expensive and unless you’re tackling perceptual domains (audio, video) or natural language, it probably won’t give you an edge.

Foster: There are multiple, different analytic processes that are served by data science. Business people ought to at least understand the different ways that data science is applied, in order to be able to reason about how data science might be used to address some business challenge. As examples, consider three very different data analytic processes: (a) drawing inferences about some case that is described by data, (b) coming up with the “model” that is used to draw those inferences, and (c) assessing whether one model is better than another.

To be more concrete, in a fraud detection scenario we might have: (a) deciding whether a particular account has been defrauded, based on observed account behavior, (b) building a data-driven model that can make those decisions, and (c) assessing the effectiveness of such models. We know a lot about how to do all of these things, and they are intricately intertwined. But they are very different analytic processes. Often using the model (a) and building the model (b) are confounded in business folks’ heads, which often leads to confusion. And evaluation (c) is often treated as an afterthought, when really it should be one of the critical factors in the initial “business understanding” phase that should precede even preparing the data (let alone doing the modeling).
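To make the distinction tangible, here is a minimal sketch of the three processes in a toy fraud setting, assuming scikit-learn; the features, labels, and data are entirely synthetic and hypothetical, not an example from the book:

```python
# A sketch of the three analytic processes in a toy fraud-detection setting.
# All data, features, and labels here are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # hypothetical account-behavior features
y = (X[:, 0] + rng.normal(size=1000) > 2).astype(int)  # synthetic "defrauded" flag

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# (b) Building the model: learn a decision-making model from historical data.
model = LogisticRegression().fit(X_train, y_train)

# (a) Using the model: infer whether one particular account looks defrauded.
p_fraud = model.predict_proba(X_test[:1])[0, 1]

# (c) Evaluating the model: how well does it separate fraud from non-fraud?
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"P(fraud) for one account: {p_fraud:.2f}; test AUC: {auc:.2f}")
```

Even in this toy form, (a), (b), and (c) are distinct steps that answer distinct questions, which is exactly the distinction that tends to blur in discussion.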

Both business folks and data scientists (with notable exceptions) generally fall prey to thinking that choosing the right machine learning method is the most important factor, when by and large it is relatively unimportant compared to formulating the problem well and acquiring/engineering the right data.

What are your tips for leaders when it comes to hiring data science talent?

Tom: View hiring as a long(er)-term investment. Managers often hire a data scientist for a specific task such as ad placement, recommendation, or text processing. So they advertise for that expertise and use interviews to weed out people who don’t have it. It’s good to have at least one person with the specific expertise, but for others I’d recommend looking for smart data scientists with track records showing they can learn new domains. Assume a smart person can become productive in about a month, and proficient in six months.

Foster: In my experience, the most important factor is having other data science talent. On the one side, non-data-scientist managers have a difficult time knowing whether a particular data science candidate is going to be good for a particular job. On the other side, individual data scientists often do not thrive as the sole data scientist in an organization (for a variety of reasons that we discuss in the book). This can lead to data scientists not accepting what seem to be good offers, and to poor retention if they do accept. The result is that firms end up having a difficult time “getting started” with data science, and the data-science-rich end up getting richer. As we discuss in the book, there are strategies that the less rich can use to address these challenges: for example, addressing the “cold start” problem by investing in creating a “temporary” critical mass of data science talent.

And to build on Tom’s point: most people are terrible at interviewing data scientists.

How much data science (in a general sense) do business leaders really need to know?

Tom: It’s hard to talk about “how much” without a frame of reference. We wrote our book for that audience, so I’ll answer in terms of the book. It depends on the leader. People who manage data scientists should know the first three chapters. If they want to keep going, they should read the last two chapters. Executives and investors who are regularly expected to evaluate data science projects and proposals should probably read all of it, especially the “project/proposal evaluation” portions.

Foster: This really depends on the sorts of decisions that the business leader needs to make about data science. Rarely do business leaders need to know the technical details of the algorithms, but if the leader is going to be making decisions about investing in data science solutions or talent or platforms, they cannot make truly informed decisions without understanding the fundamental processes involved, how to evaluate appropriately, how to invest in appropriate data assets, and generally the different sorts of things that data science can do. Regarding the latter, I don’t mean things it can do like “fraud detection” and “targeting offers” — because credit card fraud detection and Medicare fraud detection are completely different from the point of view of data science solutions, as are targeting an offer you’ve made before and targeting a brand-new offer. I mean things it can do like estimating unknown values, estimating the causal influence of some intervention, supporting discovery of “unknown unknowns,” and so on.

When business leaders understand the processes, they end up treating data science projects more like R&D projects than like traditional IT development projects. I don’t mean to scare off managers who have had bad experiences with poorly managed R&D. What I mean is: data science projects are inherently risky. So we should look for ones that have high potential payoff. We should take a portfolio approach. We should make smaller investments in reducing uncertainty early, and reassess whether to continue along the current path, abort, or pivot. In order for managers to be involved in making crucial decisions about investing in data science, they need to understand the fundamentals, so they can ask the right questions and understand the answers.

And to add to Tom’s points: business leaders also need to understand at least the basics of evaluation of data science solutions. The statistics involved are not super sophisticated, but there are some nuances that can blindside a business leader if she hasn’t thought about them before.
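One classic nuance of this kind, illustrated below with made-up numbers: when fraud is rare, raw accuracy can look excellent for a model that catches no fraud at all. This is a hedged toy sketch, not an example from the book, assuming scikit-learn’s metric functions:

```python
# A toy illustration of an evaluation nuance: with rare fraud (~1% here),
# a "model" that flags nothing still scores ~99% accuracy yet catches no fraud.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% of accounts are fraud
y_pred = np.zeros_like(y_true)                    # degenerate model: never flags

print(f"accuracy:     {accuracy_score(y_true, y_pred):.3f}")  # ~0.99
print(f"fraud recall: {recall_score(y_true, y_pred):.3f}")    # 0.000
```

A leader who asks only “how accurate is it?” would never see the problem; asking how the metric relates to the business goal is the point.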

You’re updating your book right now. What are you updating? How has data science changed in the years since its first publication?

Tom: We deliberately organized the book around general principles of data science, relying as little as possible on specific current technology, so little needs to be changed. That said, new topics have arisen that we feel we should explain so the reader doesn’t feel left behind. Deep learning is nearly ubiquitous now, so we have to explain how it works and how it fits into the field—even to people who won’t be using it. Assessing the causality behind phenomena is becoming increasingly important, as is model accountability: explaining why a model made a certain decision.

Foster: There are really two answers to that question. First, is the material in the prior edition no longer correct/useful/relevant? And second, is there new material that ought to be added? We’ve been happy to find that the fundamentals we included originally are still as relevant, if not more so (as more organizations apply data science). Regarding new material, my view is that there are a handful of additional topics that either were already relevant but we chose not to include, or have since become very relevant—Tom noted deep learning and causality as two examples.

Any thoughts on how business thinking has changed between when you wrote the book, and now?

Foster: Business thinking now is focusing much attention on Artificial Intelligence. AI is a very broad field, but when you dig into what people are talking about now, much of the interest is focused on three areas of AI: (i) drawing business-relevant inferences automatically from data, (ii) machine learning, and (iii) natural language processing (NLP). Data science fundamentally underlies all of these areas, and in many cases we’re just experiencing a shift in vocabulary. One major addition to business thinking over the past five years is the intense interest in advances in what we might call “perceptual” solutions, most notably understanding speech (“Alexa, get me some AI”) and computer vision (think: self-driving cars).
