Tom Fawcett

Tom has over 20 years of experience applying machine learning and data science across five different companies. He is co-author of the highly regarded and top-selling book Data Science for Business (O’Reilly, 2013), which is now used in over 140 universities around the world.

Prior to joining SVDS, as a senior architect at Proofpoint, Tom applied machine learning techniques, including social network analysis and probabilistic inference, to email analysis and filtering. While at Stanford’s Center for the Study of Language and Information, he led a DARPA-sponsored project on Transfer Learning. He has also held senior research scientist positions at HP Labs, NYNEX, and GTE Labs.

Tom holds a Ph.D. in Computer Science (Machine Learning) from the University of Massachusetts, Amherst. He is an action editor of Machine Learning Journal; he also serves on the editorial boards of the journals Data Mining and Knowledge Discovery and Big Data, as well as on the advisory board of the Berkeley Extension Data Science Program.

Recent Posts

Learning from Imbalanced Classes

For this month’s Throwback Thursday, a post that provides insight and concrete advice on how to tackle imbalanced data.

Merging Data Science and Business

Business leaders cannot afford to ignore their organization’s data—rather, that data should be used to make informed decisions. In this post, Principal Data Scientist Tom Fawcett and Professor of Data […]

Evaluating Data Science Projects: A Case Study Critique

You should understand whether the right things have been measured and whether the results are suitable for the business problem.

Machine Learning vs. Statistics

We (Tom, a machine learning practitioner, and Drew, a professional statistician) have worked together for several years. We believe we have an understanding of the role of each field within data science, which we attempt to articulate here.

Driving Product Engagement with User Behavior Analytics

In this post, we will look at driving product engagement with behavioral data, as well as building an integrated analytical environment.

Data-Driven User Engagement

The promise of data and analytics for product companies is that they can help you understand usage, and improve your ability to build, deploy, and service products to customers much more accurately and efficiently. In this post, we look at understanding the customer life cycle.

Analyzing Caltrain Delays

In this post, we will explore some aspects of the train delay data we’ve been collecting from the Caltrain API.

Avoiding Common Mistakes with Time Series Analysis

A basic mantra in statistics and data science is that correlation is not causation: just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.

Imbalanced Classes FAQ

Here we share some further thoughts on imbalanced classes, and offer more resources.

Learning from Imbalanced Classes

This post gives insight and concrete advice on how to tackle imbalanced data.

Analyzing Caltrain Delays: What We Can Learn

In this post, we will explore some aspects of the train delay data we’ve been collecting from the Caltrain API over the past few months. The goal is to get our heads into the data before setting off on building a prediction model.

The Basics of Classifier Evaluation: Part 2

A previous blog post made the point that classifiers shouldn’t use classification accuracy as a performance metric. The next part in this series was going to discuss other evaluation techniques such as ROC curves, profit curves, and lift curves. However, there are several important points to be made first. Here I present a sequence that shows the progression and inter-relation of the issues.

The Basics of Classifier Evaluation: Part 1

If it’s easy, it’s probably wrong.

Avoiding Common Mistakes with Time Series

A basic mantra in statistics and data science is that correlation is not causation. This is a lesson worth learning.

Listening to Caltrain: Analyzing Train Whistles with Data Science

As an R&D project, we have been playing with data science techniques to understand and predict delays in the Caltrain system.