Tom Fawcett

Tom has over 20 years experience applying machine learning and data science across five different companies. He is co-author of the highly regarded and top-selling book Data Science for Business (O’Reilly, 2013), which is now used in over 140 universities around the world.

Prior to joining SVDS, as a senior architect at Proofpoint, Tom applied machine learning techniques, including social network analysis and probabilistic inference, to email analysis and filtering. While at Stanford’s Center for the Study of Language and Information, he led a DARPA-sponsored project on Transfer Learning. He has also held senior research scientist positions at HP Labs, NYNEX, and GTE Labs.

Tom holds a Ph.D. in Computer Science (Machine Learning) from the University of Massachusetts, Amherst. He is an action editor of Machine Learning Journal; he also serves on the editorial boards of the journals Data Mining and Knowledge Discovery and Big Data, as well as on the advisory board of the Berkeley Extension Data Science Program.

Learning from Imbalanced Classes

For this month’s Throwback Thursday, a post that provides insight and concrete advice on how to tackle imbalanced data.

November 16, 2017

connecting data science and business puzzle pieces

Merging Data Science and Business

Business leaders cannot afford to ignore their organization’s data—rather, that data should be used to make informed decisions. In this post, Principal Data Scientist Tom Fawcett and Professor of Data […]

November 2, 2017

Evaluating Data Science Projects: A Case Study Critique

You should understand whether the right things have been measured and whether the results are suitable for the business problem.

August 24, 2017

Machine Learning vs. Statistics

We (Tom, a Machine Learning practitioner, and Drew, a professional Statistician) have worked together for several years. We believe we have an understanding of the role of each field within data science, which we attempt to articulate here.

August 10, 2017

Driving Product Engagement with User Behavior Analytics

In this post, we will look at driving product engagement with behavioral data, as well as building an integrated analytical environment.

March 7, 2017

Data-Driven User Engagement

The promise of data and analytics for product companies is that they can help you understand usage, and improve your ability to build, deploy, and service products to customers much more accurately and efficiently. In this post, we look at understanding the customer life cycle.

March 2, 2017

Analyzing Caltrain Delays

In this post, we will explore some aspects of the train delay data we’ve been collecting from the Caltrain API.

February 16, 2017

Avoiding Common Mistakes with Time Series Analysis

A basic mantra in statistics and data science is correlation is not causation, meaning that just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.

January 19, 2017

Imbalanced Classes FAQ

Here we share some further thoughts on imbalanced classes, and offer more resources.

January 5, 2017

Learning from Imbalanced Classes

This post gives insight and concrete advice on how to tackle imbalanced data.

August 25, 2016

Analyzing Caltrain Delays: What We Can Learn

In this post, we will explore some aspects of the train delay data we’ve been collecting from the Caltrain API over the past few months. The goal is to get our heads into the data before setting off on building a prediction model.

March 10, 2016

The Basics of Classifier Evaluation: Part 2

A previous blog post made the point that classifiers shouldn’t use classification accuracy as a performance metric. The next part in this series was going to discuss other evaluation techniques such as ROC curves, profit curves, and lift curves. However, there are several important points to be made first. Here I present a sequence that shows the progression and inter-relation of the issues.

December 10, 2015