TDWI Accelerate Boston 2017
Several of us will be in Boston for TDWI’s Accelerate conference, and VP of Technology Strategy Edd Wilder-James will be keynoting. We look forward to meeting up, so please let us know if you’ll be there. If you can’t attend, use the form on this page to sign up for our slides.
Monday, April 3
With recent advances in machine learning algorithms and statistical techniques, and the increasing ease of implementing them, it is tempting to overlook the power and necessity of exploratory data analysis (EDA), the crucial step before diving into machine learning or statistical modeling. Simply applying machine learning algorithms without first getting oriented to the dataset can lead to wasted time and spurious conclusions.
EDA allows practitioners to gain intuition for the pattern of the data, identify anomalies, narrow down a set of alternative modeling approaches, devise strategies to handle missing data, and ensure correct interpretation of the results. Further, EDA can rapidly generate insights and answer many questions without requiring complex modeling.
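As a rough illustration of the kind of first pass described above, here is a minimal sketch using only the Python standard library. The `orders` data and the `summarize` helper are hypothetical, and the 1.5 × IQR rule is just one common convention for flagging anomalies, not necessarily the approach covered in the talk.

```python
import statistics

# Hypothetical sample: daily order counts, with None marking missing values.
orders = [12, 15, None, 14, 13, 98, 16, None, 15, 14]


def summarize(values):
    """Return a few basic EDA facts about a numeric column."""
    # Separate observed values from missing ones before computing anything.
    present = [v for v in values if v is not None]

    # Quartiles of the observed values (default "exclusive" method).
    q1, median, q3 = statistics.quantiles(present, n=4)

    # Flag values outside 1.5 * IQR of the quartiles as candidate anomalies.
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "median": median,
        "mean": statistics.mean(present),
        "outliers": [v for v in present if v < low or v > high],
    }


summary = summarize(orders)
print(summary)
```

Even a summary this small surfaces two questions to settle before modeling: how to treat the missing days, and whether the flagged value is an error or a real event.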
In this talk, Chloe shows just how valuable exploratory data analysis can be before any modeling takes place, both in the insight it can bring and in the improvements it can make to the modeling process. Through examples, she will outline best practices for EDA of cross-sectional, time series, and panel data. She will also differentiate the EDA needed for categorical and numerical data types. This talk is for data practitioners who are interested in enriching their EDA abilities.
When data scientists are done building their analytical models, there are questions to ask:
- How do the model results get to the hands of the decision makers or applications that benefit from this analysis?
- Can the model run automatically without issues, and how does it recover from failure?
- What happens if the model becomes stale because it was trained on data that is no longer relevant?
- How do you deploy and manage new versions of that model without breaking downstream consumers?
This talk aims to illustrate the importance of these questions and provide a perspective on how to address them. Mauricio will share our experiences deploying models for our clients and some of the problems we have encountered along the way, along with some best practices and coding examples.
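To make the staleness and versioning questions above concrete, here is a minimal sketch, not taken from the talk. The `ModelRecord` class, the `is_stale` and `pick_model` helpers, and the 90-day threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical metadata stored alongside each deployed model."""
    version: str
    trained_on: date  # last date covered by the training data


def is_stale(record, today, max_age_days=90):
    """A model is stale if its training data is older than the threshold."""
    return (today - record.trained_on) > timedelta(days=max_age_days)


def pick_model(records, today, max_age_days=90):
    """Return the newest non-stale model, or None to signal retraining."""
    fresh = [r for r in records if not is_stale(r, today, max_age_days)]
    return max(fresh, key=lambda r: r.trained_on, default=None)


# Hypothetical registry with two versions of the same model.
registry = [
    ModelRecord(version="1.0", trained_on=date(2016, 10, 1)),
    ModelRecord(version="1.1", trained_on=date(2017, 2, 15)),
]

current = pick_model(registry, today=date(2017, 4, 3))
later = pick_model(registry, today=date(2017, 12, 1))
print(current, later)
```

Returning `None` instead of silently serving an old model leaves it to the caller to decide whether to retrain, alert someone, or fall back to a default.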
Tuesday, April 4
Visual Storytelling with Data: Making an Impact with your Visualizations
What is the difference between a boring graph and an eye-catching visualization? It’s not about tools, colors, or shapes, but about applying design best practices. This talk is for those who have the software chops and the interest to create data visualizations, and who would like to make their work more compelling. We’ll look at storytelling as a framework for the work of visualizing data, and then discuss the key concepts that will move you toward effective visual communication. Whether you’re creating data visualizations for the general public or just for your boss, you’ll learn how to tailor your message effectively and then make specific design choices to support and reinforce that message.
Wednesday, April 5
What’s the best way to pursue data-driven projects? Drawing from our experience with cross-functional teams that combine engineering, quantitative, and visualization skills, John highlights the benefits of collaborative teams of experts working iteratively across disciplines, and explains how to manage these teams to deliver data analytics projects successfully and efficiently.
Project Jupyter has a wide range of tools (including JupyterLab, Jupyter Notebooks, and JupyterHub, among others) that make interactive data analysis a wonderful experience. However, the capabilities that give power to individual data scientists can prove challenging to integrate in a team setting. Challenges range from peer review of code, to quality assurance on the analysis itself, to sharing the results with management or a client expecting a formal document.
This talk will consist of three parts:
- An overview of the technology stack, with examples of why Jupyter has become such a popular tool in the professional data science toolkit.
- The principles behind successful collaboration between data scientists and their managers in a team setting. Examples of several different workflows will be given.
- An exploration of how Jupyter can be tied into a larger data science ecosystem through its native support for popular data science languages (Python, R, Julia), Jupyter tools like nbdime that allow tighter integration with git (version control), and integration with tools like Apache Spark.
The intended audience for this talk is practicing data scientists and people who work closely with them (data science managers, for example).