Strata + Hadoop World CA 2015

Many of us will be at the Strata Conference + Hadoop World 2015 in San Jose, and we’d love to see you there! Visit us at booth P18 in the Innovator’s Pavilion to meet us and check out some of the R&D we are doing, and join us for our tutorial and presentation sessions:

Wednesday, February 18

Building a Data Platform

1:30–5:00pm in Room 210 C/G
John Akred, Stephen O’Sullivan & Manu Mukerji

What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.

Thursday, February 19

How Minority Becomes Majority: A Study of Gerrymandering

1:30–2:10pm in Room LL21 B
Tatsiana Maskalevich

During the last government shutdown, on “The Daily Show with Jon Stewart,” John Oliver noted that Congress has a 90% retention rate despite a 10% approval rating. Why? Gerrymandering has become a prime suspect. Is this true, or just truthy? Come find out how a state with a 51% Democratic, 49% Republican electorate enjoys a lopsided congressional delegation of 4 Democrats and 9 Republicans.

Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Engine

1:30–2:10pm in Room 210 C/G
Richard Williamson

This talk will examine the benefits of using multiple persistence strategies to build an end-to-end predictive engine. Spark Streaming backed by a Cassandra persistence layer allows rapid lookups and inserts for real-time model scoring. Spark backed by Parquet files stored in HDFS allows high-throughput model training and tuning with Spark MLlib. Both persistence layers also support ad-hoc queries via Spark SQL, making it easy to analyze model sensitivity and accuracy. Storing the data this way also makes it possible to leverage existing tools: CQL for operational queries on the data stored in Cassandra, and Impala for larger analytical queries on the data stored in HDFS, further maximizing the benefits of the flexible architecture.

Great Debate: Data-driven Optimization is the Enemy of Innovation

4:00–4:40pm in Room LL20 BC
Tatsiana Maskalevich

Ruthless optimization squeezes every ounce of advantage from the current business model. But it takes a leap of faith—not something the numbers tend to encourage—to truly innovate. When we’re informed by data, are we blinded by opportunity? Or does data pave the way for the best innovations, forcing us to take a harder look at bad ideas that will never work out? In this Oxford Debate, join two teams who’ll try to convince you of their take on the argument. Then we’ll take a vote and declare the victor.

Friday, February 20

Ask Us Anything

11:30am–12:10pm in Room 211 B

What does successful big data and data science really look like? As consultants out in the field, we’ve learned a lot of lessons and have great stories to tell about success, failure, and how to negotiate a path through a fast-moving technology landscape.

Come and talk to us about:

creating a data strategy
data science teams: hiring and running them
big data platforms and architectures
choosing the right tools
how to engage your whole organization with data

Robust Event Detection Using Diverse Data Types

1:30–1:50pm in Room LL20 A
Harrison Mebane

This session will discuss how to build a resilient, multi-modal event-detection system based on error-prone sources—video, audio, natural language, and external APIs. We will briefly review event-detection techniques and then demonstrate how to combine these across multiple data streams.

Get the slides!

If you’re not able to attend the conference, or if you miss one of our sessions at Strata + Hadoop World, sign up with your email here, and we’ll send you copies of all our slides after the conference is over. If you sign up before the conference, we’ll also send you our Data Strategy position paper, which is great companion reading if you’re attending any of our talks.