Strata + Hadoop World CA 2017
Many of us will be at Strata in San Jose, and we’d love to see you there!
Tuesday, March 14
What are the essential components of a data platform? John Akred and Stephen O’Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
By tracing the flow of data from source to output, John and Stephen explore the options and considerations for components, including acquisition from internal and external data sources; ingestion (offline and real-time processing); storage; analytics (batch and interactive); and providing data services (exposing data to applications). They’ll also give advice on tool selection, the function of the major Hadoop components and other big data technologies such as Spark and Kafka, and integration with legacy systems.
The Business Case for Deep Learning, Spark, and Friends
Technologies like deep learning are white-hot, but why do they matter? The secret power of today’s data technologies is that they promote economic scaling and flexible development patterns that can adapt to business needs—but industry hype has obscured much of the value to those approaching the topic. Skepticism is an understandable reaction.
Developers are usually the first to understand why some technologies cause more excitement than others. Edd Wilder-James relates this insider knowledge, providing a tour through the hottest emerging data technologies of 2017 to explain why they’re exciting in terms of both new capabilities and the new economies they bring. Edd explores the emerging platforms of choice and explains where they fit into a complete data architecture and what they have to offer in terms of new capabilities, efficiencies, and economies of use.
Topics include:
- Deep learning and AI
- Docker and containers
- Notebooks for data science
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technical solutions? Fundamentally, data should serve the strategic imperatives of a business—those key strategic aspirations that define the future vision for an organization. A data strategy should guide your organization in two key areas—what actions your business should take to get started with data and where to start to realize the most value.
Edd Wilder-James and Scott Kurth explain how to create a modern data strategy to power data-driven business.
Topics include:
- Why have a data strategy?
- Connecting data with business
- Devising a data strategy
- The data value chain
- New technology potentials
- Project development style
- Organizing to execute your strategy
Thursday, March 16
Graph-based Anomaly Detection: When and How
Thanks to frameworks such as Spark’s GraphX and GraphFrames, graph-based techniques are increasingly applicable to anomaly, outlier, and event detection in time series. However, most data do not naturally take the form of a network that can be represented as a graph, so it is not clear that graph-based techniques are always the most appropriate way to detect anomalies.
Jeffrey Yau offers an overview of applying graph-based techniques and outlines the benefits of graphs relative to other techniques. Jeffrey compares and contrasts the use of graph theory and techniques, large-scale time series mining methods, and traditional parametric linear and nonlinear time series techniques in anomaly, outlier, and event detection—with specific examples from credit card fraud, wearable IoT devices, and financial time series.
Topics include:
- Static graphs
- Dynamic graphs
- The most common large-scale time series mining methods
- Traditional parametric linear and nonlinear time series techniques, including change-point detection
- The trade-offs involved in applying each of these classes of techniques to identify anomalies