
Building a Prediction Engine using Spark, Kudu, and Impala
In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.
Richard has been at the cutting edge of big data since its inception, leading multiple efforts to build multi-petabyte Hadoop platforms and maximizing business value by combining data science with big data. He has extensive experience creating advanced analytic systems using data warehousing and data mining technologies.
Richard is an expert in big data architecture, platform deployment, and large-scale data science. Prior to joining SVDS, he led development of a multi-petabyte Hadoop platform at WalmartLabs. The platform included two separate Impala data warehouses that hosted production and ad-hoc workloads serving hundreds of billions of rows of log and transactional data. The warehouses also included HBase instances set up with active-active replication across two data centers, serving millions of operations per second on near-real-time data feeds from Flume. Before WalmartLabs, Richard launched the first Hadoop system at Walmart Stores, taking it from idea to a multi-petabyte production system starting in 2009. This included the proposal to build Hadoop as a complement to the data warehouse and data mining platform, then rapidly moving from proof of concept to a fully secured production deployment that enabled customers to perform analysis that could not be done in existing systems.
He has also built several advanced analytics applications, including: a distributed optimization engine for workforce scheduling; forecasting systems using over ten years of history to predict daily, weekly, and monthly sales; transportation route scheduling systems; supply chain optimization systems; price modeling systems; and various data mining efforts.
Richard holds a Bachelor of Science in Mathematics and Computer Science from Missouri Southern State University.
An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.