Spark Archives - Silicon Valley Data Science

Data Ingestion with Spark and Kafka

In this tutorial, we will walk you through some of the basics of using Kafka and Spark to ingest data.

GARY DUSBABEK

August 15, 2017

kafka spark pipelines monitoring alerting

Managing Spark and Kafka Pipelines

In this post, we will cover some of the basics of monitoring and alerting as it relates to data pipelines in general, and Kafka and Spark in particular.

LARRY MURDOCK

May 11, 2017

From Data Managers to Platform Providers

We are seeing evidence of an important pattern: the creation of internal service platform to meet the data science and analytic needs of organizations.

EDD WILDER-JAMES

May 9, 2017

Making Spark and Kafka Data Pipelines Manageable with Tuning

In this post, we’ll walk you through how to use tuning to make your Spark/Kafka pipelines more manageable.

LARRY MURDOCK

March 28, 2017

Spark Summit: Ignition in the Enterprise

We are excited to announce for Spark Summit 2017 in San Francisco, Edd Wilder-James will be joining Reynold Xin as co-chair of the Spark Summit program.

EDD WILDER-JAMES

March 22, 2017

Structured Streaming in Spark

This post gives you a quick overview of the new structured streaming feature in Spark 2.0, illustrating why it’s an exciting addition.

ANDREW RAY

July 28, 2016

Building a Prediction Engine using Spark, Kudu, and Impala

In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.

RICHARD WILLIAMSON

April 12, 2016

Reshaping Data with Pivot in Spark

Andrew gives you a deep dive into pivoting data with SparkSQL. This piece was originally posted on the Databricks blog.

ANDREW RAY

February 16, 2016

Pivoting Data in SparkSQL

Andrew Ray, Senior Data Engineer, contributed to the most recent release of Spark. This post gives examples of how to use his pivot commit in PySpark.

ANDREW RAY

January 5, 2016

Archive for the ‘Spark’ Category

Sign In