Archive for the ‘Spark’ Category

Structured Streaming in Spark

This post gives you a quick overview of the new Structured Streaming feature in Spark 2.0, illustrating why it’s an exciting addition.

Building a Prediction Engine using Spark, Kudu, and Impala

In this post, Richard walks you through a demo based on the streaming API to illustrate how to predict demand in order to adjust resource allocation.

Reshaping Data with Pivot in Spark

Andrew gives you a deep dive into pivoting data with SparkSQL. This piece was originally posted on the Databricks blog.

Pivoting Data in SparkSQL

Andrew Ray, Senior Data Engineer, contributed to the most recent release of Spark. This post gives examples of how to use his pivot commit in PySpark.

Advanced Spark Meetup Recap

Together with Chris Fregly from the IBM Spark Technology Center, our audience of engineers got right into the guts of Spark’s GraySort benchmark win from last year. Here are a few highlights from the meetup.

Develop Spark Apps on YARN Using Docker

Rather than getting bitten at deployment time by the differences between running Spark on YARN and running it in standalone mode, you can set up a development environment that more closely mimics how Spark is used in the wild. Here’s how.

Use Cases for Apache Spark

The Apache Spark big data processing platform has been making waves in the data world, and for good reason.

5 Reasons Why Spark Matters to Business

It’s been hard to miss Apache Spark in the last year.

Ignition Spark: Mike Franklin joins SVDS as Advisor

Everyone at the Strata + Hadoop World Conference in New York earlier this month came away with one strong message:
