This post gives you a quick overview of Structured Streaming, the new stream processing feature in Spark 2.0, and explains why it’s an exciting addition.
Archive for the ‘Spark’ Category
In this post, Richard walks you through a demo built on the Meetup.com streaming API to show how to predict demand and adjust resource allocation accordingly.
Andrew gives you a deep dive into pivoting data with Spark SQL. This piece was originally posted on the Databricks blog.
Andrew Ray, Senior Data Engineer, contributed to the most recent release of Spark. This post gives examples of how to use the pivot functionality he added, in PySpark.
Our audience of engineers got right into the guts of Spark’s GraySort benchmark win last year with Chris Fregly from the IBM Spark Technology Center. Here are a few highlights from the meetup.
To avoid getting bitten by the differences between running Spark on YARN and in standalone mode when you go to deploy, here’s a way to set up a Spark development environment that more closely mimics how Spark is used in the wild.
The Apache Spark big data processing platform has been making waves in the data world, and for good reason.