
Reshaping Data with Pivot in Spark
Andrew gives you a deep dive into pivoting data with Spark SQL. This piece was originally posted on the Databricks blog.
Check out the slides from our recent presentations at Data Day TX and Graph Day.
How can you manage your implementation so that you can take advantage of technology innovation as it happens, rather than freezing your view of technology at today’s state and designing something that is outdated by the time it launches? Start by deciding which pieces are necessary now, and which can wait.
Andrew Ray, Senior Data Engineer, contributed pivot functionality to the most recent release of Spark. This post gives examples of how to use it in PySpark.
Our audience of engineers got right into the guts of Spark’s GraySort benchmark win last year with Chris Fregly from IBM Spark Technology Center. Here are a few highlights from the meetup.
While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.
Rather than getting bitten at deploy time by the idiosyncrasies of running Spark on YARN vs. standalone, here’s a way to set up a development environment for Spark that more closely mimics how it’s used in the wild.
We present some best practices that SVDS has developed from working with the Jupyter Notebook in teams and with our clients.
The Apache Spark big data processing platform has been making waves in the data world, and for good reason.