
5 Reasons Why Spark Matters to Business
It’s been hard to miss Apache Spark in the last year. Many systems integrators, including ourselves, have also been enthusiastic about it.
It’s been hard to miss Apache Spark in the last year. Many systems integrators, including ourselves, have also been enthusiastic about it.
Hadoop is only beneficial if using it is efficient. Hadoop’s Apache Hive is frequently used to handle ad-hoc queries and regular ETL workloads.
In this post, we walk through an example of using Docker to develop a data acquisition pipeline to ingest mobile app GPS data using Kafka and HBase.
It gives us great pleasure to announce that a key member of the Spark team, Professor Michael Franklin, has joined our advisory board.
An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.
Graphite is a tool that does two things rather well: storing numeric time-series data (metric, value, epoch timestamp), and rendering graphs of this data on demand.