Projects

In addition to our client work, we’re always doing R & D to hone our skills and indulge our curiosities. Here are some of our latest projects.

Data Formats

Format Wars: From VHS and Beta to Avro and Parquet

Several data formats are available for use in the Hadoop Distributed File System (HDFS), and your choice can greatly affect your project's performance and storage requirements. We examined several of these formats in detail, covering their characteristics and an overview of their structures, and ran some tests on sample datasets.

What is Your Data Worth?

Despite the amount of money being invested in data and data technology, methods for answering this question are severely lacking. SVDS is exploring the concept of data valuation, the characteristics that distinguish data from other economic goods, and practical considerations for how enterprises can begin thinking about the general value of data.

Better Know the Districts

During the government shutdown in October 2013, comedian Jon Stewart offered a hypothesis for the cause of government gridlock: gerrymandering. How has the method for defining congressional districts impacted the House of Representatives? Does it lead to gridlock? Who does Congress really represent? We decided to take a closer look, from a data science perspective.

Listening to Caltrain

Many people who live and work in Silicon Valley depend on Caltrain for transportation, including us. And because the SVDS headquarters are close to a station, Caltrain is practically in our own backyard. As an R & D project, we have been using data science techniques to understand and predict delays in the Caltrain system.

The History of Rock

We enriched data collected by The Guardian with band information from Last.fm, song characteristics from Echo Nest, and band-to-band influences from Music Bloodline. In the abstract, we were exploring a network of collaborative product development over time, analogous to other multi-dimensional datasets of relationships. The final visualization unites a network graph and a timeline into a single cohesive view, conveying influences on the one hand and the time series on the other.
