In addition to our client work, we’re always doing R & D to hone our skills and indulge our curiosities. Here are some of our latest projects.
Format Wars: From VHS and Beta to Avro and Parquet
Several data formats are available for use in the Hadoop Distributed File System (HDFS), and your choice can greatly affect your project's performance and space requirements. We looked at the details of several formats, including their characteristics and an overview of their structures, and ran some tests on sample datasets.
What is Your Data Worth?
Despite the volume of money being invested in data and data technology, methods for answering this question are severely lacking. SVDS is exploring the concept of data valuation, the characteristics of data that distinguish it from other economic goods, and practical considerations for how enterprises can begin thinking about the general value of data.
Better Know the Districts
During the government shutdown in October 2013, comedian Jon Stewart offered a hypothesis for the cause of government gridlock: gerrymandering. How has the method for defining congressional districts impacted the House of Representatives? Does it lead to gridlock? Who does Congress really represent? We decided to take a closer look, from a data science perspective.
Listening to Caltrain
Many people who live and work in Silicon Valley depend on Caltrain for transportation, including us. And because the SVDS headquarters are close to a station, Caltrain is literally in our own backyard. As an R&D project, we have been playing with data science techniques to understand and predict delays in the Caltrain system.
The History of Rock
We enriched data collected by The Guardian with band information from Last.fm, song characteristics from Echo Nest, and band-to-band influences from Music Bloodline. Abstractly, we were exploring a network of collaborative product development over time, analogous to other multi-dimensional datasets of relationships. The final visualization presented here unites a network graph and a timeline into one cohesive view, showing influences on the one hand and the time series on the other.