Valuing Data is Hard

Data valuation methods | November 10th, 2015

“How much is my data worth?” is a question many business leaders are beginning to ask. Some have seen news of the latest multi-million-dollar data deal and want to know whether their company is sitting on golden data that outsiders would be eager to mine. Others have been following the many recent data breaches and want to know how much their data is worth, and therefore what it might cost them to lose it. Still others simply want to benchmark the value of their data to the business, so that they can track how internal decisions affect that value over time and work to increase it.

At SVDS, we work with our clients every day to help them get better value out of their data. But because we work across multiple industries and toward many kinds of business goals, we wanted to know: how, exactly, can one put a price tag on data? To investigate, I spent some R&D time between client projects reviewing how others currently approach valuing data. I found that, despite the amount of money being invested in data and data technology and the number of companies that facilitate buying and selling data, methods for answering the question “What is my data worth?” are severely lacking.

This article is the first in a series that I will be posting on thinking about data as an intangible asset, and how to value it as such. In this first post, I will outline the methods used to value other intangible assets, and then explain why none of these methods is straightforward to apply to data.

Intangible asset valuation methods

Traditionally, intangible assets are valued through methods that can be categorized as follows:

Cost-based: value is determined by how much the asset cost to create. This method is highly imprecise for data, because data is often created as a byproduct of other business processes. Cost could be estimated from the cost of storage and other data infrastructure, but this would neither capture the full value of the data nor reflect the differences in value between vastly different data sources.

Market-based: value is determined by the market price of comparable goods. In most cases, comparable data sources do not exist. A market would require a consistent notion of what makes data more or less valuable, and even if a well-formed data market existed, data sources rarely have the same content or quality. I will discuss data monetization further in a future blog post.

Income-based: value is determined by an estimate of the future cash flows to be derived from the asset. This approach may be useful for valuing data for a very specific use, whose utility and execution I will discuss in a future blog post. However, it is much trickier for valuing data generally within the organization, or for monetization.
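To make the income-based approach concrete, the sketch below discounts a stream of estimated yearly cash flows back to a present value. It assumes that the cash flows attributable to one specific use of the data, and a discount rate, can already be estimated; producing those estimates credibly is exactly the hard part. The function name and all figures are hypothetical.

```python
def present_value(cash_flows, discount_rate):
    """Discount yearly cash flows (year 1, 2, ...) back to today's dollars."""
    return sum(
        cash_flow / (1 + discount_rate) ** year
        for year, cash_flow in enumerate(cash_flows, start=1)
    )


# Hypothetical: using customer location data to pick store sites is
# assumed to add $100,000 of profit per year for three years, with an
# assumed 10% annual discount rate.
value = present_value([100_000, 100_000, 100_000], discount_rate=0.10)
print(f"${value:,.0f}")  # $248,685
```

The result applies only to that one use; the same data applied to a different end use would need its own cash-flow estimates.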

Income-based valuation is the only type of method that makes sense for data, but data's unique characteristics make it difficult to execute. I will spend the rest of this post diving more deeply into the characteristics that make data so hard to value.

Note: for some readers, the business's product may be data itself, in which case the valuation process is somewhat more straightforward. For most, however, data is a crucial means to some other final product. This post addresses the latter, and much more common, situation.

Why is data’s value so difficult to quantify?

To understand the characteristics of data that make it hard to value, I have created the data valuation chain (below). Based on the data value chain, it shows the progression of data products created in the course of deriving value from data.

The data valuation chain shows that raw data has many intermediate forms as it is collected, processed, integrated, combined, and transformed with context to produce actionable insights, which can lead to action and, potentially, value. Using this valuation chain, we can then identify some of the reasons why valuing data is so tricky:

Data’s value increases as it moves through the data valuation chain. Like many goods, intermediate data products are valued based on the potential value of the end-product. As data progresses through the chain, its value approaches the end-product value, so data’s valuation depends on where in the chain it lies. Separating an intermediate data product’s value from the value of completing the chain is delicate. The integration of multiple datasets can be worth much more than the sum of their individual values. Furthermore, a powerful machine learning algorithm or statistical or mathematical model can dramatically increase the value of a data source, but valuing such analysis apart from the data it is applied to is just as imprecise as valuing the data itself.
The valuation chain must be completed to realize value from data. In economics, traditional goods are rivalrous, and producing each additional unit carries a non-zero marginal cost. Information goods, however, are non-rivalrous. They typically have high first-copy costs: producing the first copy is very expensive, but the marginal cost of each additional copy approaches zero. A lot of money therefore has to be invested up front to derive value from data, which is risky when the value of the end-product is uncertain. Quantifying that risk as part of the data’s value is complex.
Many possible value chains can originate from the same raw data. Like data, crude oil must go through a value chain to become a usable product. Crude oil, however, has a defined, relatively small number of possible end-products, whose market values are well established and driven by supply and demand. Raw data has an indefinite number of possible end uses that depend on the user and their intentions, and that can change over time. A retail company could use aggregated GPS data to choose the location of its next store, while a city government could use it to plan its roads better. Each use would require integration with different data sources and different types of analysis, and would lead to very different end values. This makes data especially hard to value for monetization through income-based methods, because the possible uses, and therefore the potential income, can vary drastically among buyers.
The valuation chain could be completed only to result in no value at all. In economics, many traditional goods are transparent: the buyer knows what they are getting before agreeing to purchase. Data and other information goods, however, are experience goods, meaning that their value can only be determined after they are used. One could therefore purchase data and pay the first-copy costs required to complete the value chain, only to find that the data held no value.
Many different valuation chains can be completed with the same raw data at the same time. Data is non-subtractable, meaning that one use does not prevent additional uses. In an organization, the same data source could feed multiple analyses and decisions by many users, so the value of any particular dataset may vary widely across these uses. Access could be limited by requiring licenses, but in practice this typically does not stop the data from spreading to other teams in the company through the reports and tables they use for decision making.
Different valuation chains can require different levels of data quality. Data quality can be measured along a number of dimensions, such as accuracy, completeness, breadth, latency, and granularity. Different types of analysis require different levels of quality along each of these dimensions, so high-quality data for one use may be low-quality data for another. Stock exchange data that has been cleaned and had its outliers removed may be extremely valuable to long-term financial modelers, but inappropriate for fraud detection, which depends on exactly those outliers.
A completely different raw data source could provide the same insight and result in the same action. We use data to fuel key business processes and decisions, but at the end of the day we rarely care what that data is, only about the insight it provides. GPS data aggregated by phone providers could show how busy an area is to inform where to locate a store, but so could satellite photos. The value of a data source therefore depends not only on its potential end-use, but also on whether a substitute data source could be used instead.
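Several of the points above (first-copy costs, data as an experience good, and substitute sources) can be combined into a rough back-of-the-envelope calculation. The sketch below is a hypothetical illustration, not an established valuation method: the function names, parameters, and dollar figures are all assumptions, and it treats the substitute source as delivering the same insight at a known total cost.

```python
def average_cost_per_use(first_copy_cost, marginal_cost, uses):
    """First-copy economics: the up-front cost is spread over every use,
    so the average cost falls toward the (near-zero) marginal cost."""
    return (first_copy_cost + marginal_cost * uses) / uses


def net_value_of_data(first_copy_cost, p_success, value_if_success, substitute_cost):
    """Expected net value of completing one valuation chain with this data.

    first_copy_cost:  up-front cost to collect, integrate, and analyze the data
    p_success:        assumed chance the completed chain yields value at all,
                      since data is an experience good
    value_if_success: present value of the end-product if the chain succeeds
    substitute_cost:  assumed total cost of reaching the same insight through a
                      different data source (float("inf") if none exists)
    """
    expected_value = p_success * value_if_success
    # This data can claim no more value than it would cost to get the same
    # insight another way, so the substitute caps the value.
    return min(expected_value, substitute_cost) - first_copy_cost


# Hypothetical figures, chosen only for illustration.
print(average_cost_per_use(500_000, 10, uses=1))       # 500010.0
print(average_cost_per_use(500_000, 10, uses=10_000))  # 60.0
print(net_value_of_data(200_000, 0.4, 1_000_000, substitute_cost=300_000))  # 100000
print(net_value_of_data(200_000, 0.1, 1_000_000, substitute_cost=float("inf")))  # negative
```

With these made-up numbers, the substitute source caps the first use case at $100,000 of net value even though its expected value is higher, and a 10% chance of success turns the second into an expected loss. A real valuation would also need to model the substitute's own risk and the many simultaneous uses the same data can serve.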

Where to go from here

While data’s unique characteristics and the large number of its potential uses make its value hard to quantify in absolute terms, that doesn’t mean it lacks value! It simply means we have to be intentional in how we consider its value. While we may not be able to say “this dataset is worth X dollars” anytime soon (or possibly ever), we can consider data’s value for specific uses, or along specific dimensions; we can be informed as to what increases or decreases data’s value within the organization; and we can understand the considerations necessary for buying or selling data.

Later in this series, I will discuss methods for valuing a potential data source to determine whether it is worth purchasing. I will also outline methods for monitoring data’s value within the organization, which is the key to increasing its value over time. Finally, I will take a look at the landscape of buying and selling data, and discuss important considerations for anyone thinking about selling their data.