How to Value Data {when you want to license it}

It’s no secret. Firms with extensive data assets are looking to license their content to others. Entering the world of data licensing is confusing, even for experienced professionals. The first question I'm asked by companies that engage me is this: "What do you think my data is worth?"

You can imagine the range of emotions a company goes through after it receives an inquiry from a big-time East Coast hedge fund. The first reaction is that they've hit the jackpot. I can tell you it is possible to get high dollar values from hedge funds, but to get the most you'll need to follow my rules below. Keep in mind that even if the fund uses your data in a strategy, it might not be the biggest factor in generating returns, which will put a damper on the data's earning potential. These rules can help you answer the valuation question and more.

Robert’s Rules of Data Valuation (hey, I’ve got four and they have nice pictures)

The Value of Data over Time.

[Chart: the value of data over time]

When firms have data that updates in real time, I usually point them in this direction. If the data is used for equity or asset trading, the timeliness of the updates is what matters. The graph represents the latency curve of data value: data for market uses is most valuable in the initial stages of distribution to the user. As the data diffuses into a market, its value with respect to the asset prices it may influence becomes priced into the market. At some point on the curve, which may arrive over milliseconds, minutes, or days, the value of the data falls to its lowest point, marked by the red dotted line; at that stage it has little impact in moving an asset price. However, accumulating a large history adds value back to the data set. As the archive grows, its historical value as a testable record rises, typically starting once you have more than two years of history. I find that users outside of finance become interested in the historical data sets, especially as they relate to industry or competitor benchmarks. This is a common use for corporate strategy units and competitive intelligence functions.

Comprehensiveness of the data makes it more valuable over time, and value can be extracted after its initial use as a time-sensitive play. One thing to keep in mind product-wise: if your data is valued on its latency, when dealing with trading customers the archive needs to be included for backtesting, and the archive should be included in the pricing of the real-time data. These products are all feed based. Large archives are also necessary, meaning lots of data points. The more data points you have, the easier it is for an organization to divide the data set into meaningful partitions for testing, model creation, and validation.
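
To make the archive point concrete, here is a minimal Python sketch (not from the original post) of how a buyer might carve a multi-year history into partitions for model creation, validation, and out-of-sample testing. The column names and split fractions are illustrative assumptions.

```python
# Hypothetical sketch: chronological train / validation / test split of an archive.
import pandas as pd

def chronological_split(df, ts_col="timestamp", train_frac=0.6, val_frac=0.2):
    """Split a time-stamped archive into contiguous train/validation/test blocks."""
    df = df.sort_values(ts_col).reset_index(drop=True)
    n = len(df)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]

# Three years of daily observations: the longer the history,
# the more room there is for genuinely out-of-sample testing.
history = pd.DataFrame({"timestamp": pd.date_range("2019-01-01", "2021-12-31", freq="D")})
history["signal"] = range(len(history))

train, validate, test = chronological_split(history)
print(len(train), len(validate), len(test))
```

The split is deliberately chronological rather than random; shuffling would leak future information into the training window, which defeats the purpose of a backtest.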

The Value of Data given Accuracy.

[Chart: the value of data given accuracy]

The cleaner or more accurate the data is, the easier it is to extract value; in short, it becomes more useful to the client and easier to work with. Depending on the sources of the raw data, the rate of corrections lets the client price the effort involved in using the data. Highly accurate data can therefore maintain or even increase in value, while data whose accuracy falls below a certain level becomes a liability and loses value. The two most expensive tasks in data management are the cleaning and structuring of data sets for use.

If the client's application is a machine learning exercise, labeling is another cost. That is usually why smart providers process the data and add appropriate metadata tags. Adding categories, topics, or detailed entities increases the usefulness of a data set, because the data can then be split in multiple ways.
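
As an illustration of that kind of enrichment, here is a minimal sketch of attaching category and entity tags to a raw record with a simple keyword lookup. The entities, categories, and keyword lists are hypothetical; a real provider would use a more robust entity-resolution pipeline.

```python
# Hypothetical sketch: enriching raw records with entity and category metadata.
from typing import Dict, List

ENTITY_KEYWORDS = {            # provider-maintained mapping, illustrative only
    "Acme Corp": ["acme"],
    "Globex": ["globex"],
}
CATEGORY_KEYWORDS = {
    "earnings": ["earnings", "eps", "guidance"],
    "supply_chain": ["shipment", "logistics", "factory"],
}

def tag_record(text: str) -> Dict[str, List[str]]:
    """Attach entity and category tags derived from the raw text."""
    lowered = text.lower()
    entities = [name for name, kws in ENTITY_KEYWORDS.items()
                if any(kw in lowered for kw in kws)]
    categories = [cat for cat, kws in CATEGORY_KEYWORDS.items()
                  if any(kw in lowered for kw in kws)]
    return {"entities": entities, "categories": categories}

print(tag_record("Acme raises EPS guidance after strong factory output"))
# {'entities': ['Acme Corp'], 'categories': ['earnings', 'supply_chain']}
```

Once every record carries tags like these, a buyer can split the data by company, by topic, or by both without re-processing the raw feed.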

Accuracy also comes into play when creating derived data (additional calculated metadata). An easy example is how updates to the data are managed. If you keep only the current state and throw away the updates, the data becomes less valuable over time. If you publish that data publicly and it has a market impact, many firms will want to see all the changes and states of the data. This element is known as point in time.

As a rule, quantitative strategies are very concerned with point-in-time data. Point in time also applies to any metadata you may create. A big no-no is applying forward-looking updates backwards in time: for example, you update the algorithm that derives a field and then apply it retroactively to your archive. You can do that only if you keep the previous algorithm running and preserve the original state of the historical data. In practice you'll have two fields, one produced by the old algorithm and one by the new. This allows current customers to migrate gracefully over time instead of losing any signal they may have developed.
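
Here is one way point-in-time handling might look in practice: a minimal, hypothetical sketch where every revision is appended with an as-of date and the old and new derived fields are published side by side during the migration. The field names and values are made up for illustration.

```python
# Hypothetical sketch: append-only revisions with an as-of date, plus parallel
# old/new derived fields so customers can migrate without losing their signal.
import pandas as pd

revisions = pd.DataFrame([
    {"entity": "ACME", "as_of": "2021-01-05", "score_v1": 0.42, "score_v2": None},
    {"entity": "ACME", "as_of": "2021-02-05", "score_v1": 0.47, "score_v2": None},
    # After the new algorithm ships, both fields are published in parallel.
    {"entity": "ACME", "as_of": "2021-03-05", "score_v1": 0.45, "score_v2": 0.51},
])
revisions["as_of"] = pd.to_datetime(revisions["as_of"])

def point_in_time(df, entity, when):
    """Return the latest revision that was actually known at the requested date."""
    known = df[(df["entity"] == entity) & (df["as_of"] <= pd.Timestamp(when))]
    return known.sort_values("as_of").iloc[-1]

print(point_in_time(revisions, "ACME", "2021-02-10"))  # sees only the Jan and Feb rows
```

Because nothing is overwritten, a backtester sees exactly what it would have known on any given day, and the original algorithm's output is preserved alongside the new one.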

The Value of Data Integration.

[Chart: the value of data integration]

Data that can be easily combined with other sources makes for a highly valuable proposition. This one parameter can add significant value to a data set in many ways. For financial markets, tagging data sets with common company identifiers allows for easier data joins. Using any of the standard identifiers, such as ISINs, PermID (Refinitiv), or OpenFIGI (Bloomberg), to map the companies, products, and brands in your data set makes it much easier for a potential client to evaluate your data. That one simple step can make your sales process much smoother, since you're saving time for a potential customer.
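
As a minimal sketch of why that matters, assume both the vendor file and the client's reference table already carry an OpenFIGI-style column (the identifier values below are placeholders, not real FIGIs). The evaluation join becomes one line instead of a fuzzy name-matching project.

```python
# Hypothetical sketch: joining a vendor data set to a client reference table
# on a shared identifier column. All identifiers and values are made up.
import pandas as pd

vendor_data = pd.DataFrame({
    "figi": ["FIGI_000001", "FIGI_000002"],
    "signal": [0.8, -0.3],
})
client_reference = pd.DataFrame({
    "figi": ["FIGI_000001", "FIGI_000002", "FIGI_000003"],
    "company": ["Acme Corp", "Globex", "Initech"],
})

merged = vendor_data.merge(client_reference, on="figi", how="inner")
print(merged)
```

Without the shared key, the client has to reconcile free-text company names first, which is exactly the kind of friction that stalls an evaluation.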

Another time-saving device is formatting clean timestamps in a common standard such as UTC. These simple preprocessing steps make it much easier for a potential client to find value in your data set, since they speed up ingestion for evaluation. Additionally, data sets that have been reviewed for index coverage make the evaluation process better. By presenting a summary distribution of your content mapped to companies and compared across multiple market indices such as the S&P 500, MSCI, or Russell 2000, you can significantly increase the value of a data set.
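
A minimal sketch of those two preprocessing steps, normalizing timestamps to UTC and summarizing index coverage, using hypothetical tickers and a placeholder constituent list rather than the real S&P 500 membership:

```python
# Hypothetical sketch: UTC normalization plus a simple index-coverage summary.
import pandas as pd

raw = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "ZZZZ"],               # illustrative records
    "timestamp": ["2021-06-01 09:30:00-04:00",
                  "2021-06-01 10:15:00-04:00",
                  "2021-06-01 11:00:00-04:00"],
})

# Convert every timestamp (with its local offset) to a single UTC column.
raw["timestamp_utc"] = pd.to_datetime(raw["timestamp"], utc=True)

# Compare the mapped companies against an index constituent list (placeholder set).
index_members = {"AAPL", "MSFT", "AMZN"}
covered = raw["ticker"].isin(index_members)
print(f"Index coverage: {covered.mean():.0%} of records, "
      f"{raw.loc[covered, 'ticker'].nunique()} of {len(index_members)} constituents")
```

A one-page summary like this, repeated for each index the buyer cares about, answers the coverage question before the evaluation even starts.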

The Value of Data with Usage (Distribution).

[Chart: the value of data with usage]

There are two base cases in which data can attain higher value. The first is represented by the blue line: the data's value increases as more firms use it as part of their analysis. In this instance, setting the price at the appropriate level expands the market, and the data can be thought of as a necessary component of the market conversation. Most market data sets fall in this bucket. They can be described as utilities or indicative data that all participants need and many vendors provide, and they are usually sold to many participants.

The second case, represented by the green dashed line, can also carry high value by maintaining higher prices over a smaller base of users. The data may be thought of as exclusive in nature, and the associated value is tied to the share of profit it can provide to a few firms. Several high-quality data sets have been sold this way while making their jump to being a semi-commodity: PMIs (Purchasing Managers' Indexes), KPIs (Key Performance Indicators), and certain credit card transaction sets.

The exclusive nature of data sets has dwindled as hedge funds worry about being the only firm in possession of a given data set, which can raise issues of fair disclosure (FD) or insider information depending on the data. The fine line is that enough distribution needs to take place for the information to become valuable in a market at some point. A piece of information that never diffuses into the market never moves an asset price; ultimately, information needs to be represented in the price. It is the timing differences among parties in that diffusion which cause the greatest price movements.

The next question you may have is, how do I price these data sets?

Pricing is the most difficult decision when it comes to selling data. By market example, most data sets are sold in the sub-$100K-per-year range. At less than the cost of a junior employee, the decision to license is easier and more opportunistic. Once prices cross $100K, the decision process becomes more involved, as does the evaluation period the purchaser needs. I'll have more to say on pricing in a future post - so stay tuned!

Christopher Pagano

Alternative Data Sourcing and Customer Success Professional

3y

Very helpful article, Robert Passarella. We get these same questions regarding data valuation and pricing, I like how you've broken this down. Looking forward to your thoughts on pricing!

Bill Genovese

CIO Advisory Partner | CTO | Technology Strategy | Corporate Strategy Innovation Selection Committee Member |AI & ML | Senior/Principal Quantum Computing Team Leader

3y

Especially like this and agree: “Comprehensiveness of the data makes it more valuable over time, and value can be extracted after its initial use as a time-sensitive play. One thing to keep in mind product-wise: if your data is valued on its latency, when dealing with trading customers the archive needs to be included for backtesting, and the archive should be included in the pricing of the real-time data. These products are all feed based. Large archives are also necessary, meaning lots of data points. The more data points you have, the easier it is for an organization to divide the data set into meaningful partitions for testing, model creation, and validation.”

David A. Frankel 📈 🏗️🦅

Commercial Growth Strategist | Partnerships | GTM Executive | Sales | Marketing

3y

This is a great opening act for the headliner post, Robert Passarella.

Excellent post, Robert Passarella. I spent years working one-on-one with data clients that we were bringing from the government to capital/corporate markets, and this is question number 1 to answer. It's 100% why I joined BattleFin Ensemble & Discovery Day Events: to develop Ensemble with the goal of streamlining the process for data buyers and providers to explore and evaluate data and determine data value more efficiently.

Hassan el Bouhali

CIO | CDO | CTO | Board Advisor

3y

Great post Robert Passarella. Feels like these valuation rules (time, accuracy, integration, usage etc.) would equally apply to data generated by companies in heavy industries. Data about pumps, conveyors, robots, pipes, motors or any industrial good could be sold back to equipment OEMs. Data value would increase when cleaned making it accurate, augmented with equipment behavior under operations, continuously fed back to the OEMs as a stream, and archived over the years providing benchmark... Surprised by the low price though...
