February 26, 2024
Over the past year or so, it’s been hard to escape the impression that AI is becoming a massive disruptive force. AI, of course, is only as good as its data, and thus far the data sources undergirding the AI revolution haven’t been used to their full potential. Making good use of AI and machine-learning tools often entails having the right data strategy to support them.

In private equity and venture capital, firms have invested in using external data sources often referred to as “alternative data,” a broad term used to describe information sourced from outside a company’s internal systems, including social media chatter, news feeds, government reports, industry databases, anonymized credit card transactions, and satellite imagery. These data sources can fill in crucial context by giving investors an outside-in view of the company in question, and a broader perspective on the markets in which it is operating. Social media data, for instance, is full of consumer sentiment and behavior trends. This information can be used to understand the public conversations about specific companies or sectors, helping investors to take the pulse of the market in real time, identifying trending topics, tracking brand health, and monitoring consumer feedback.

We have both used this approach in our work. We’ve also spoken with data science leaders and practitioners across a number of private investment firms about how they’ve deployed this strategy.

In many cases, using external data is still an expensive and lengthy process. Neither the data itself nor the analytics-savvy employees needed to make sense of it are cheap. But for private investors eager to stay on the cutting edge, the potential payoff is significant at every stage of the deal: identifying potential investments, conducting due diligence, and adding value post-investment. While these approaches have been honed by investors, they also offer models for how companies across industries can use alternative data.

Algorithmic Sourcing

When looking for new investments, the ability to creatively distill signals from novel and often noisy data sources can give private investors an advantage. With the right external data, firms can deploy algorithms to detect patterns, trends, and anomalies that may point to attractive investment propositions, or M&A opportunities for existing portfolio companies. These signals may include factors like above-average growth rates, successful market entry, or the development of an innovative new product.

A few years ago, a PE firm was searching for opportunities in the retail sector. Using a data-first approach, they parsed anonymized credit card data across different sub-sectors of interest. Ultimately, they homed in on a surprising trend: significant increases in both spend and the total number of buyers in the houseplant category.

Seeking to deepen their understanding of this market, they examined how these increases split across physical and online purchases. To their surprise, they found that most of the growth was attributable to the latter category: more and more people were buying their plants on the web. Aiming to identify the top-performing companies in this space, the team then sourced web traffic data and examined it alongside the credit card data, ultimately putting together a shortlist of companies that showed particular promise. A few months later, after direct conversations with company leaders and thorough due diligence, the PE firm bought an ownership stake in the company that appeared to be the leader of the pack, an up-and-coming e-commerce company that was well-positioned to capitalize on an overlooked trend.

In this case, the analyses themselves were fairly rudimentary: It was the data itself that made the difference.
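
To make concrete how rudimentary such a screen can be, here is a minimal sketch in Python of the first two steps described above: ranking categories by growth in spend and buyers, then splitting the leading category's growth by channel. The transaction records, field layout, and function names are all invented for illustration; they are assumptions, not the firm's actual data or method, and the web traffic step is left out.

```python
from collections import defaultdict

# Hypothetical anonymized card transactions, invented for illustration:
# (category, year, channel, buyer_id, amount)
TRANSACTIONS = [
    ("houseplants", 2019, "online", "b1", 40.0),
    ("houseplants", 2019, "store", "b2", 60.0),
    ("houseplants", 2020, "online", "b1", 90.0),
    ("houseplants", 2020, "online", "b3", 70.0),
    ("houseplants", 2020, "online", "b4", 40.0),
    ("houseplants", 2020, "store", "b2", 70.0),
    ("furniture", 2019, "store", "b5", 500.0),
    ("furniture", 2019, "online", "b6", 300.0),
    ("furniture", 2020, "store", "b5", 480.0),
    ("furniture", 2020, "online", "b6", 340.0),
]


def category_growth(transactions, base_year, latest_year):
    """Return {category: (spend_growth, buyer_growth)} as fractional changes."""
    spend = defaultdict(float)
    buyers = defaultdict(set)
    for category, year, _channel, buyer_id, amount in transactions:
        spend[(category, year)] += amount
        buyers[(category, year)].add(buyer_id)

    result = {}
    for category in {c for c, _ in spend}:
        base_spend = spend.get((category, base_year), 0.0)
        base_buyers = len(buyers.get((category, base_year), ()))
        if base_spend == 0 or base_buyers == 0:
            continue  # growth is undefined without a baseline
        result[category] = (
            spend.get((category, latest_year), 0.0) / base_spend - 1,
            len(buyers.get((category, latest_year), ())) / base_buyers - 1,
        )
    return result


def channel_share_of_growth(transactions, category, base_year, latest_year):
    """Return each channel's share of the category's absolute spend increase."""
    delta = defaultdict(float)
    for cat, year, channel, _buyer_id, amount in transactions:
        if cat != category:
            continue
        if year == latest_year:
            delta[channel] += amount
        elif year == base_year:
            delta[channel] -= amount
    total = sum(delta.values())
    if total <= 0:
        return {}  # no net growth to attribute
    return {channel: change / total for channel, change in delta.items()}


growth = category_growth(TRANSACTIONS, 2019, 2020)
top_category = max(growth, key=lambda c: growth[c][0])
shares = channel_share_of_growth(TRANSACTIONS, top_category, 2019, 2020)
print(top_category, growth[top_category], shares)
```

On this toy data, the houseplant category leads on both measures and nearly all of its added spend comes from the online channel, mirroring the pattern in the example. A real screen would also have to handle panel bias, seasonality, and merchant-to-category mapping, none of which is attempted here.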

Data-Driven Due Diligence

Due diligence is a manual, painstaking, but necessary task in the world of private investing, and as such is particularly ripe for disruption by external data. Due diligence is constrained by both time and resources. Ideally, one would want to examine a wide range of a company’s characteristics before investing, and gain insights into its workforce, critical supplies, key products and capabilities, and historical performance. But often, there is only enough time for a cursory glance at each of these parameters.

Moreover, the data sources used to conduct due diligence have traditionally been internal. In assessing a retailer, for example, one might initially analyze its historical customer and sales data. External data can help contextualize this type of data, providing valuable information on competitor performance and on customer characteristics. While some of the data used for due diligence has always been external, there is still a vast, largely untapped pool of external data that can be deployed to assess a company’s human capital and intellectual property, public sentiment, and growth potential.

Consider the case of a VC firm evaluating an investment in a pharmaceutical company. Traditionally, the firm would have a few weeks to gain a better understanding of the company, parsing its internal financial information and conducting market analysis to better understand the competition. But in this case, the firm went further, deploying large language models (LLMs) and other AI tools to analyze patent databases to assess the company’s research pipeline, clinical trial data to gauge the potential success of its drugs, and news articles and social media to understand public sentiment towards the company. Analysis of this breadth is almost impossible to do at scale using conventional methods.

Although external data can make the difference between a go or no-go decision, analysis of external data will not always be the determining factor in making an investment. Often, it will be a matter of increased (or decreased) confidence. But due diligence driven by external data offers a more thorough and nuanced analysis compared with traditional methods, unearthing insights that might otherwise have been missed, all while expediting the process. In competitive bidding processes, this can give an investor a significant advantage.

Portfolio Company Value Creation

The benefits of external data in private equity and venture capital are not limited to investment discovery and evaluation. Once an investment is made, external data can be harnessed to better frame the performance of a portfolio company, in context with emergent trends, shifts in consumer behavior, and competitive dynamics. This information can also guide portfolio companies, empowering them to make more informed decisions.

For example, a VC firm recently developed a talent sourcing tool to help its portfolio companies identify and recruit uniquely skilled professionals. By bringing together powerful professional data sources including Google Scholar, GitHub, Quora, Kaggle, and LinkedIn, the firm is able to leverage multiple professional perspectives simultaneously, in order to pinpoint high-value individuals with the precise skills and experience its portfolio companies need. For instance, the firm was able to use this tool to identify seasoned project managers with an established background in the renewable energy sector, who had overseen projects in wind-energy generation and had experience navigating the regulatory landscape in the Pacific Northwest region of the United States. This ability to source specialized talent is proving to be a game changer, enabling portfolio companies to assemble strong teams and achieve their ambitious goals.
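
The core idea, combining several sources into one view of each person and then filtering on specific criteria, can be sketched in a few lines of Python. Everything here is hypothetical: the records, the names, the skill labels, and the decision to match people across sources by name alone, which is a simplification that a real tool would replace with proper identity matching. It is not a description of the firm's actual tool.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """One person's combined view across all sources (illustrative only)."""
    name: str
    skills: set = field(default_factory=set)
    regions: set = field(default_factory=set)
    sources: set = field(default_factory=set)


# Hypothetical per-source records: (source, name, skills, regions)
RECORDS = [
    ("linkedin", "A. Rivera", ["project management", "renewable energy"],
     ["Pacific Northwest"]),
    ("scholar", "A. Rivera", ["wind energy"], []),
    ("linkedin", "B. Chen", ["project management", "wind energy"],
     ["Southeast"]),
    ("github", "C. Osei", ["wind energy", "renewable energy"],
     ["Pacific Northwest"]),
]


def merge_profiles(records):
    """Combine per-source records into one profile per person.

    Matching on name alone is an assumption made to keep the sketch short.
    """
    merged = {}
    for source, name, skills, regions in records:
        profile = merged.setdefault(name, Profile(name))
        profile.skills |= set(skills)
        profile.regions |= set(regions)
        profile.sources.add(source)
    return merged


def find_candidates(profiles, required_skills, region):
    """Return names of people who have every required skill in the region."""
    required = set(required_skills)
    return sorted(
        p.name
        for p in profiles.values()
        if required <= p.skills and region in p.regions
    )


profiles = merge_profiles(RECORDS)
candidates = find_candidates(
    profiles,
    ["project management", "renewable energy", "wind energy"],
    "Pacific Northwest",
)
print(candidates)
```

In this toy data, only one person satisfies all three skill requirements and the region, and only because evidence from two sources was combined; no single source would have surfaced that match.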

The Future of External Data in Private Market Investing

As people and organizations adopt more digital technologies, they produce more and more data. Effectively harnessing this vast and growing volume of external data is no small feat. It requires not just the technology to collect and process it, but also the ability to filter the signal from the noise. Discerning what is relevant in an ocean of data is quite a challenge. This is where artificial intelligence shines: with sophisticated algorithms and machine learning models, AI can identify patterns and insights that would be nearly impossible for humans to detect on their own.

State-of-the-art LLMs such as GPT or Claude can extract meaningful insights from inputs far larger than was previously practical. The “context window” of LLMs (that is, the amount of text they can take in and make sense of at once) is likely to keep growing in the months and years to come. This means we can expect ever-more sophisticated algorithms capable of extracting meaningful insights even from messy and unstructured datasets that would have been difficult to make sense of in the past.

These developments will further expand the range of potential applications of external data. For example, responsible investment factors are becoming increasingly significant in investment decisions. External data, used for instance as a means of inferring carbon emissions, could help assess a company’s performance against relevant metrics, aiding stakeholders, including investors, funds, management teams, and employees, in making their decisions.

Still, extracting value from external data poses significant challenges. Understanding the external data environment, which is highly fragmented and uneven in quality, demands dedicated resources and expertise, as does effectively navigating the thousands of data products and vendors available on the market. Identifying the data most likely to be useful is a hugely important part of the process. Additionally, in order to operationalize external data, companies may need to update their existing data environment, a process which may involve expensive and disruptive changes to systems and infrastructure. Finally, like any powerful tool, external data requires skill and experience to be effectively deployed. This process calls for advanced analytical capabilities, accurate interpretation of results, and a comprehensive understanding of the limitations of the data and the algorithms used, all of which highlight the importance of putting together a high-caliber data science team. Yet keeping data science teams up-to-date on the latest technologies, in a fast-moving field, is itself a significant challenge, as is retaining talent.

As reliance on external data grows, so do concerns about data privacy, security, and bias. Different types of data may be subject to different rules, and different states and countries have different regulatory frameworks in place. Some firms, moreover, rely on web-scraping to source much of their data, a practice which can lead to dubious data provenance. Thus, it is paramount to ensure that data are sourced and used ethically and responsibly.

For those private market investors willing to embrace external data, the rewards can be significant. By harnessing the power of external data, they can gain a crucial advantage in an increasingly competitive market, driving success for themselves and their portfolio companies. The age of external data in private market investing is just beginning — and the opportunities are vast.