The data revolution in venture capital

Feb 7, 2024

Investors, data scientists, and tool builders leading the data-driven future of venture capital.

Data-driven algorithmic decisions now make up 75%+ of public market trades — an AUM of $1+ TRILLION.

But that wasn’t always the case.

Hedge funds drove this revolution back in the 90s, leveraging the parallel explosion of available datasets and computing power.

Now, 30 years later, the data-driven revolution is making its way into the world of venture capital.

It’s happening fast. It’s estimated that more than 75% of VC deal reviews will be informed using AI and data analytics by 2025. VCs increasingly leverage data for sourcing, evaluating, and managing their investments.

In this post, we discuss why the timing is right for data-driven venture, the specific strategies being used by funds today, and the trade-offs to consider. As always, you’ll hear directly from practitioners at the edge of this field.

If you only have a few minutes, here’s the TL;DR:

  • Timing is more opportune than ever for data-driven venture. We’re seeing an explosion of data on private market companies, advancements in analytics capabilities driven by improvements in AI/ML, and increased availability of off-the-shelf software.
  • Data is becoming core to the competitive advantage of more funds. EQT’s Motherbrain has helped them make investments valued at €200M+. Tribe Capital’s Termina uses extensive benchmark sets to offer founders insights that are otherwise unattainable.
  • Most VC funds balance machine-driven and human-driven approaches, using data alongside qualitative assessments. For some funds we spoke with, such as the AngelList Early-Stage Quant Fund, data-driven investing is core to their strategy, but most take a hybrid human-machine approach. Over-reliance on data can miss the broader picture and lack the creativity needed to identify outliers in people or opportunities.
  • VCs are using data to gain a competitive edge in deal sourcing. Data-driven approaches can identify promising startups, trends, and market gaps that might be overlooked using traditional methods. Data can also help with tracking a large volume of startups, automating follow-up triggered by information changes, and increasing the number of touchpoints with companies.
  • Some VCs are also using data to speed up diligence and make more informed decisions. Data-driven due diligence involves analyzing a startup's financial metrics, market trends, competitive landscape, and other relevant data points to better predict its potential for success.
  • There are trade-offs to data-driven VC. Pros include increased scalability, greater efficiency, and the ability to build competitive advantages. Cons include high costs, a reliance on the availability and quality of data, a false sense of security, and a narrow focus.

Shout out to Abe Othman (AngelList Early-Stage Quant Fund), Jamesin Seidel (Chapter One), Lan Xuezhao & Wilson Kyi (Basis Set Ventures), Kaushik Subramanian (EQT Ventures), Jake Kupperman (Level), Haley Bryant (Hustle Fund), Francesco Corea (Greycroft), Jonathan Hsu (Tribe Capital), Rob Kniaz (Hoxton Ventures), Damian Cristian (Koble), Max Ruderman (Harmonic), Alex Chee (Termina), and Andrea Wang (General Catalyst) for contributing to this post.

Why timing is right for data-driven VC

Venture in an age of data surplus and advanced analytics capabilities

Just 10 years ago, data on startups was scarce. Public registers weren’t digitized, and startup databases had a fraction of today’s active users and low market coverage. In a data-poor paradigm, venture investing primarily happened through networks.

There is now a vast amount of publicly available data on startups generated on the internet — Crunchbase, employer reviews, app launches, Product Hunt, GitHub, LinkedIn, etc. VCs also subscribe to more gated data services that provide information on market trends, consumer behavior, and competitive intelligence. Lastly, many funds are investing in building out their own proprietary datasets, partially to compete in an increasingly crowded market.

This explosion in data availability has been accompanied by the development of analytics capabilities. We now not only have the datasets we need, but we can also surface signals from them.

“By structuring massive volumes of information into usable datasets and pairing them with groundbreaking proprietary algorithms, we can transform the process of sourcing, evaluating, and investing in startups.” 

— Damian Cristian, Koble

This has created an inflection point in private markets.

“For the first time, the underlying technological conditions are right for machines to beat humans at early-stage startup investing. By applying the same quantitative strategies that have disrupted public markets to early-stage startup investing, VCs can smooth the distribution of returns, creating risk-adjusted performance that crushes traditional human-centric funds.”

— Damian Cristian, Koble

Some expect data to change venture into a machine-driven asset class

Data use could go beyond the current trend of “data-driven” investing, which still involves a strong human component, to a fully machine-driven process.

“It creates a systematic and non-human startup investment process, evolving the asset class and unlocking massive value for founders, investors, and society.”

— Damian Cristian, Koble

Plus, with more startups being funded than ever before, data-driven decisions are increasingly valuable. 

“Historically, early-stage deal sourcing happened almost entirely through networks and referrals, meaning exciting opportunities outside the confines of VC networks were left undiscovered and undercapitalized. 

As big tech layoffs abound and hiring slows, concurrent with a Cambrian explosion in AI, more startups are being founded than ever before. 

If your firm doesn’t leverage data to find and monitor startups (think traction metrics like headcount and website traffic growth, funding history, team composition, and more), you’ll miss countless opportunities and fall behind your data-driven peers.”

— Max Ruderman, Harmonic

The next question, of course, is how.

How VC funds use data to source, evaluate, and make investments

We asked investors how they’re using data to source, evaluate, and make investments. Here are the common themes across their replies:

Data-driven sourcing methods are giving investors visibility into companies and founders that they would’ve missed with more traditional methods. Data-driven sourcing supports automation, including follow-up with relevant companies and scoring startups based on predefined criteria.

Data is being scraped from internet sources (Crunchbase, X, LinkedIn, etc.) and private correspondence (like emails, texts, and decks). Some funds have also explored non-obvious data sources beyond the conventional ones like LinkedIn, GitHub, and Crunchbase. Basis Set Ventures collects founders' cognitive and behavioral traits. Level utilizes data science to uncover hidden investor and talent networks through predictive algorithms.

Funds are building internal data-driven investment capabilities by investing in dedicated teams working on custom-built platforms, and by simply tweaking off-the-shelf LLMs. EQT Ventures mentions having a dedicated team of engineers, data scientists, and product managers who have been building their in-house AI product called "Motherbrain" since 2016. Tribe Capital has its own data science spinout called "Termina," which is pivotal in the firm's data-driven approach.

Financial models and predictive algorithms are popular; these are often assessed against public comps.

While deal sourcing is the most common use case, some funds are leveraging data primarily for due diligence and portfolio management.

Qualitative diligence and backtesting are crucial for data-driven approaches.

Let’s dive into fund-specific strategies.

How Basis Set Ventures uses non-obvious datasets

“We’ve built a lot of systems that help with scalability and consistency. For example, we incorporate obvious and non-obvious data sources to source and understand founders. This is beyond the obvious data from LinkedIn, GitHub, Crunchbase, etc. 

We collect founders’ cognitive and behavioral traits. We've published some research to try to understand different founder archetypes (Founder Super Powers) and even experimented with the most recent LLMs to help us (Can LLMs predict founder success?).”

— Lan Xuezhao & Wilson Kyi, Basis Set Ventures

How EQT Ventures’ in-house Motherbrain helps make investments

“We have a dedicated team of engineers, data scientists, and product managers that have been building our in-house AI product 'Motherbrain' since 2016. 

Motherbrain allows us to source, evaluate, and manage investments. It’s probably our hardest-working employee. It has several data sources, is able to indicate who might start a company, key metrics that might not be intuitively available, and so on. 

It’s a transformative approach to assessing investments and has helped EQT Group to make 15 investments to date, to the value of €200M+.”

— Kaushik Subramanian, EQT Ventures

How Level uses data science to uncover hidden talent networks

“Data science is foundational to our firm. We launched Level as the natural evolution of work that our team members had been doing for years: namely, applying graph and network theory to private markets and industry.  

Our seven data scientists and engineers work closely with our investment team at each step of the investment cycle, from identifying potential opportunities — both those already in market and those that may not yet exist — and evaluation, to winning allocation and supporting managers post-investment. 

Much of our work revolves around uncovering hidden investor and talent networks through predictive algorithms to guide our decision-making. We validate our work with backtesting and qualitative diligence.”

— Jake Kupperman, Level

How Hustle Fund screens a high volume of startups 

“We are lucky to have an incredible community of founders, angel investors, co-investors, and LPs that help us meet founders building something new.

We’ve built a workflow with Typeform, Pipedrive, Airtable, and Zapier that uses logic to quickly prioritize, review, evaluate, and follow up on the best opportunities within our pre-seed, software-enabled strike zone at scale.

The next step that BetterBrain.ai is helping with is thinking through how we can learn from previous decisions by surfacing and synthesizing learnings.

Data allows us to automate follow-up with companies that are clearly outside of our strike zone and supports an AI sparring partner that scores companies within our strike zone to help us quickly determine what to spend time digging into.”

— Haley Bryant, Hustle Fund
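
The kind of rule-based triage described in this quote can be sketched in a few lines of Python. This is a hypothetical illustration, not Hustle Fund's actual workflow: the field names (`stage`, `software_enabled`, `founder_count`, `has_revenue`), the strike-zone rule, and the scoring weights are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """One inbound startup application (all fields are hypothetical)."""
    name: str
    stage: str              # e.g. "pre-seed", "seed", "series-a"
    software_enabled: bool
    founder_count: int
    has_revenue: bool


def in_strike_zone(app: Application) -> bool:
    """Assumed strike zone: pre-seed and software-enabled."""
    return app.stage == "pre-seed" and app.software_enabled


def score(app: Application) -> int:
    """Toy score for in-zone companies; the weights are illustrative only."""
    points = 0
    if app.founder_count >= 2:
        points += 1
    if app.has_revenue:
        points += 2
    return points


def triage(apps: list[Application]) -> tuple[list[Application], list[Application]]:
    """Split applications into (review queue sorted by score, automated follow-up list)."""
    review = sorted((a for a in apps if in_strike_zone(a)), key=score, reverse=True)
    follow_up = [a for a in apps if not in_strike_zone(a)]
    return review, follow_up


apps = [
    Application("Alpha", "pre-seed", True, 2, True),
    Application("Beta", "series-a", True, 3, True),
    Application("Gamma", "pre-seed", True, 1, False),
]
review_queue, auto_follow_up = triage(apps)
```

In this sketch, out-of-zone companies go straight to an automated follow-up list, while in-zone companies are ordered so that a human reviewer sees the highest-scoring ones first.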

How Greycroft identifies some startup targets

“We combine several data sources to flag companies that show some signs of growth or are currently up-trending in one way or another. 

The systems highlight a certain number of companies and people on a weekly basis, and that information (with the whole panel of data points attached) is passed to the investment team, which does an extra cut and reaches out afterward to the founders in question.”

— Francesco Corea, Greycroft
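
As a rough illustration of flagging up-trending companies from several signals, here is a minimal Python sketch. It is not Greycroft's system: the signal names (`headcount`, `web_visits`), the 10% growth threshold, and the shortlist size are assumptions chosen for the example.

```python
def growth(previous: float, current: float) -> float:
    """Relative change between two observations; 0.0 if there is no baseline."""
    if previous <= 0:
        return 0.0
    return (current - previous) / previous


def weekly_shortlist(
    companies: dict[str, dict[str, tuple[float, float]]],
    threshold: float = 0.10,
    limit: int = 2,
) -> list[tuple[str, dict[str, float]]]:
    """Return up to `limit` companies with at least one signal growing by
    `threshold` or more, ranked by their strongest signal, with every
    computed growth rate attached for the investment team."""
    flagged = []
    for name, signals in companies.items():
        rates = {key: growth(prev, cur) for key, (prev, cur) in signals.items()}
        if rates and max(rates.values()) >= threshold:
            flagged.append((name, rates))
    flagged.sort(key=lambda item: max(item[1].values()), reverse=True)
    return flagged[:limit]


# Each signal is a (last period, this period) pair; the values are made up.
snapshot = {
    "Acme": {"headcount": (10, 14), "web_visits": (1000, 1050)},
    "Bolt": {"headcount": (50, 51), "web_visits": (8000, 8100)},
    "Coda": {"headcount": (5, 5), "web_visits": (200, 300)},
}
shortlist = weekly_shortlist(snapshot)
```

The shortlist keeps the full panel of growth rates next to each name, so the human "extra cut" described in the quote can be made on the underlying data rather than on a single score.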

How Tribe Capital uses data for diligence and portfolio monitoring 

“At Tribe Capital, we emphasize the use of AI and data science primarily for evaluation and continuous portfolio monitoring, rather than for deal sourcing.

This shift in focus stems from our understanding that as an investment firm's brand grows, sourcing becomes increasingly organic, with more companies seeking us out. However, the challenge of effectively evaluating these opportunities does not diminish with brand recognition. In fact, it becomes more crucial to discern the potential and risks of each opportunity accurately.

Our data science spinout, Termina, is pivotal in this process. We leverage our extensive proprietary benchmark set in our Termina instance to offer insights to founders that are otherwise unattainable. This includes providing a comprehensive analysis of how companies benchmark across product market fit, unit economics, team, and more, both positively and negatively. 

The goal is to harness the power of data science and AI not just in gathering opportunities but more importantly in meticulously evaluating them, ensuring informed investment decisions within the competitive deal timelines present in venture.”

— Jonathan Hsu, Tribe Capital
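
Benchmarking a company against a reference set often comes down to a percentile rank per metric. The Python sketch below shows that idea under stated assumptions: the metric names and benchmark values are invented, and this is not how Termina is implemented.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value: float, benchmark: list[float]) -> float:
    """Percent of benchmark values below `value`, counting ties as half."""
    ordered = sorted(benchmark)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def benchmark_company(
    metrics: dict[str, float],
    benchmarks: dict[str, list[float]],
) -> dict[str, float]:
    """Percentile rank for every metric that has a benchmark set."""
    return {
        name: percentile_rank(value, benchmarks[name])
        for name, value in metrics.items()
        if name in benchmarks
    }


# Hypothetical benchmark sets and company metrics.
benchmarks = {
    "net_revenue_retention": [0.80, 0.95, 1.00, 1.10, 1.30],
    "gross_margin": [0.40, 0.55, 0.60, 0.70, 0.85],
}
company = {"net_revenue_retention": 1.10, "gross_margin": 0.50}
report = benchmark_company(company, benchmarks)
```

A report like this shows both the positive and the negative side of a company's profile: here the hypothetical company ranks well on retention but low on gross margin.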

How AngelList Quant Fund uses proprietary datasets to source investments

“We reach out to the 15 early-stage startups each month that are attracting the most high-signal applicants on Wellfound (FKA AngelList Talent). We think there are maybe 15,000 early-stage startups hiring on Wellfound, so our outreach is to the top 0.1%.

We want to invest a small amount of money in as many of those companies as possible, which is itself a data-driven investment approach. Very meta.”

— Abe Othman, AngelList Early-Stage Quant Fund
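
The arithmetic behind "top 0.1%" and the ranking step can be written out as a short Python sketch. The applicant counts are made up, and ranking purely by a count of high-signal applicants is an assumption about how such a list might be built, not a description of the Quant Fund's model.

```python
def top_k_by_signal(applicant_counts: dict[str, int], k: int) -> list[str]:
    """Names of the k startups with the most high-signal applicants."""
    ranked = sorted(applicant_counts, key=applicant_counts.get, reverse=True)
    return ranked[:k]


def outreach_share(k: int, universe: int) -> float:
    """Fraction of the hiring universe contacted in one month."""
    return k / universe


# Hypothetical counts of high-signal applicants per startup.
counts = {"A": 12, "B": 40, "C": 7, "D": 25}
picked = top_k_by_signal(counts, k=2)

# 15 startups out of roughly 15,000 is one in a thousand.
share = outreach_share(15, 15_000)
print(picked, f"{share:.1%}")
```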

How Hoxton Ventures uses data where they can add the most value

“Over the years I’ve created a network of servers I can remote control that help me scrape and collect data, and a set of data post-processing tools that go through and normalize the data and then interpret some of the raw data into higher level analyses like determining a person’s seniority level at a company.

I’m not building many of the models myself, I’m mostly using off-the-shelf LLMs and doing some tweaking. Frankly, for the cost, ChatGPT is pretty fine for many of the applications I’m doing.

I primarily use my own front end on top of this internal data store to surface people and companies in a live feed for the sourcing and origination side. So knowing where to find a needle in the haystack among all the companies. Likewise, I publish a front-end for founders to search to find candidates they may have missed through normal searches or otherwise not findable. This is a great utility, especially for roles with really specific candidate requirements.

On the latter side, in terms of helping founders with tools for their company, there's a lot of value there. Everything in the middle, it’s useful, it ties into the CRM, but generally speaking, the early identification of interesting companies to support the decision process — I think it’s the most significant thing in a fund that you can get augmentation for.”

— Rob Kniaz, Hoxton Ventures
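
One of the post-processing steps mentioned in this quote, turning a raw job title into a seniority level, can be approximated with simple keyword rules. The Python sketch below is a hypothetical baseline, not Hoxton Ventures' tooling (which the quote says leans on off-the-shelf LLMs); the level names and keyword lists are assumptions.

```python
import re

# Rules are checked in order, so "vice president" is matched before "president".
SENIORITY_RULES = [
    ("vp", ("vp", "vice president")),
    ("executive", ("chief", "ceo", "cto", "cfo", "founder", "president")),
    ("director", ("director", "head of")),
    ("senior", ("senior", "sr", "staff", "principal", "lead")),
]


def normalize_title(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def seniority(raw_title: str) -> str:
    """Map a raw job title to a coarse seniority level using whole-word matches."""
    padded = f" {normalize_title(raw_title)} "
    for level, keywords in SENIORITY_RULES:
        if any(f" {keyword} " in padded for keyword in keywords):
            return level
    return "individual contributor"


titles = ["Sr. Software Engineer", "Vice President of Sales", "Co-Founder & CEO", "Data Analyst"]
levels = [seniority(title) for title in titles]
```

Keyword rules like these are cheap and transparent but brittle, which is one reason an LLM-based classifier can be attractive for messy real-world titles.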

How General Catalyst uses data to find new founders as early as possible 

“For sourcing on the seed side (where I focus), the goal is to find new founders as early as possible, so we like to use a variety of signals such as someone's past experiences, tenure, progression at their last roles to assess slope and entrepreneurial potential. We like to engage operators that we think could be great founders even if they haven't decided to leave or updated their online profile about starting something new yet.

While we evaluate investment opportunities, besides diving deep into any financial or usage data that founders provide us, we also independently look at data sources such as a company's historical web traffic or app store ranking, user reviews, market sizing reports, and benchmark growth and usage metrics against other similar companies in the space.”

— Andrea Wang, General Catalyst

Off-the-shelf tools for data-driven investing

Building internal tools, datasets and models is a popular strategy with the funds we spoke to. However, ready-to-use platforms can also provide off-the-shelf capabilities.

Harmonic, Koble, and Termina build software for VCs to source and evaluate investment opportunities.

Damian Cristian (Koble) on the importance of data evaluation and using stage-specific data sources

“When it comes to startup investing, everyone agrees that data is the future. 

But if you only use data that’s easily available, you will have no edge over the market. This is precisely what most ‘data-driven’ VCs are doing. Their outputs are only as good as their inputs, and those inputs tend to be run-of-the-mill datasets from mainstream databases.

The key to leveraging data in the startup investment process is investment in data acquisition, management, and deployment. Working with data without compromising its integrity and utility is a huge challenge, and the vast majority of VC firms are simply not set up to do this well.

Systematizing startup investing presents significant challenges when it comes to data availability, quality, and actionability. But for the first time in the history of our asset class, these challenges are surmountable.

Sourcing isn’t the issue. Evaluation is. You might be able to source 1,000 startups a week, but if you can’t evaluate them, what is the point?

So, for evaluation, you have to think about the stage. For us, VC is really two asset classes in one:

- Pre-Seed and Seed

- Series A+

At Pre-Seed and Seed, all the data you need to evaluate a company can be found in the public domain – team, market, investors, traction signals. At this stage you have to be willing to admit revenue doesn’t matter (a hard thing to do if you spent a previous life building financial models at an investment bank or a management consulting firm and you’re now trying to rebrand yourself as a VC).

At Series A+ it’s a company’s privately held data that really matters when trying to drive better investment decisions. So, you need to find ways to get hold of fresh data as it pertains to financials, growth, and marketing.”

— Damian Cristian, Koble

Max Ruderman (Harmonic) describes how their tool allows for superpowered sourcing

“Every VC knows the value of a great network. We believe that networks are actually under-leveraged, and the best investors out there combine their networks with real-time business data for superpowered sourcing.

Many data-driven investors using Harmonic have alerts set up so they instantly know when someone in the firm’s network is onto something new. 

Fin Capital, a Harmonic customer that became an investor just months into using the platform, also uses Harmonic to find warm paths into newly discovered companies. When anyone at Fin lands on a company they’re interested in, Harmonic shows who in the firm has connections with that company, along with the title and high-level information of those connections. Fan Wen, Principal & Head of Data Science at Fin, says that historically when an investor uncovered a compelling company, they’d ‘search LinkedIn for first or second-degree connections. That would take time. With Harmonic’s browser extension, we now do that in a single click.’”

— Max Ruderman, Harmonic

Alex Chee (Termina) on how their tool can help funds overcome the “cold start” problem to data-driven VC

“I think it's clear at this point that the vast majority of investors across asset classes believe AI and data science can significantly enhance predictability and reduce risk. However, the adoption of these technologies has been hindered by significant barriers, notably the prohibitive costs, complexities and timelines involved in hiring engineering teams and deploying products that actually produce ROI. We've come to understand that this is a universal problem across not only VCs emerging to tenured, but also CVCs, sovereigns and other investors focused on the private markets.

At Tribe Capital, and now Termina, we have developed a solution that effectively addresses this 'cold start' problem in adopting AI and data science. Our AI software platform, equipped with one of the largest global benchmarking datasets ranging from seed stage to IPO, enables investors to adopt quantitative diligence swiftly. Our customers gain the ability to scan rapidly and benchmark companies within days of partnering with Termina — in stark contrast to the years typically required to build a comparable system from the ground up.

And that rapid adoption path pays off. It buys our customers the wins necessary to budget for and build further proprietary capabilities, and to do so at a fraction of the cost on top of their Termina Instances. In fact, our ability to dramatically accelerate the adoption of data science at our customers' firms has allowed some folks to go so far as to orient large parts of their firm's future value proposition around the capabilities Termina is able to provide them. As technologists at heart, we love this and are excited to play a role in accelerating the adoption of data science and AI globally.”

— Alex Chee, Termina

The pros and cons of data-driven VC

The pros

It’s more efficient, fairer, and removes human bias.

“Quant VC offers an interesting alternative to VC’s busted empathy model. By taking humans out of the equation, we clarify expectations. No TED talk platitudes; no ‘partnerships;’ no false empathy. Only algorithms – fast, quiet, hands-off, fair.

It’s a good choice for investors who understand the importance of making venture capital fair and inclusive for all. Quant models have no concept of gender, ethnicity, sexual orientation, religion, personality, and physical appearance. Our groundbreaking deep learning model recommends founders and startups that often differ from the traditional VC investment profile.” 

— Damian Cristian, Koble

“The biggest advantage [to data-driven VC] is that we get to meet a lot of very interesting startups doing interesting things. You can't get to the top 0.1% without doing something interesting. It's also a change of perspective because we're not getting a deck from a random company and trying to figure out what they're not telling you.”

— Abe Othman, AngelList Early-Stage Quant Fund

“A data-driven approach enables a focus on fundamentals, limiting bias and improving fund returns. A recent Berkeley study found that data-driven VCs invest in a third more women founders and are less likely to anchor on signals like university attended that, like markups, look good on paper but don’t necessarily correlate to business performance.”

— Haley Bryant, Hustle Fund

“The advantage of data-driven sourcing is helping us narrow down the potential pool of (future) founders that we should focus on, and using data in the evaluation process helps with more standardization and discipline in assessing traction and PMF across companies.”

— Andrea Wang, General Catalyst

“Above all else, VC is a network business, effectively capped by the scalability of human relationships. There is a cognitive gap in how many sectors and companies an individual investor can deeply understand without the help of data and technology. Technology solves this limitation, enabling investors to source and screen huge deal flow volumes.”

— Damian Cristian, Koble

“The biggest advantage is speed and scope. We hope our data-driven approaches allow for faster, more rigorous decision-making, broader analysis, and, perhaps most crucially, expanding our networks to focus on high-potential opportunities.”

— Jamesin Seidel, Chapter One

“The biggest advantage has been the results. Motherbrain has sourced investments such as AnyDesk and Peakon which have performed very well. In fact, Peakon was the first AI-driven exit too.”

— Kaushik Subramanian, EQT Ventures

“[Data-driven VC] is inevitable. As the cost and effort required to harness AI and data science approaches zero, it's hard to imagine *any* investor declining to harness that power to reduce their risk and increase the predictability of their investment strategies.

More and more firms are waking up to this, with a surge in awareness in the last 6 months. Those of us who have been early adopters of this transformation firsthand are leaning in to capture that first-mover advantage. And we're helping our friends and peers adopt those same practices as well, given reduced risk and increased predictability helps our entire asset class as a whole.”

— Alex Chee, Termina

It can build competitive advantage.

“The advantages are manifold. We find opportunities early, evaluate them with a significantly more informed approach, and support them via software infrastructure, something that most other LPs simply don’t have the capability to do. Also, given we are active users and willing experimenters of the latest DevOps and MLOps tooling, we have the added benefit of truly being in market and maintaining our edge.

Also, we find there are access advantages by enabling our investments, in our case GPs, with intelligence and data applications to augment their own internal efforts.”

— Jake Kupperman & Albert Azout, Level 

It provides leverage.

“First, the process allows you to save time you can spend on more interesting and useful activities. Second, [it creates a] higher diversity and a less biased process, as different models look at different metrics and variables. Finally, it gives you a strong competitive advantage, because you get to deals earlier, better prepared, and with more to offer.” 

— Francesco Corea, Greycroft

“The primary advantage of a data-driven approach at Tribe Capital is the enhanced accuracy and depth in evaluating investment opportunities. Termina [our AI tool] is fast, allowing us to accurately extract, structure, analyze, and benchmark crucial information from varied sources like decks, financials, and transactional data in a fraction of the time a traditional analyst team would need.

The result is a profound understanding of each investment opportunity and an eye-opening experience for founders who often learn something new about their own businesses within days of meeting us.”

— Jonathan Hsu, Tribe Capital

It may be especially helpful in early-stage venture.

“Data is not only applicable to early-stage VC, it’s even more necessary in early-stage investments. Failure rates in early-stage VC are huge because data at this stage is more qualitative, and humans have proven unable (at least so far) to interpret it consistently.

VCs have no choice but to use it. Off-the-shelf solutions (Spectre / Pitchbook etc.) don’t provide enough of an edge, and serious investment is needed in building technical teams and data sets.”

— Damian Cristian, Koble

“This is the future of venture, and a structural advantage for seed firms in 5-10 years.”

— Lan Xuezhao and Wilson Kyi, Basis Set Ventures

The cons

It’s expensive, especially if you’re building internal capabilities. 

“It’s f@%!ing expensive.”

— Numerous respondents (We’re paraphrasing. Slightly.)

“For most funds operating on a 2% management fee, building technical teams and datasets is simply unaffordable. Hiring an intern data science student for 2 days a week is not going to cut it.”

— Damian Cristian, Koble

Building teams, datasets, and analytics tools in-house is time-consuming.

“The biggest disadvantage has been product development cycles — it takes a while to build, and sometimes longer to calibrate.”

— Kaushik Subramanian, EQT Ventures

“The disadvantage lies in the complexity and resource intensity of managing and interpreting vast amounts of data. While data-driven evaluation offers deeper insights, it requires sophisticated infrastructure and the talent to build it to extract meaningful information and actionable intelligence. 

Further, any benchmarking dataset worth anything requires significant time to accumulate. Based on our conversations with our friends across VC, this cold-start problem and prohibitive cost have caused the majority of data science efforts to fail.”

— Jonathan Hsu, Tribe Capital

“The biggest disadvantage is that you get rejected a lot. About half of founders never write back, and then about half of those who do write back indicate they don't want our money. 

Most VCs get rejected a lot by LPs but rarely put themselves into situations where they get rejected by potential portfolio companies. With the Quant Fund I've managed to do something where I got rejected by a lot of LPs to set myself up to get rejected by a lot of startups :)”

— Abe Othman, AngelList Early-Stage Quant Fund

“At the end of the day, even if your data-driven sourcing is 100% correct all the time, you still have to get into the deal and convince the founders to take your money.”

— Francesco Corea, Greycroft

“If you use a bunch of data points to discover and evaluate startups, and then bring these data points into an IC meeting that gets overrun with bullish investors around the table (who discard the data), then have you really become a data-driven fund? Or is it just marketing BS you sell to your LPs and founders? 

There are tons of funds doing this, you don’t have to look that hard to find them.”

— Damian Cristian, Koble

“If used properly, we don’t believe there are many disadvantages; however, as is the case with human decision-making, we need to be aware of biases in the data. In addition, we need to make sure our models are not ‘black-boxed’ but have advanced explainability, trust, and confidence bounds built in.”

— Jake Kupperman & Albert Azout, Level

“At the earliest stages, there isn’t a lot of data to go off of. Even if we have data about the market, competitive landscape, and early traction, startups pivot ~30% of the time after we write an initial check. 

Additionally, data may tell us to focus more on traditional startup hubs for talent/ideas but then overlook emerging markets, both domestically and abroad.”

— Haley Bryant, Hustle Fund

“While we firmly believe in the advantages of data-driven approaches, we acknowledge that venture investing fundamentally relies on relationships and partnerships. 

Successful investing and company building often lie in the balance between qualitative insights and quantitative analysis. Data informs decisions, but of course, the human element remains irreplaceable in most early-stage venture investing.”

— Jamesin Seidel, Chapter One

“There are a lot of qualitative signals that data can't pick up on, such as someone's drive, motivation behind starting a company, ambition, potential, and salesmanship, which are also really important in our investment decisions. 

So we don't make decisions solely based on these quantitative signals and like to get to know the person behind the company, regardless of where they come from.”

— Andrea Wang, General Catalyst

Watch this space

At Weekend Fund, we're continuing to watch the evolution of the space closely.

In terms of the impact of data-driven VC, we’ll leave you with these parting words from Damian Cristian (Koble):

“The benefits for startups, investors, and society are obvious: capital is routed to the people and ideas that deserve it most; investors get better risk-adjusted returns; society moves forward.”

Thank you for reading!

Until next time, 
Ryan and Vedika

Special thanks to Shân for writing this piece with us.
