I have spent the last few years investing in Vertical Data Companies. Driving this is a belief that there will be a set of companies, built on a core set of insights derived from data, that will each come to dominate their respective industry. I want to take some time to explain this thesis in more detail and lay out the reasoning behind it.

When I talk about data companies, I am talking about companies built around a core data pipeline. These companies typically fit the following pattern:

A team that has figured out a novel way to collect and analyze data, leverage cheap compute resources and machine learning, and derive insights that can be applied to solve an existing problem.

That entire data pipeline is important to me. Capturing the data without the ability to derive insights and solve problems often doesn’t capture enough of the value in the system. Deriving insights from freely available data can be valuable, but it often lacks a competitive barrier and will attract fast followers.

By vertical, I mean companies that are focused on a particular industry or sector. A focus on vertical companies is nothing new in venture capital: large enterprise SaaS investors have seen big wins in this space with the likes of Veeva and Guidewire. Some of the benefits include higher market share, less competitive pressure, concentrated customers, lower customer acquisition cost (CAC), and a lower cost to build a meaningful brand. Where I deviate from the traditional vertical SaaS thesis is that I think there are additional ways to deliver that insight beyond the seats-and-tiers SaaS model. Software is always a component of the solution, but I’m also happy to support companies that monetize their core insight via a broker or marketplace business model. The industry, end customer, and the expertise of the founders drive the product and business model.

At the macro level, my view is influenced by Carlota Perez’s Technological Revolutions and Financial Capital, which posits that past technological revolutions have followed similar cycles of roughly 60–75 years. These cycles are summarized in the graphic below:

These dynamics create opportunities for companies that can leverage technology at the forefront of the information and communications technology (ICT) curve, including data storage and processing, machine learning, open source software, connected hardware, and mobile. As compute power and software become commoditized, companies increasingly rely on proprietary data and distribution to build lasting competitive advantages.

Of the two, building a competitive moat via proprietary data is often the easier and more capital-efficient route for early-stage companies. Distribution is still crucial, but it is hard and often expensive to stand up a competitive barrier through distribution alone in the early days. I prefer this sequence: proprietary data first, then large market share in a vertical, then established distribution channels for follow-on products.

What this means for founders

If you think that you are building a company that fits this mold, I would love to connect. To get a jump start, here are the major questions that I always ask of potential investments:

What data are you collecting? How is it unique? Is it proprietary?

What insights have you drawn from the data? Tell me more about how the technical team is using AI/ML technologies. Tell me more about how the business / product team is directing the focus and asking the right questions.

Tell me about the problem. Why is it important to solve? How big of an opportunity is it?

How well do you know your end customer / user? How does your knowledge of the industry, the problem, and the prevailing workflows inform your product decisions? How will you monetize that insight?