Data has been the common theme throughout my career. I’ve spent the vast majority of the past decade working for various data infrastructure vendors, and data has always been core to my personal thesis on where I want to spend my career and invest my dollars in the private markets. I took the leap to join dbt Labs (née Fishtown Analytics) a couple of months ago, and my conviction has only grown that dbt is going to drive huge value for the enterprise and represents the future of data integration.
If you’ve read anything else I’ve written, it probably goes without saying that I view my career choices as investments of time, energy, and money, and I believe that choosing to work for a company expresses the highest possible conviction in that business’s success. My conviction in dbt Labs is no different.
My thesis on dbt
To really understand why I’m making a big bet on dbt, I need to talk about some of what I’ve learned over my career. I’ve spent nearly all of it in the data infrastructure space, watching the analytics and data warehousing industry cycle from columnar databases to the Hadoop zoo and back to a cloud data warehousing (CDW) world. Along the way, I’ve come to feel exceptionally strongly about a couple of things:
Simpler and easier to use is better: This is an intuitive statement, but I think it explains a lot about why the data management industry has cycled through technology stacks so rapidly. Two significant secular trends coincided (the world moving online and the rise of cloud infrastructure), producing an explosion of data and a mass outsourcing of infrastructure. That drove rapid, sweeping changes in what data infrastructure looked like, from analytical databases to big data systems to cloud data warehouses. Hadoop was a significant but relatively short-lived trend, because it solved the same use cases as traditional warehouses with much greater complexity and cost. Cloud data warehouses, by contrast, looked and felt like the databases the world was comfortable with, but delivered them in a package that was much simpler to use and less expensive (to start, at least).
Community is rocket fuel for a business: I might be very biased here, because I’ve spent the better part of my career working for open-source and open-core companies, but it’s not hard to look at companies like Databricks and Confluent and see the impact a fast-growing community can have on a business. On a personal level, I’ve had my share of deals sourced from a user mailing list or a community Slack workspace, so I’ve felt that impact directly.
I was introduced to dbt in 2019, and admittedly, it took me some time to connect the dots and understand just how important dbt was going to be. From a technology perspective, a handful of realizations really cemented it for me:
The rise of cloud data warehouses refocused the data world on SQL as a core language: Python has become hugely popular in the data science and data engineering worlds, but even in Hadoop’s heyday, much of the transformation work was done in HiveQL, which is essentially a SQL dialect. SQL is the native language of databases, and thus the lingua franca of analytics. Because it is so widely understood, its familiarity gives it extreme utility.
Orientation around SQL means you can actually democratize analytics work: A huge pain point for data consumers is that they can’t serve themselves and are hamstrung by the ticket queues of oversubscribed data engineering teams. dbt opens the door for a broader constituency of SQL-literate users to build their own data pipelines. SQL is also a standard, whereas many legacy ETL tools were built around low-code/no-code approaches, where differences across tools meant relearning idiomatic approaches with every tool change.
Adoption of DataOps, meaning DevOps principles applied to data management, produces better outcomes: For a lot of data engineering teams, the idea of truly democratizing analytics probably produces some heartburn, because a lot can go wrong. I’ve come to believe that strong DataOps principles are the right way to reduce and manage that risk. However, I also think the right way to apply those best practices is by integrating with enterprise-wide infrastructure, like GitHub for source control or CircleCI for continuous integration and testing, and that building solution-specific analogues of those tools is an anti-pattern.
At its core, dbt is a tool that enables anybody who knows SQL to build data pipelines. It’s deceptively simple and elegant in its approach, but it solves very real, very painful business problems. Critically, it’s intuitive and easy to use, yet powerful. It makes 80% of analytics problems easy to solve, and the remaining 20% possible to solve.
There’s also been social proof that has reinforced my conviction. A good example appeared recently in a blog post from Bessemer Venture Partners, which offered this diagram of the data infrastructure landscape:
What strikes me about this image is that most of the categories have a whole host of entrants. This isn’t the only example; if you google “modern data stack,” it’s trivial to find similar diagrams that offer plenty of options in every category except Transformation. There, you’ll generally find only dbt. (LookML is a somewhat interesting inclusion in this diagram, since it’s a component you only have access to if you also use Looker for BI/visualization.)
Now, one might suggest that a lack of clear competition often indicates a non-existent market. However, transformation is a) not a new category, and b) not devoid of competitors, even in the current data infrastructure marketplace. For example, consider the 2020 version of Matt Turck’s market map for data (the Fishtown Analytics logo represents dbt here, though dbt also shows up in other spots on Turck’s visualization).
Clearly there is a broader set of competitors, but the reality is that there just aren’t many other serious innovators in the space today. The ones that do exist (such as Fivetran, Airbyte, and Meltano) have tended to focus on data ingestion and leave data transformation aside. Most of the technologies in the image above are already 7+ years old, and for many of them, it’s easier to find companies migrating off of them than onto them.
Transformation is mission-critical to the data supply chain
The mindshare that dbt has today is extraordinarily valuable for dbt Labs, the business, because data transformation is a mission-critical, highly central part of the data supply chain. There’s a reason the original wave of ETL tools, like Informatica and Talend, were able to expand into broad product portfolios: in the data integration world, transformation is the most important capability, and from that spot on the board, everything else is an adjacency.
A remaining question is: how big is the opportunity? Informatica went private several years ago at a $4B valuation, Talend more recently at a $2.5B valuation, and Alteryx currently sits at about a $6B valuation. While dbt may capture a portion of these markets, the CDW ecosystem seems to be generating large, greenfield projects for which legacy tooling is never even considered. I believe the correct way to estimate the opportunity is relative to the market size of the underlying data management systems. It’s hard to know exactly how large the CDW market is, since Redshift and BigQuery roll into the overall AWS and GCP P&Ls, but Snowflake currently holds a market cap of $70B, Databricks clocks in at $28B on the private markets (and it’s probably not insane to expect that it would fetch a $40B market cap on the public markets), and more niche up-and-comers like Starburst and Dremio add billion-dollar valuations on top of that.
Data management is a huge space, with many peripheral markets built on top of it, including the data integration market. Players in these markets often price their products as a percentage of customer spend in the primary market (for example, data integration tools aim to capture a fraction of the spend on data management infrastructure). In the case of data integration, that fraction is often around 25% (this is somewhat anecdotal, based on my experience working for several data integration companies). Extrapolating that out to the cloud data warehousing market puts the data integration opportunity somewhere in the neighborhood of $25-30B. It’s hard to imagine dbt capturing the entirety of the data integration market surrounding CDWs, but with a market of that scale that continues to grow rapidly, dbt Labs stock still looks cheap, even at the $1.5B valuation from the latest funding round.
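The back-of-the-envelope math in that paragraph can be written out explicitly. This is one possible reconstruction under stated assumptions, not a model spelled out in the original: it treats the cited valuations (Snowflake at $70B, Databricks at $28B privately or a possible $40B publicly) as a rough proxy for CDW spend, applies the anecdotal ~25% integration share, and leaves out Redshift, BigQuery, Starburst, and Dremio because no figures are given for them.

```latex
% Assumption: valuations used as a rough proxy for underlying CDW spend S_CDW
\begin{aligned}
M_{\text{integration}} &\approx f \cdot S_{\text{CDW}}, \qquad f \approx 0.25 \\
\text{low: } \; 0.25 \times (\$70\,\text{B} + \$28\,\text{B}) &= \$24.5\,\text{B} \\
\text{high: } \; 0.25 \times (\$70\,\text{B} + \$40\,\text{B}) &= \$27.5\,\text{B}
\end{aligned}
```

Adding the omitted vendors would only push the proxy for CDW spend higher, so under these assumptions the result sits broadly within the $25-30B neighborhood.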
Community-led growth, in its purest form
The other major social proof factor for me is the open-source and community adoption curve. I’m a huge believer in the open-source and open-core model, and it’s been another common theme across my career. I believe that, when used appropriately, open source can be a hugely powerful driver of a technology’s distribution and adoption, and, when monetized carefully, it creates immense business value. The first time I spoke to the dbt team, the dbt Community Slack was already bustling, with somewhere north of 4,000 community members. I periodically popped in to see how it was growing and watched an eye-popping growth curve unfold. By the time I actually joined dbt Labs, about a year after those initial conversations, membership was closer to 13,000. The image below isn’t the curve of Community Slack membership, but the shape of that curve is nearly identical.
What still boggles my mind is that the community numbers 15,000 or so people today, and it is active. I hang out in a number of highly populated Slack communities with similar or larger memberships that are just ghost towns, with limited discussion save for periodic AMAs. The dbt community is vibrant, with rich micro-communities and people at all skill levels. That’s a testament to the hard work of the dbt Labs Community team, but I think it also speaks to the community orientation at the core of the entire business. It’s powerful, it drives engagement and product love, and it’s why the list of reactions to our funding announcement on the Community Slack looks like this:
Company values that align with my own
Past the technology and these social proof points, the actual company, dbt Labs, has been built with care and has values that align well with my own. Corporate values often feel like lip service meant to generate warm fuzzies, but after a couple of months here, I’ve come to feel that dbt Labs is unusually aligned around its ideals.
Diversity at the core: A lot of companies aim for diversity, but many are lazy or half-hearted about it. It takes hard work and intentional hiring to build a diverse organization, and dbt Labs does it far better than nearly any other tech company I’ve seen. Our full, company-maintained org chart is publicly available, so to make the point, I went through it to see what our actual diversity ratios look like across a few key subgroups within the organization. I don’t have data on who identifies as BIPOC or LGBTQ+, so it’s harder for me to point to those numbers, but the share of women in each group looks like this:
Overall company: 48% female
Management (anybody who has direct reports): 50% female
Engineering staff: 32% female
Board of Directors: 20% female
There’s always progress to be made, and diversity and inclusion work is never finished, but I’m proud of where the company is, and that diversity continues to be a high priority as we build the team.
A company of practitioners / hyper-focused on the end user: The company is full of dbt users, often in surprising places. This isn’t universally true, and I definitely came into the organization on the less-dbt-experienced end of the spectrum, but you’ll find dbt experts in every function, from sales to marketing to our community team. dbt Labs also has one of the most sophisticated internal data/analytics organizations I’ve seen at a B2B company at this stage. In the early days at big data companies, it was a running joke that we were building and selling software we couldn’t really get value from ourselves. The story is much different at dbt Labs. Our heavy internal use of dbt produces a rather unique culture of dogfooding and customer empathy, which I believe helps reinforce the company mission.
Welcoming of debate: I’m a staunch believer that progress comes through productive tension, and that without room for differing opinions, it’s extremely hard to move forward. This can be challenging at an organization like dbt Labs, which treats remote work as a first principle, but the company works around that by focusing on written communication while creating space to question everything and bring new opinions to the table. All of the founders hold regular open office hours that give people space to ask hard questions and get honest answers. Personally, I really value that willingness to engage with hard topics.
A bright future ahead for analytics engineers
I am personally so excited to be a part of the dbt Labs mission going forward. The opportunity in front of us is twofold: to capture significant market share in a data integration market that is rapidly ousting its incumbents, and to reinvent the data integration category entirely. Analytics engineering, as a discipline, didn’t really exist five years ago. The work was certainly being done, but it was muddled across a number of different roles throughout data organizations. Breaking the responsibilities of an analytics engineer out into a dedicated role provides clarity, and as with all new roles, it creates a need for new tooling and platforms. I’m confident that dbt is that new platform.