About three and a half years ago, I wrote an article about my career investment thesis on dbt Labs, and now, nearly four years after joining, I find myself at the end of one journey and the beginning of a new adventure. It's been an opportunity for reflection: on why I joined, on what I learned, and on how we evolved as an organization. Even as I move on, I still have deep conviction in the product and the opportunity ahead for dbt Labs.
The rise of the multi-platform enterprise
The most widely played marketing messages from Snowflake and Databricks these days center around AI, but if there was one topic that rang out even louder at their annual conferences this past year, it was the table format war that has been running for some time between Iceberg, Delta Lake, and Hudi. Databricks reportedly spent $2B to buy a small startup called Tabular that was productizing Iceberg for the enterprise.
Why? The answer is portability. Iceberg solves multiple problems at once. For Snowflake customers, it's a major cost improvement: instead of storing raw datasets on S3 and then getting charged a second time for storage after loading that data into Snowflake, customers can query Iceberg tables directly. This reduces storage costs without any loss of efficacy at the query/BI layer. For both Snowflake and Databricks customers, it's a portability enhancer. Databricks customers gain flexibility by moving away from Delta Lake, which is open but very Databricks-specific, and Iceberg enables both Databricks and Snowflake to operate as query layers on top of a shared storage substrate, rather than as fully integrated, tightly coupled stacks.
For dbt, this has a couple major knock-on effects:
Companies will be much more inclined to use the right tool for the right job: have a use case that demands Snowflake? Query from Iceberg tables. Have a use case that demands Databricks? Query from your Iceberg lakehouse. Have a use case that demands streaming? Flink on top of Iceberg becomes an option.
Companies will be able to use the tool that is cheapest for the job: different processing systems will have different costs depending on the use case and processing pattern. By leaning into technologies that work cross-platform, the business gains optionality and cost efficiency by routing work to the lowest-cost platform for the task at hand. You can imagine a future system that evaluates the cost of a job at runtime and picks a platform based on current pricing and system load.
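To make the first scenario concrete: recent dbt adapters already expose Iceberg as a materialization option. A minimal sketch, assuming dbt-snowflake's Iceberg support; the exact config keys and the external volume name here are illustrative, so check the adapter docs for your version:

```sql
-- models/orders_iceberg.sql
-- Materialize a model as an Iceberg table on Snowflake, so other
-- engines (Databricks, Flink, etc.) can read the same storage layer.
{{ config(
    materialized='table',
    table_format='iceberg',
    external_volume='analytics_iceberg_volume'
) }}

select order_id, customer_id, ordered_at, amount
from {{ ref('stg_orders') }}
```

The select logic stays untouched; only the materialization config ties the model to a table format, which is exactly what makes the storage layer portable across engines.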
In both of these scenarios, it's absolutely critical to be bought into a platform that is agnostic to the underlying processing systems, and that's where dbt Labs sits today. The recent SDF Labs acquisition provides good insight into the future strategy: SDF, at its core, is largely compiler technology with very sophisticated transpilation capabilities. It enables a dbt project to move seamlessly from one data platform to another, without code rewrites or complex Jinja work (i.e., use of cross-database macros) to ensure that a model can execute cross-platform. This is efficiency-enabling technology: it creates the pathway for businesses to actually realize these benefits.
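For a sense of the Jinja work being referenced: dbt ships cross-database macros that paper over dialect differences, at the cost of writing Jinja instead of plain SQL. A small sketch, with hypothetical model and column names:

```sql
-- models/shipping_latency.sql
-- dbt.datediff / dbt.dateadd compile to the correct function name and
-- argument order for whichever adapter runs the project.
select
    order_id,
    {{ dbt.datediff('day', 'ordered_at', 'shipped_at') }} as days_to_ship,
    {{ dbt.dateadd('day', 30, 'shipped_at') }} as return_window_ends_at
from {{ ref('stg_orders') }}
```

SDF's pitch, as I understand it, is to do this dialect translation at the compiler level on plain SQL, so model authors don't have to reach for these macros at all.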
AI is not going away – governance and data quality are paramount
I worked for Cloudera over a decade ago, and one of the taglines we were spouting in the early 2010s was "data is the new oil." The big data craze has meaningfully come and gone, but as we enter a new era of AI, data is more than just oil to be drilled; it's the whole damn moat. GenAI is creating a dynamic where massive companies can rise and fall overnight, and with foundation models proliferating at breakneck pace, what meaningfully differentiates this new crop of startups is the datasets themselves.
To engineer an AI application effectively, whether you're doing instruction tuning, fine-tuning, or leaning on RAG to produce a high-quality experience, you need a well-curated dataset to drive the training and inference processes. To ensure that data is consistently and accurately labeled and effectively governed, tooling like dbt ends up being a must-have in the data refinement process for driving data quality initiatives. It helps that products like Snowflake Cortex make all of this accessible directly through a SQL interface, which dbt excels at integrating with.
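As an illustration of that SQL accessibility, a dbt model can call Cortex functions inline. The model and column names below are hypothetical, and the function follows Snowflake's Cortex documentation:

```sql
-- models/review_sentiment.sql
-- Score free-text reviews with Snowflake Cortex directly in SQL,
-- keeping the AI step inside the governed, tested dbt DAG.
select
    review_id,
    review_text,
    snowflake.cortex.sentiment(review_text) as sentiment_score
from {{ ref('stg_reviews') }}
```

Because the scoring happens in an ordinary model, it inherits dbt's lineage, documentation, and testing for free.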
Thinking about the potential boogeymen hiding in data pipelines, the other thing AI brings with it is rampant PII (and PHI and PCI and all of the other PXI) challenges. This is another arena where SDF brings interesting new capabilities into the fold. Specifically, SDF provides classifier primitives that will enable dbt to meaningfully trace the lineage of PII through layers of transformation and data refinement. Being able to identify where PII lives is going to be critical for enterprises to maintain compliance, especially as regulation surrounding GenAI inevitably ramps up over time. dbt is, again, laying the groundwork to be an impact player in this rapidly evolving world.
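Today, this kind of classification is largely manual in dbt, done with column-level meta and tags in schema YAML. A sketch, with hypothetical model and column names:

```yaml
# models/schema.yml
# Hand-applied PII labels as dbt supports them today.
version: 2
models:
  - name: stg_customers
    columns:
      - name: email
        description: "Customer contact email"
        meta:
          classification: pii
        tags: ['pii']
```

The promise of classifier primitives is inferring labels like these and tracing them through downstream transformations automatically, rather than relying on analysts to tag and re-tag every column by hand.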
dbt has staying power within the transformation market
Four years ago, when I joined, dbt had less than $5M in ARR. The company recently announced it had crossed the $100M milestone. While this isn't the growth curve we're seeing with some of the new AI upstarts, it remains an extremely fast path to $100M, and it comes with a 5,000-strong customer list. dbt clearly struck a chord with the data world and has rapidly become a standard for data transformation across the industry. This can be seen in the sheer number of dbt execution platforms that exist these days: myriad companies tout their ability to execute dbt code or their compatibility with dbt, not to mention companies like Databricks offering those capabilities tightly integrated with their existing data warehousing products. While this produces an inherent level of competition, dbt Labs is accustomed to that, given the nature of the open-source model, and continues to be (certainly in my mind) the best place to go for a dbt-centric development platform.
One thing that didn't meaningfully exist when I started at dbt Labs was competition. There were very few natural competitors beyond dbt Core itself, and I called out that very apparent fact through the proxy of industry market maps. This is less true today, and I think that underscores the standardization around dbt. Small upstarts like Paradime and Tobiko Data take different approaches to competing with dbt Labs: Paradime is largely a runner of dbt code, while Tobiko Data offers SQLMesh, a meaningfully different framework that takes a great deal of inspiration from dbt and aims to provide zero-code-changes-required execution of dbt projects. In fact, Tobiko's entire approach to the market seems to be mirroring every move dbt Labs makes (as evidenced, for example, by their recent seemingly-shotgun-wedding of a Quarry acquihire, which I admittedly think is a more foolish approach than relentlessly focusing on solving customer problems). Larger startups like Coalesce.io have taken a more differentiated tack, but still display significant inspiration from dbt's roots. Imitation is the sincerest form of flattery.
Looking at larger-scale competitors, the inevitable comparisons are Informatica and Alteryx. Alteryx was taken private in a $4.4B deal, which can only be taken to indicate that they've had a challenging time shifting their energy away from their historical desktop-app-based interface, and likely indicates that their cloud strategy (predominantly the acquisition of Trifacta) has fallen flat. Similarly, Informatica is on a years-long journey to migrate its perpetual-license revenue from PowerCenter to subscription-based recurring revenue with its Informatica Intelligent Cloud Services product, and while it has the benefit of a massive revenue base, its stock plummeted 25% last month when cloud renewals fell short of Street expectations. These are meaningful chinks in the armor of historical giants, with $10B+ of TAM at play; from my vantage point, this is strictly opportunity for dbt Labs.
Lastly on the topic, it's important to look at the partnership angle. dbt Labs has historically had a very strong attachment to Snowflake and other platforms (Redshift, BigQuery, and the like) that look more like traditional data warehouses than newer-age lakehouses. This, too, is an opportunity. The implication is that Databricks, which is seeing blazingly fast adoption, is a rising tide that will inevitably lift dbt Labs' boat. Databricks' CRO, Ron Gabrisko, recently attended and spoke at the dbt Labs sales kickoff, and it's reasonable to expect an energetic partner motion going forward. This happens to coincide with an interesting little announcement from the Databricks SKO: the acquisition of BladeBridge, a technology-backed consultancy that focuses on platform migration projects. This mirrors Snowflake's acquisition of Mobilize.net (developer of a tool called SnowConvert) a couple of years ago. However, where Snowflake's primary target was very clearly Teradata, Databricks is aiming directly at the jugular of Snowflake. The fallout of the Snowflake and Databricks war benefits platforms that aim for agnosticism, and I feel strongly that dbt Labs will be a beneficiary.
“It’s the people, stupid”
The last major tailwind I see for dbt Labs today is the people. dbt Labs has long employed some of the smartest, most talented people I've had the pleasure of working alongside in my career. In recent days, though, the theme has clearly been upleveling the executive ranks. As an aside, I'm married to a highly talented woman (the far more impressive half of the couple) who recruits CROs and a variety of GTM leaders for venture-backed startups. One of the things I've learned watching her practice her craft is the importance of matching the executive to the growth phase. There are various inflection points as revenue scales, and as dbt Labs exits the $0-100M phase and enters the $100-500M phase, the company is staffing up with people who have seen the proverbial movie before and can help the company navigate the next several years. For example:
Sarah Riley (CFO) helped Okta and Zoom chart their paths through IPO
Brandon Sweeney (COO) was at the helm of HashiCorp from their $100-$500M IPO run
Austin Stefani (CRO) led Rubrik’s Americas org to a successful IPO outcome
Ryan Segar (CCO) previously led Sisense’s technical field organization through their rise
Mark Porter (CTO) was GM for several major AWS services, and most recently CTO of MongoDB
Sally Jenkins (CMO) is in her fourth stint as CMO, having previously done the job at Informatica and Elastic (among others)
This is an absolutely stacked set of executives (even ignoring the extremely talented set of people operating at VP and other executive levels), and is exactly the type of group that I would hope to see at an IPO hopeful.
Acknowledging my biases
This all stated, I still have immense bias here. I have a lot of shares sitting in Carta, and desperately want to realize a large amount of value from those down the road. The inevitable question is: can dbt Labs go the distance?
It’s impossible for me to be sure of the outcome, but I feel confident that dbt Labs is on the path to success. And what I can definitively say is that I’m incredibly proud of what the team at dbt Labs has built, and so grateful to have been a part of the story.
So why did I leave? Stay tuned for the answer to that one.
Nice article! Worth noting outside of sheer product adoption the biggest competition is probably people realising they want to run dbt-core in an orchestrator.
In terms of the market for IDEs, competitors are few and far between but there seem to be more and more cropping up every day. Curious to get your take!
Cheers
Good on ya Natty! Congrats on whatever your next adventure is.