Faster, Easier, and Cheaper Data Lakes

Update on 02.02.2023
After announcing our Seed investment in Onehouse exactly one year ago today, we are excited to share that we’ve doubled down on our partnership with the company and have co-led the Series A with our friends at Addition.

Over the past year, interest in data lakehouses has exploded. In particular, enterprises want to build a data lakehouse that is open and interoperable while requiring minimal engineering effort on their end. Onehouse addresses both of these needs.

Beyond leveraging Apache Hudi, a critical open source component of the data lakehouse for storing and managing data, the company launched Onetable, a feature that enables interoperability between metadata formats, including Apache Iceberg and Delta Lake. With Onetable, customers can take advantage of the scale, performance, and cost advantages of Apache Hudi while also using features and products provided by Databricks and Snowflake.

While it is still early days, we’re excited to watch as enterprises use Onehouse to get data lakes that work with any major query engine up and running with just a few clicks.

–

The best businesses know how to use data to their advantage, leveraging it as a driving force for innovation and decision making. Yet as companies expand and data volumes grow, they often struggle with their data architecture. Data warehouses – which are great for processing and transforming structured data for advanced querying and analytics – are costly and can’t scale. And data lakes – which are great for large volumes of mixed or unstructured data – can be difficult to manage and sort through unless you’re a skilled data engineer or data scientist.

Vinoth Chandar faced this problem firsthand while working as a software engineer at Uber. He needed the performance of a warehouse and the scale of a data lake in real time. So Vinoth created Apache Hudi to implement a new architecture, in which core warehouse and database functionality was added directly to the data lake. He was a pioneer of the technology known today as the “lakehouse.”

The lakehouse architecture is a game changer for enterprises for several reasons. It decreases administration time and effort compared to maintaining both a data warehouse and a data lake. It is also a single source for workloads across data science, machine learning, and SQL and analytics, meaning there’s less unnecessary data movement and redundancy. A lakehouse also gives you direct access to data, reducing staleness and latency. And finally, it’s a far more cost-effective way to store and process data.

Given this, it’s no surprise that Apache Hudi, which was open sourced in 2017, has seen incredible success with startups and large enterprises alike. Thousands of organizations across the world – including Amazon, Disney+ Hotstar, Robinhood, and TikTok – have contributed to the Apache Hudi community and project. The open source project has grown to nearly 1 million monthly downloads, and at Uber, Hudi continues to ingest more than 500 billion records every day.

I am excited to share that we have co-led the seed investment in Onehouse with our friends at Addition. Onehouse leverages the unique capabilities of Apache Hudi to offer a cloud-native managed lakehouse service. Onehouse makes data lakes easier, faster, and cheaper. Instead of creating yet another vertically integrated data and query stack, it provides one interoperable and truly open data layer that accelerates workloads across all popular data lake query engines, such as Apache Spark, Trino, and Presto, and even cloud warehouses, which can read the data as external tables.

I first met Vinoth when Onehouse was just an idea, and helped him iterate on what it could look like to build on top of the success of Apache Hudi. I was immediately impressed with Vinoth’s thinking around delivering the new data architecture on top of that open source success. And the more I got to know him, the more I was struck by his technical leadership, his understanding of customer data problems, and how his experience at Uber could translate to the entire market.

As I mention in my analysis on Open Source vs. Cloud Castles, we are increasingly seeing open source startups built in (and for) a cloud-native ecosystem. As a result, these new startups are making a market impact far more quickly than those from earlier generations, because they combine the low-friction distribution of a cloud service with the open nature of open source communities to reach developers. Onehouse is the latest startup doing just that, and I’m thrilled to be part of their journey.

WRITTEN BY

Jerry Chen

Jerry searches for ambitious founders who are redefining enterprise software.
