Visibility and Optimization for Cloud Data

While the shift toward data in the cloud has created a massive opportunity for companies like Snowflake and Databricks to store and analyze data, we consistently hear concerns from organizations about the ballooning cost of compute as the number of queries and the volume of data grow exponentially across the organization.

Organizations continue to spend millions of dollars a year to leverage data for decision making and future innovation, yet they still have limited visibility into workload usage and costs, and few levers for optimizing utilization of their cloud data warehouse.

We believe there’s an opportunity for new companies to give organizations better visibility into their workloads while also optimizing their queries for cost and speed. Optimizing recurring or poorly formulated queries can not only yield significant savings but also improve query performance and overall team productivity.

That’s why we’re excited to share our seed investment in Bluesky, the intelligent workload optimization and cost governance company for cloud data. Bluesky aims to help cloud data warehouse users save millions of dollars by examining workload patterns across the organization and encouraging optimal usage.

Bluesky’s product, which officially launches today, analyzes historical query patterns to detect groups of similar or duplicate queries. Bluesky can use these groups to suggest high-impact tuning options for valuable workloads and to kill long-running queries that provide little value. Companies like Coinbase are early users of Bluesky and are already seeing value from the technology.
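To make the idea of grouping similar queries concrete, here is a minimal sketch in Python. It is not Bluesky’s implementation, which we are not describing here. It assumes a hypothetical query history made of (SQL text, runtime in seconds) pairs, and it treats two queries as "similar" when they match after literals and whitespace are normalized. The names `fingerprint` and `group_queries` are ours, for illustration only.

```python
import re
from collections import defaultdict


def fingerprint(sql: str) -> str:
    """Reduce a query to its shape by replacing literal values with '?'.

    Assumption: regex normalization is good enough for illustration;
    a production tool would more likely use a real SQL parser.
    """
    text = sql.strip().rstrip(";").lower()
    text = re.sub(r"'(?:[^']|'')*'", "?", text)      # string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)   # numeric literals
    text = re.sub(r"\s+", " ", text)                 # collapse whitespace
    return text


def group_queries(history):
    """Group (sql, seconds) pairs by fingerprint, most expensive group first."""
    groups = defaultdict(lambda: {"runs": 0, "total_seconds": 0.0})
    for sql, seconds in history:
        group = groups[fingerprint(sql)]
        group["runs"] += 1
        group["total_seconds"] += seconds
    return sorted(
        groups.items(),
        key=lambda item: item[1]["total_seconds"],
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical history: the first two queries differ only in a literal.
    history = [
        ("SELECT * FROM orders WHERE id = 42", 1.5),
        ("select *  from orders where id = 7;", 2.0),
        ("SELECT name FROM users WHERE city = 'Paris'", 0.5),
    ]
    for shape, stats in group_queries(history):
        print(stats["runs"], stats["total_seconds"], shape)
```

Ranking groups by total runtime is one simple way to surface the recurring workloads where tuning would pay off most. Runtime here is only a stand-in for cost, which in practice depends on the warehouse’s pricing model.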

Bluesky co-founders Mingsheng Hong (CEO) and Zheng Shao (CTO) are longtime friends and colleagues. They’ve known each other since their time as graduate students, and both went on to have strong careers in Silicon Valley.

Mingsheng has seen several generations of data technologies and companies at Vertica, Hadapt, and most recently Google. At Google, Mingsheng built the next-generation storage and querying stack, which powered Google’s $100 billion ads business and beyond. He also built a new TensorFlow backend that took Google’s AI workloads to the next level in performance and efficiency. Both technologies are still in use at Google today.

Zheng pursued a similar path in startups and technology. He’s had a hand in building the data backbones for some of the largest and most demanding cloud-scale companies in history: Facebook, Dropbox, and Uber. Zheng is also a longtime contributor to open source data projects as a PMC member for Hadoop and Hive.

We’ve known Zheng for over seven years, since we first met him through our Greylock Big Data Community. Jerry reconnected with him and Mingsheng through friends and founders of Rockset and Chronosphere when they were contemplating leaving their jobs to start a new company. Mingsheng and Zheng are known as “the experts” in data, and we are thrilled to be in business with them. We can’t wait to see the impact their product has on organizations’ cost savings and efficiency.

If you’re passionate about big data and machine learning, and interested in joining the team, Bluesky is hiring! Learn more about open opportunities, and what it’s like to work at the company, here.