ClickHouse, the open-source challenger to Snowflake and Databricks

At the end of last week, ClickHouse made two major announcements. It has now reached a valuation of $15 billion after a new investment round and has acquired Langfuse, which provides observability for LLMs. What else does the platform have to offer, and how does it hope to grow?

As it happens, we’ve rarely written about ClickHouse. Last year, we mentioned it only in connection with a configuration error at DeepSeek and its decisive role in the recent major Cloudflare outage. Now we want to take a closer look, as it is being widely adopted and is becoming ever more relevant. Although ClickHouse has been available as an open-source product since 2016 and has operated as a separate company since 2021, it remains relatively underexposed.

The AI speed demon

The roots of ClickHouse go back even further. Conceived within the Russian company Yandex in 2009, it already has a long development history behind it. In 2026, the rise of AI, and the popularity of LLMs in particular, appears to be a catalyst for the growth of ClickHouse. Its customer base includes AI players Microsoft, Meta, DeepSeek, Anthropic, and Cursor, as well as companies as diverse as eBay, Spotify, Lyft, HubSpot, and Instacart.

The explanation for this widespread adoption is that ClickHouse is, above all, extremely fast. It is a columnar OLAP (Online Analytical Processing) database, a type of database that lends itself well to parallelization, the simultaneous processing of different calculations. ClickHouse achieves this by splitting a query into parts or by distributing multiple queries across nodes. Vectorized query execution processes data in blocks (batches) instead of row by row, making optimal use of modern CPU architectures: a single instruction can operate on multiple data points at once (Single Instruction, Multiple Data, or SIMD for short). As a result, ClickHouse performs strongly on complex processing, which AI workloads often involve. Architecturally, however, it is not designed for modifying individual rows of data.
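To make the difference between row-by-row and block-wise columnar processing concrete, here is a minimal sketch in Python. The data, the function names, and the block size are all hypothetical, and it does not reflect ClickHouse's actual internals. Pure Python does not issue SIMD instructions either, so the sketch only illustrates the data layout and the batching idea that a compiled engine can then exploit.

```python
from array import array

# Hypothetical toy data: 10 request records as (user_id, latency_ms, status).
rows = [(i, 10.0 + i, 200) for i in range(10)]


def sum_latency_row_by_row(records):
    """Row-oriented: each whole record is visited to read a single field."""
    total = 0.0
    for record in records:
        total += record[1]
    return total


# Column-oriented: the latency values are stored contiguously on their own,
# so a query that needs only this field never touches the other columns.
latency_column = array("d", (record[1] for record in rows))


def sum_latency_in_blocks(column, block_size=4):
    """Columnar: aggregate one column a block (batch) at a time."""
    total = 0.0
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        total += sum(block)  # one aggregation call per block, not per value
    return total


print(sum_latency_row_by_row(rows))
print(sum_latency_in_blocks(latency_column))
```

Both functions return the same total. What changes is how much data has to be touched and how many values are handled per step, which is where a columnar engine gains its speed.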

One factor in this performance is data compression. The less disk space the data occupies, the fewer I/O operations the IT infrastructure has to perform, which in turn improves performance. This is where the comparison with a popular data platform such as Snowflake proves useful. Whereas Snowflake can reduce a CSV or JSON file to roughly a quarter of its original size, ClickHouse can reduce the data to one-twelfth or even one-twentieth of the uncompressed original.
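As a quick back-of-the-envelope illustration of those ratios, the sketch below computes how much storage a dataset would occupy at each compression factor. The 1,200 GB raw size is a hypothetical figure chosen for round numbers. Only the factors of 4, 12, and 20 come from the comparison above, and real ratios depend heavily on the data and the schema.

```python
def compressed_size_gb(raw_gb, ratio):
    """Size on disk after compressing raw_gb by the given factor."""
    return raw_gb / ratio


RAW_GB = 1200.0  # hypothetical uncompressed dataset size

# Compression factors taken from the comparison in the text.
scenarios = [
    ("Snowflake, about 4x", 4),
    ("ClickHouse, about 12x", 12),
    ("ClickHouse, about 20x", 20),
]

for label, ratio in scenarios:
    size = compressed_size_gb(RAW_GB, ratio)
    print(f"{label}: {size:.0f} GB on disk")
```

Under these assumptions, the same dataset occupies 300 GB at a factor of 4, against 100 GB or 60 GB at factors of 12 and 20, so a full scan has to read three to five times less data.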

This advantage does not come for free. Snowflake and Databricks are data platforms that largely take the burden off the user. With Snowflake, storage and compute are also completely separate, which provides flexibility in terms of scalability but also introduces latency. ClickHouse allows deeper integration with specific hardware, but it takes more engineering expertise to capitalize on that speed, and manual work is required to make data compression as effective as possible. Because Snowflake compresses files less aggressively, the end user also pays a higher price. Thus, despite the extra effort, adopting ClickHouse may be worth the money.

From specialist to generalist

Unlike large data platforms, ClickHouse is a specialized tool for the highest AI performance. Anthropic, the creator of the Claude AI models, points out that building Claude 4 would not have been possible without the in-depth real-time insights provided by ClickHouse. Langfuse, the newly acquired company, was by its own account already using ClickHouse, and not because of any partnership. It had switched from Postgres, which showed its limitations once the database grew beyond millions of rows. Slow observability is a death blow for large organizations running LLMs in production, so ClickHouse’s niche is easy to identify.

ClickHouse’s role seems to be becoming increasingly central thanks to the growing importance of AI. Snowflake and Databricks, however, know that they can improve their own performance too, and they have two major advantages: customers are already accustomed to their solutions, and they don’t lose sight of user-friendliness. ClickHouse hopes to move in their direction before its major rivals take that step toward improvement. It embraces the cloud, remains open source under the generous Apache 2.0 license, and offers commercial advantages with ClickHouse Cloud. In the cloud, ClickHouse can scale up and down as desired and includes an accessible dashboard.

ClickHouse does not yet have the features of Snowflake and Databricks to serve as the central platform for business data. Reporting on all of a company’s data or running AI agents centrally is still too general a task for ClickHouse at this time. Nor is it worth improving the performance of workloads that run rarely or are not especially critical. Most data processing does not (yet) need to be lightning fast. The more data that does need to be processed quickly, the more important the advantages of ClickHouse become.
