OpenObserve is challenging the observability market with what it labels as an AI-native open source platform. In practical terms, the company sets out to deliver dramatic performance improvements and cost reductions compared to established players like New Relic, DataDog, Elasticsearch, Splunk, and Grafana.
Prabhat Sharma, founder and CEO of OpenObserve, explains in conversation with Techzine how the four-year-old company processes over 2.5 petabytes of data per day for its largest customer. At the same time, the solution reduces infrastructure requirements by 80 percent and storage costs by up to 140x compared to legacy solutions.
The OpenObserve platform unifies logs, metrics, traces, real user monitoring, and LLM observability in a single solution that leverages the Rust programming language and Parquet data format to achieve superior performance on significantly less hardware.
Unified observability with dramatic cost savings
OpenObserve seeks to address a fundamental problem in the observability space. Traditional tools have become prohibitively expensive as data volumes have exploded with cloud adoption. Sharma notes that customers migrating from a five-node Elasticsearch cluster can achieve similar search performance with a single OpenObserve node while gaining 10x better analytics performance.
The cost advantages stem from multiple factors. OpenObserve’s use of Parquet as a storage format and columnar storage architecture enables compression rates that reduce storage costs by 10x to 100x. In one test comparing identical data volumes, OpenObserve’s storage cost was 140 times lower than Elasticsearch.
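The compression advantage of columnar storage is easy to see in miniature: within a single column, values such as service names or log levels repeat constantly, so each distinct value only needs to be stored once. The toy sketch below illustrates dictionary encoding, the idea behind Parquet's dictionary pages; it is a simplified model, not OpenObserve's actual storage engine, and the field names are invented for the example.

```python
import json

# Synthetic log records: few distinct values per column, many rows.
services = ["checkout", "payments", "inventory", "shipping", "auth"]
levels = ["INFO", "WARN", "ERROR"]
rows = [
    {"service": services[i % 5], "level": levels[(i // 5) % 3], "status": 200}
    for i in range(10_000)
]

# Row-oriented layout: one JSON document per record, keys repeated every row.
row_size = sum(len(json.dumps(r)) for r in rows)

# Column-oriented layout with dictionary encoding (the idea behind
# Parquet's dictionary pages): store each distinct value once, then
# one small integer index per row.
def dict_encoded_size(values):
    uniques = set(map(str, values))
    dictionary = sum(len(u) for u in uniques)
    indexes = len(values)  # one byte per row suffices for tiny dictionaries
    return dictionary + indexes

col_size = sum(dict_encoded_size([r[k] for r in rows]) for k in rows[0])

print(f"row-oriented JSON:          {row_size} bytes")
print(f"dictionary-encoded columns: {col_size} bytes")
print(f"roughly {row_size // col_size}x smaller")
```

Real Parquet files layer run-length encoding and general-purpose compression on top of this, which is how the gap widens to the 10x–100x range the article cites.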
For startups and smaller companies, OpenObserve offers enterprise features free for any customer ingesting less than 200GB of data per day. Sharma explains this represents approximately $60,000 per year in equivalent DataDog value, making advanced observability accessible without requiring companies to “sell your home for observability.”
Pushing hyperscaler limits with petabyte-scale deployments
OpenObserve’s largest customer deployment revealed previously undocumented limitations in Google Cloud Platform. When ingesting and analyzing petabytes of data daily, the team discovered that Google Cloud Storage has a hard limit of one petabyte per day read from a single bucket per project. This limit doesn’t appear in any documentation and only became apparent when the customer hit it. The solution required creating an additional project and bucket to distribute the load, though this added complexity around authentication and access management.
Still, public cloud users typically want a solution that stays within their cloud of choice. The OpenObserve platform therefore runs on all three major hyperscalers (AWS, GCP, and Azure) as well as on on-premises infrastructure. OpenObserve ingests logs, metrics, and traces via OpenTelemetry, but also supports alternative mechanisms like Telegraf and Syslog to accommodate systems where OpenTelemetry adoption is still maturing.
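To illustrate the OpenTelemetry path, a minimal OpenTelemetry Collector pipeline that forwards logs to an OTLP/HTTP backend might look as follows; the endpoint URL and credentials are placeholders for illustration, not OpenObserve's actual values.

```yaml
# Minimal OpenTelemetry Collector pipeline for log forwarding.
# Endpoint and Authorization header are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  otlphttp:
    endpoint: https://observability.example.com/api/default
    headers:
      Authorization: "Basic <base64-credentials>"
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
```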
Modern technology stack enables extreme performance
OpenObserve’s performance advantages stem from architectural decisions made possible by starting in 2022 rather than 2010. Sharma explains they had the advantage of learning from the limitations of older systems while leveraging technologies that didn’t exist when legacy platforms were designed. It is the Innovator’s Dilemma seen from the other side: as a newcomer, OpenObserve had no legacy architecture to protect.
The core platform is built in Rust, a programming language known for its comparatively extreme performance and memory safety. Combined with Parquet as the storage format, an open columnar file format optimized for analytics, OpenObserve achieves search performance comparable to Elasticsearch on one-fifth the infrastructure while delivering 10x better analytics performance.
This architectural approach also addresses the agent footprint problem. Sharma describes conversations with users whose DataDog agents consume 20GB of RAM on 120GB servers. That’s a pretty massive overhead just for observability tooling. OpenObserve’s philosophy emphasizes minimal footprint on production servers, processing data centrally rather than burdening application servers with observability workloads.
AI-powered SRE agent and LLM observability
OpenObserve is developing capabilities in two AI-related areas. First, LLM observability allows companies building AI agents or solutions to understand their AI system performance, including token consumption, input/output analysis, and the impact of system prompt changes or model upgrades on results and ROI.
The second area proved more challenging: building an AI SRE agent. The fundamental problem is that LLMs have limited context windows, but logs can span hundreds of gigabytes or terabytes. Sending even one gigabyte to an LLM is impossible, as the practical limit is a couple hundred kilobytes at most.
OpenObserve solved this by building log pattern recognition that reduces millions of log lines to perhaps 100 representative lines, combined with a manual rule-based correlation engine. An LLM layer on top of this infrastructure provides the final analysis. This AI SRE agent will initially identify problems for human review, with plans to enable automated remediation in future versions.
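The underlying idea resembles classic log templating (as in algorithms like Drain): mask the tokens that vary between lines, such as IDs and numbers, then count identical templates. Below is a minimal sketch of that idea, assuming simple regex masking; OpenObserve's actual pattern engine is not public, and the regexes and sample logs here are illustrative.

```python
import re
from collections import Counter

# Mask obviously variable tokens so lines that differ only in IDs,
# numbers, or addresses collapse into one template.
MASKS = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<ID>"),
    (re.compile(r"\b\d+(\.\d+)?\b"), "<NUM>"),
]

def template(line: str) -> str:
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

def summarize(lines, top=100):
    """Collapse raw log lines into their most frequent templates."""
    counts = Counter(template(l) for l in lines)
    return counts.most_common(top)

# 5,300 raw lines reduce to two representative templates.
logs = [f"GET /orders/{i} completed in {i % 97} ms" for i in range(5000)]
logs += [f"connection to 10.0.0.{i % 8} refused" for i in range(300)]

for tpl, n in summarize(logs):
    print(f"{n:6d}  {tpl}")
```

A summary like this, rather than the raw log stream, is small enough to fit in an LLM's context window for the final analysis step the article describes.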
These enterprise features are included free for customers under the 200GB daily ingestion threshold, making advanced AI-powered observability accessible to startups and smaller organizations.
The vision for UI-less observability
During our conversation, Sharma sketched a surprising future development that would have sweeping consequences. Looking ahead, he sees the industry moving toward UI-less observability, where dashboards become obsolete. Even he finds himself expecting an AI chat interface in every software product, he says, and he realized OpenObserve needed this capability as a basic feature.
The future vision thus eliminates user interfaces entirely. Pre-built alerts based on rules and anomaly detection will generate notifications. The AI SRE agent will analyze problems and deliver findings directly to collaboration platforms like Slack or Teams where users already work, rather than requiring them to log into portals and analyze dashboards.
Sharma acknowledges this represents a significant shift for people accustomed to examining logs, traces, and dashboards manually. However, he believes the efficiency gains from automated detection and remediation will drive adoption as the technology matures and users become comfortable trusting AI systems to manage their infrastructure reliability.
Open source business model with enterprise features
OpenObserve operates on an open core model where approximately 99 percent of the source code is open source, with about 1 percent comprising enterprise features sold through licensing. The company also generates revenue through a managed cloud service. This approach makes the platform accessible to the open source community while providing a sustainable business model. The generous free tier for enterprise features (the aforementioned 200GB daily ingestion) ensures that companies can use advanced capabilities during their growth phase, only paying when they reach scale where observability costs become a significant line item.
By combining modern technology, open source accessibility, and AI-powered automation, OpenObserve is positioning itself as the next generation of observability platforms designed for cloud-native, data-intensive environments where legacy solutions struggle with cost and performance.
Also read: Groundcover CEO exposes the hard truths in observability