Modern observability has delivered real value to teams in recent years. Metrics, logs and traces have helped them survive the shift to cloud-native architectures and always-on digital systems. But not all is well. Shahar Azulay, co-founder and CEO of Kubernetes-focused cloud-native observability platform company groundcover, thinks that something has “quietly broken” in the fabric of this part of the technology universe.
Across industries, engineering teams are investing more and more in observability tooling and the number of vendors in this space appears to be growing. Usual suspects in this arena include Datadog, Dynatrace, New Relic, Splunk (Cisco), Grafana Labs, Elastic, Chronosphere (Palo Alto Networks), Amazon CloudWatch and indeed groundcover.
But the market is shifting: the economics of legacy observability no longer scale with the systems they are meant to observe. That tension existed before AI, but now it is impossible to ignore.
A structural reality of distributed systems
“A central issue here is the fact that, as systems scale, telemetry scales even faster,” explained Azulay. “Every service creates metrics. Every request generates traces… and logs multiply as the velocity of deployment increases. This is the structural reality of distributed systems.”
He points to research from Omdia that suggests organisations consistently “under-instrument” their environments, not because they lack the tools to do so, but because they can’t afford to fully use them.
Among the top challenges cited:
- 32% report cost inefficiencies tied to volume-based pricing.
- 29% cite high data storage costs for logs, metrics and traces.
- 28% struggle with observability costs rising faster than infrastructure itself.
Sampling, simples?
“In practice, this means teams buy sophisticated platforms, then deliberately starve them of the data they need to deliver value. Thus, sampling, filtering and selective logging become financial controls rather than engineering-first choices,” asserted Azulay. “Data sampling in observability is often considered ‘good enough’ here, i.e. it sounds efficient on paper: keep 1% of your traces, drop low-priority logs and reduce cardinality.”
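To see how small the switch is, here is a minimal sketch of a 1% head-based trace sampler using the OpenTelemetry Python SDK. The ratio simply mirrors the “keep 1% of your traces” figure quoted above; the service name is hypothetical.

```python
# Minimal sketch: head-based sampling with the OpenTelemetry Python SDK.
# The 0.01 ratio mirrors the "keep 1% of your traces" example above;
# everything else is dropped before it ever reaches a backend.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

provider = TracerProvider(
    sampler=ParentBased(root=TraceIdRatioBased(0.01))  # keep ~1% of root traces
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("checkout-service")  # hypothetical service name
```

The interesting part is not the API but the trade-off it encodes: any trace in the unlucky 99% is gone before anyone knows it mattered.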
But observability isn’t about averages. It’s about the actual outliers.
The groundcover team have shared their experiences in this space and they say that the most expensive outages or most damaging security incidents rarely look representative. They live in the anomalies. It might be the one trace the tech team didn’t keep, the rare request pattern that was dropped, or the edge case that sampling erased.
In real environments, sampling leads to:
- Critical transactions failing without a recorded trace.
- Logs, metrics and traces stored across separate tools with inconsistent collection rules.
- Engineers debugging symptoms instead of root causes.
- Security signals disappearing into discarded telemetry.
Fragmentation is a symptom of economics
“It could be argued that sampling does more damage than just reducing visibility. It changes the nature of what teams are able to know,” said Azulay. “One of the clearest signals that observability economics are broken is tool sprawl. According to the Omdia research, 69% of organisations now use six or more observability tools. Teams obviously don’t enjoy added complexity. But they’re finding that no single platform can be used comprehensively without blowing through budget constraints.”
The result is fragmented visibility, i.e. application traces in one system, infrastructure metrics in another and model or data-pipeline telemetry somewhere else.
Azulay reminds us that modern systems rarely fail inside a single layer, which in turn means that an infrastructure bottleneck can surface as application latency. A data freshness issue can appear as degraded model quality. A subtle LLM hallucination might originate in retrieval logic rather than the model itself. When observability is fragmented, these cross-layer failures become opaque. Teams see symptoms, not causes.
“When something goes wrong, context switching replaces insight. The mean time to resolution grows and developers become disengaged. That’s what happens when observability becomes something engineers need to justify financially rather than something they can rely on operationally,” said Azulay.
Into the realm of AI observability
If we agree with Azulay’s assertions, we can see that traditional applications have already strained observability pricing, but now AI workloads are shattering it.
A single AI service may need to track prompt and response pairs, token counts and cost per inference, latency distribution across models, GPU utilisation, retrieval quality in RAG pipelines, hallucinations, and response quality. Each request could generate dozens (or even hundreds) of high-cardinality telemetry events.
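To make the cardinality point concrete, below is a sketch of the kind of per-request telemetry such a service might emit, expressed as OpenTelemetry span attributes. The attribute keys and values are illustrative assumptions, loosely modelled on emerging gen-AI telemetry conventions rather than any fixed standard.

```python
# Illustrative only: the attribute keys below are hypothetical, not a prescribed schema.
from opentelemetry import trace

tracer = trace.get_tracer("rag-inference-service")  # hypothetical service name

with tracer.start_as_current_span("llm.chat_completion") as span:
    span.set_attribute("llm.model", "example-model-v1")    # hypothetical model id
    span.set_attribute("llm.input_tokens", 1843)
    span.set_attribute("llm.output_tokens", 412)
    span.set_attribute("llm.cost_usd", 0.0137)              # cost per inference
    span.set_attribute("llm.latency_ms", 2315)
    span.set_attribute("gpu.utilisation_pct", 87.5)
    span.set_attribute("rag.retrieved_chunks", 12)
    span.set_attribute("rag.retrieval_score", 0.71)         # retrieval quality
    span.set_attribute("eval.hallucination_score", 0.04)    # downstream evaluator
    span.set_attribute("eval.response_quality", 0.92)
```

Nearly every one of those values is high-cardinality or continuous, which is exactly the kind of data that ingestion-priced platforms penalise.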
“At scale, that becomes billions of events per month… and with ingestion-based pricing, fully instrumenting this data could cost more than the infrastructure running the workload itself,” said Azulay. “But the challenge is not only volume. It is interconnectedness. Model behaviour depends on infrastructure capacity, data freshness, retrieval performance, orchestration logic and downstream services. Breakage does not respect architectural boundaries. An LLM latency spike might be GPU contention. A hallucination might stem from stale embeddings. An agent failure might be triggered by an upstream API timeout.”
The groundcover team have been there and got the t-shirt on this one. They tell us that AI observability simply can’t rely on heavy sampling: it won’t detect hallucinations, bias drift or silent quality degradation if we’re only observing 1% of responses. In many cases, every response matters, especially in highly regulated industries such as healthcare or customer-facing environments like retail.
So then, we’re moving to a point where organisations look at deploying dedicated observability for AI and ML systems, but that’s complicated.
Traditional sampling breaks for LLM and agentic AI
Azulay explains that with classic application monitoring, teams use head-based or tail-based sampling to control trace volume. Head-based sampling randomly selects a percentage of traces up front, while tail-based sampling waits for full traces to complete and keeps those with errors or notable events. He says that this doesn’t work for LLM and agentic AI workflows. These systems generate enormous, high-cardinality traces with many spans and complex structures, well beyond the scale of traditional application traces.
“Since LLMs and agentic systems are non-deterministic, even small input changes can lead to entirely different execution paths. It’s easy to see why traditional sampling could easily miss rare but critical behaviours, subtle failures or edge cases that directly affect trust and output quality,” clarified Azulay. “Long-running agentic workflows that can last minutes or even hours and generate thousands of spans also make tail-based sampling impractical. Keeping everything in memory long enough to make sampling decisions ends up being expensive and unreliable.”
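A stripped-down sketch of what a tail-based decision buffer has to do makes the memory problem visible. This is illustrative code, not any particular collector’s implementation; the trace IDs and span fields are hypothetical.

```python
# Illustrative sketch of a tail-based sampling buffer, not a real collector.
# Every span of every in-flight trace must be held until the trace completes,
# which is exactly what breaks down for hour-long, thousand-span agent runs.
from collections import defaultdict

class TailSamplingBuffer:
    def __init__(self):
        self.pending = defaultdict(list)   # trace_id -> spans held in memory

    def add_span(self, trace_id, span):
        self.pending[trace_id].append(span)

    def on_trace_complete(self, trace_id):
        spans = self.pending.pop(trace_id, [])
        # Keep the trace only if some span recorded an error; everything
        # "uninteresting" is discarded, subtle quality failures included.
        if any(s.get("error") for s in spans):
            return spans          # export / retain
        return []                 # drop

buffer = TailSamplingBuffer()
buffer.add_span("agent-run-42", {"name": "tool_call", "error": False})
buffer.add_span("agent-run-42", {"name": "llm_step", "error": False})
print(buffer.on_trace_complete("agent-run-42"))   # [] -> the whole run is dropped
```

The sketch also hints at the second failure mode: a run that produced a subtly wrong answer but no error flag looks identical to a healthy one and is thrown away.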
Perhaps most importantly, it’s increasingly difficult to automatically determine which AI traces are “good” or “bad”. AI failures are typically nuanced: lower response quality, subtle hallucinations or drifting behaviour might not trigger error signals. That makes rules-based sampling insufficient for AI-driven systems.
LLM observability, fixed
Since AI-powered systems are fundamentally different from traditional software, observability must go beyond surface-level health checks. Observability for AI must expose reasoning paths, intermediate decisions and sequence dependencies so engineers can understand both what happened and why a specific output was produced.
“This requires us to understand that reasoning does not occur in isolation. It executes within infrastructure, data systems and application logic. Without correlated infrastructure signals, service traces and data-layer visibility, teams cannot determine whether a failure was logical, operational or environmental,” said Azulay.
He insists that context across layers is what transforms telemetry into understanding. It seems that effective LLM observability will require new techniques, including AI-assisted trace evaluation, intelligent retention of high-value telemetry, and deep correlation across agents, models and data sources. More importantly, it requires a system-wide perspective.
Why? Because reliable AI cannot be built on partial visibility.
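What “intelligent retention” looks like in practice is still an open design space. The sketch below simply illustrates the idea of scoring each LLM trace and keeping the ones an evaluator flags, rather than keeping a random 1%. The evaluator, field names and thresholds are entirely hypothetical.

```python
# Hypothetical sketch of score-based retention for LLM traces.
# evaluate_response() stands in for any judge model, heuristic or guardrail
# check; none of these names come from a real library.
import random

def evaluate_response(trace):
    """Hypothetical evaluator returning a 0-1 quality score for a trace."""
    return trace.get("quality_score", 1.0)

def should_retain(trace, quality_threshold=0.8, baseline_rate=0.05):
    """Keep every errored or low-quality trace, plus a small random baseline
    so slow drift remains visible. Thresholds are illustrative."""
    if trace.get("error") or evaluate_response(trace) < quality_threshold:
        return True
    return random.random() < baseline_rate

# A subtly poor answer with no error flag is retained rather than sampled away.
print(should_retain({"error": False, "quality_score": 0.62}))   # True
```

The design choice this illustrates is the inversion Azulay describes: retention follows the value of the signal, not a fixed percentage set by the billing model.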
Decoupling observability from data volume
Azulay suggests that what’s changing now is not the need for observability, but where it runs. To get away from ingestion-based constraints, organisations are adopting bring-your-own-cloud (BYOC) architectures. In this model:
- The observability platform runs inside the customer’s cloud.
- Telemetry data stays within existing infrastructure.
- Costs align with standard cloud storage and compute pricing.
- Vendors charge for software, not for every byte ingested.
This flips “can we afford to observe this?” to “what do we need to observe?”… which ought to sound like a more wholesome approach to anyone, right? High-cardinality data stops being a financial liability and becomes what it should have been all along: the raw material for reliability, safety, and performance.
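To put rough numbers on that flip, here is a deliberately hypothetical back-of-the-envelope comparison. Every volume and unit price below is a placeholder, not a quoted vendor rate.

```python
# Back-of-the-envelope sketch; every figure here is a hypothetical placeholder.
requests_per_day = 5_000_000          # assumed traffic for one AI service
events_per_request = 40               # spans, metrics and log lines per request
bytes_per_event = 1_000               # ~1 KB of telemetry per event

events_per_month = requests_per_day * events_per_request * 30
gb_per_month = events_per_month * bytes_per_event / 1e9

ingest_price_per_gb = 0.50            # hypothetical ingestion-based rate
storage_price_per_gb = 0.03           # hypothetical object-storage rate in your own cloud

print(f"{events_per_month/1e9:.1f}B events, {gb_per_month:,.0f} GB/month")
print(f"ingestion-priced: ${gb_per_month * ingest_price_per_gb:,.0f}/month")
print(f"own-cloud storage: ${gb_per_month * storage_price_per_gb:,.0f}/month")
```

The absolute figures are invented; the point is the order-of-magnitude gap between paying per byte ingested and paying commodity storage rates in your own account. Vendor software fees still apply in the BYOC model, but they no longer scale with every byte collected.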
“Let’s be clear – BYOC doesn’t eliminate intelligent data management, it restores its purpose,” said Azulay. “Sampling still exists, but it’s an engineering decision rather than a billing workaround. Teams filter data because it’s irrelevant, not because it’s expensive. They instrument new systems freely, knowing that an unexpected spike won’t trigger financial backlash.”
A view over infrastructure, applications, data systems & AI components
With economics no longer dictating visibility, the suggestion here is that organisations can maintain consistent observability across infrastructure, applications, data systems and AI components. Logs, metrics and traces become correlated. Root-cause analysis accelerates. Governance and compliance improve. Perhaps most importantly, observability is again able to support how systems are built rather than constraining how they’re understood.
“The uncomfortable truth is that AI can’t compensate for data that’s never collected. Reliable systems require understanding every point where they can break,” concluded Azulay. “Infrastructure, application logic, data movement and AI behaviour are not separate concerns. They are parts of a single system. Failures cross boundaries. Symptoms rarely appear where the root cause lives.”
As AI observability becomes a requirement rather than a differentiator, the economics of observability will matter as much as its features. The future will be about enabling teams to observe deeply, instrument freely and scale without fear of running up bills. At the end of the day, a new hard truth might be surfacing here… guessing can no longer be an option.

Image credits (above and main): groundcover