The Observability stack is collapsing: Why context-first data is the only path to AI-powered root cause analysis

Ask about this page

The core promise of modern observability is simple: cut Mean Time To Resolution (MTTR). Yet, despite a boom in tooling and investment over the last four years, the data tells a sobering story: our industry is actually getting worse at finding and resolving issues.

Dashboards, once our trusted guide, have become the starting point for a chaotic "dashboard hunt" that rarely leads to the definitive root cause. The problem isn't a lack of data; it's a fundamental architectural flaw in how we handle it.

The old paradigm—dumping all telemetry data into a massive repository because we're afraid we'll miss something—is finally collapsing under its own weight.

1. The Trap of Passive Telemetry

The current observability approach is built on fear and volume:

The Stack Grows Taller: Complex applications generate logs, metrics, and traces at an incredible rate.
The Data Pile Grows Higher: Engineers feed all of this data into a huge, single repository.
The Dashboards Multiply: We build reports and analytics on top of the pile, constantly updating or creating new dashboards to chase new problems.

This storage-first approach forces engineers to perform manual correlation across multiple teams and tools, wading through a sea of log files to piece together a sequence of events—a process that is rarely neat or orderly.

The data collected in this manner is inherently INACTIVE. It lacks identity, intent, and crucial context. Its value is entirely temporal, existing only in case something happens to require it. Most of the time, this "high-entropy telemetry" is simply a pile of fragments waiting for a human or a tool to reconstruct meaning after storage.

2. Why AI Fails with the Old Paradigm

As AI adoption accelerates, we risk forcing this same broken paradigm onto our models. Many organizations are attempting to achieve Root Cause Analysis (RCA) by throwing tons of this raw, inactive data into specialized models.

But AI agents aren't human; they operate from fact, not fear. When fed this unstructured application exhaust, AI struggles.

The result isn't insight—it’s statistical storytelling. Inactive telemetry becomes "beautifully formatted ignorance; structured noise with no connective tissue". AI cannot reason about relationships that were never captured, leading to confident but incorrect predictions that do nothing to improve MTTR.

The failure isn't the model's intelligence—it’s the data architecture.

3. The Architectural Correction: Context-First Telemetry

The solution is not about building more specialized models; in fact, publicly available models can be highly effective. The correction is architectural: trading Inactive Telemetry for Active Telemetry.

Instead of building context after storage, context must be built into the telemetry itself while it's in motion. This process involves:

Understanding the value of the data.
Normalizing and enriching the signals.
Deduplicating, semantically grouping, and simplifying the data before it ever reaches the model.

By focusing on what’s important, we drastically reduce the possibilities the model has to process, allowing it to focus on the core problem. This Context Engineering approach leads to more accurate results, requires fewer tokens, and drastically cuts cost.

4. Context Engineering: A Data-Driven Advantage

We know this works because we put it to the test against a common industry benchmark. Last year, a ClickHouse blog post asked: “Can LLMs replace on-call SREs today?” Their conclusion was that LLMs were not ready, citing a bottleneck of "missing context".

Factor	Traditional Prompt Engineering (Raw Data)	Context Engineering (Mezmo)
RCA Accuracy	Inconsistent, even with top models	✅ Accurate RCA on first try
Guidance Required	Yes, often multiple prompts	❌ None needed
Token Efficiency	45K – 1.2M tokens	✅ ~27K total
Cost	$1–$6 typical per incident	✅ $0.06 per incident
Tool / API Calls	12–27 per incident	✅ 1 call
Context Quality	Raw logs, traces, metrics	✅ Curated, scoped context

By implementing a context-first approach using a telemetry pipeline, we ran the same benchmark errors and achieved accurate RCA on the first pass every time. The results were definitive: LLMs are ready to be a significant help for Root Cause Analysis—but only if we arm them with the right data architecture.

This approach is already being validated by early customer feedback, where it has led to a significant reduction in time to identify and resolve issues, and even pinpointed the source of nagging problems that humans had tried to find for months.

Want to dive deeper? Schedule some time with the Mezmo team.

‍

Root Cause Analysis

Thought Leadership

Context Engineering

Telemetry for AI Agents

Table of contents

The Observability stack is collapsing: Why context-first data is the only path to AI-powered root cause analysis

1. The Trap of Passive Telemetry

2. Why AI Fails with the Old Paradigm

3. The Architectural Correction: Context-First Telemetry

4. Context Engineering: A Data-Driven Advantage

More blog posts