The Answer to SRE Agent Failures: Context Engineering

4 MIN READ

Why Your SRE Agent Overpromises and Underproduces (Plus How to Fix That)

10X the Results with a Different Approach

AI agents for SREs were supposed to slash mean time to resolution and eliminate alert fatigue. Instead, most teams got expensive, unreliable tools that burn through tokens without delivering insights.

But what if the problem isn't the AI models themselves?

Recent benchmarking reveals the real bottleneck: context engineering. When we tested our context engineering approach against conventional methods, the results were dramatic:

  • 90%+ cost reduction: From $1–$6 per incident to $0.06
  • First-try accuracy: Correct root cause analysis on the first attempt, with far less prompting
  • Token efficiency: ~27K tokens instead of 500K+

Scroll down for our benchmark results to see the full comparison.

The difference comes down to one fundamental insight: SREs need help finding needles, not more haystacks getting in their way.

Why Current Approaches Fall Short

Recent LLM benchmarking exposed the limitations of the conventional approach to making SRE agents work well. Even top-tier models like Claude Sonnet 4, OpenAI GPT-4.1, o3, Gemini 2.5, and GPT-5 struggled with observability tasks when context wasn't properly managed:

  • Multiple prompts required to guide the LLM
  • Models consumed hundreds of thousands of tokens
  • Incident costs ballooned to $1-$6 per root cause analysis
  • Accuracy remained inconsistent despite sophisticated models

The conclusion from testing so far is clear: "The bottleneck isn't model IQ — it's missing context."

More About the ‘Haystack’ Problem

Most teams approach AI-powered incident response with what we call the "haystack" mentality — they assume more context equals better results, so they firehose everything at their AI agent:

  • Raw logs from every service
  • Unfiltered metrics across all timeframes
  • Every alert and notification
  • Complete telemetry streams

But here's the counterintuitive reality: when you're looking for needles, adding more hay only makes things worse.

This firehose approach creates predictable failures:

  • Information Overload: AI agents get buried under irrelevant data. That database connection spike from three days ago has nothing to do with today's payment processing issue, but it's consuming tokens and confusing the analysis.
  • Signal Dilution: Critical error messages get lost in routine application logs and infrastructure metrics that have nothing to do with the current incident.
  • Analysis Paralysis: Instead of focusing on the failing subsystem, AI agents try to correlate anything to everything, leading to vague conclusions or incorrect guesses rather than decisive root cause identification.

Recent research released by OpenAI explains why models break down and hallucinate. The takeaway for us: unless we manage context deliberately for our SRE agents, our efforts are likely to go sideways as well.

What AI-Driven Observability Should Look Like

Instant Insight, Not Token Bloat

The ideal interaction looks like this: 

You ask: "Why is the payment service slow?"

Your AI agent responds: "A spike in database queries after the 2:15 PM deploy is driving the elevated latency. The new feature's query optimization isn't working as expected."

No multi-prompt conversations. No $6 token bill. No prompt engineering required. Just the answer, backed by clear reasoning.
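
To make that concrete, here's a minimal sketch of the single-call pattern, assuming a curated context payload has already been assembled upstream. The payload fields, model name, and question are illustrative placeholders, not Mezmo's actual API or output format.

```python
# Minimal sketch of the single-call pattern: one question, one curated context
# payload, one model call. Assumes OPENAI_API_KEY is set; the payload fields,
# model name, and question are illustrative placeholders, not Mezmo's API.
from openai import OpenAI

client = OpenAI()

curated_context = {
    "service": "payment-service",
    "window": "2025-09-09T14:00Z/2025-09-09T14:30Z",
    "deploys": [{"time": "14:15Z", "change": "query optimization for new feature"}],
    "signals": [
        {"metric": "db_query_latency_p95_ms", "before": 40, "after": 410},
        {"log_pattern": "slow query on payments.transactions", "count": 1283},
    ],
}

question = "Why is the payment service slow?"

response = client.chat.completions.create(
    model="gpt-4.1",  # any capable model works; this is just an example
    messages=[
        {"role": "system",
         "content": "You are an SRE assistant. Answer only from the provided context."},
        {"role": "user",
         "content": f"Context: {curated_context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```

Because the context is already scoped to the right service and time window, one call like this is enough: the model interprets a briefing instead of excavating a haystack.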

AI That Acts Like a Skilled Intern

Your SRE agent should function like a brilliant intern working under expert supervision. As Drew Breunig notes in his research on AI use cases, the most effective AI applications today fall into the "intern" category — powerful tools used by experts, but never without oversight.

Your AI intern should:

  • Process incident data rapidly while you focus on high-level analysis and decision-making
  • Surface probable root causes from patterns across logs, metrics, and traces for your review
  • Draft remediation suggestions based on historical data that you can validate and execute
  • Explain its reasoning transparently so you can learn from its analysis and catch any errors

The key difference? Your AI agent amplifies SRE capabilities rather than replacing human expertise. It handles time-consuming data processing and initial analysis while expert SREs provide context, validate conclusions, and make final decisions.

The Context Engineering Breakthrough: Benchmark Results

We tested our context engineering approach using the same scenarios and models as recent industry benchmarks. The difference was striking:

Metric | Conventional Approach | Context Engineering
RCA Accuracy | Inconsistent results | First-try success
Token Usage | ~500K+ per incident | ~27K per incident
Cost per RCA | $1–$6 | $0.06
Tool Calls | 12–27 per incident | 1
Prompt Guidance | Multiple prompts required | None needed
Context Quality | Raw telemetry firehose | Curated, scoped context

Why Context Engineering Works

The performance difference comes down to three key innovations: 

  • Preprocessing over Parsing: Instead of making AI dig through raw logs during incidents, we structure and enrich data as it flows through our pipeline.
  • Enrichment over Guesswork: Our context engine adds semantic meaning, relationships, and operational knowledge that would otherwise require assumptions.
  • Intent-Based Routing: When you ask about payment service performance, you get payment-specific context — not a firehose of unrelated telemetry (see the sketch below).
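
As an illustration of the routing idea, here's a minimal sketch that scopes telemetry to the service and time window implied by a question before anything reaches the model. The data model, service list, and matching logic are simplifying assumptions for this example, not Mezmo's implementation.

```python
# Minimal sketch of intent-based routing: scope telemetry to the service and
# time window implied by the question before anything reaches the model.
# The data model, service list, and matching logic are simplifying assumptions
# for illustration, not Mezmo's implementation.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LogRecord:
    service: str
    timestamp: datetime
    severity: str
    message: str

def route_context(question: str, records: list[LogRecord],
                  now: datetime, window_minutes: int = 30) -> list[LogRecord]:
    """Return only warning/error records for the mentioned service within a recent window."""
    known_services = {"payment-service", "checkout", "auth"}
    mentioned = {s for s in known_services if s.split("-")[0] in question.lower()}
    cutoff = now - timedelta(minutes=window_minutes)
    return [
        r for r in records
        if r.service in mentioned
        and r.timestamp >= cutoff
        and r.severity in {"warning", "error"}
    ]

if __name__ == "__main__":
    now = datetime(2025, 9, 9, 14, 30)
    records = [
        LogRecord("payment-service", now - timedelta(minutes=10), "warning",
                  "slow query on payments.transactions (410ms)"),
        LogRecord("auth", now - timedelta(days=3), "warning",
                  "db connection spike"),
    ]
    scoped = route_context("Why is the payment service slow?", records, now)
    print(f"{len(scoped)} of {len(records)} records forwarded to the agent")
```

The point isn't the string matching; it's that the expensive reasoning happens over a handful of relevant records instead of hundreds of thousands of unrelated ones.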

Mezmo's Context Engineering Platform

What We Built

Rather than throwing raw telemetry at an LLM and hoping for the best, we engineered a complete context delivery system for AI in observability:

  • Structured Payloads: Curated, scoped context instead of raw log dumps
  • Active Telemetry: Data processed and enriched at ingestion time, not hours later during incident response (sketched after this list)
  • Just-in-Time Context: Tailored information based on user intent and query scope
  • Complete AI Infrastructure: Including MCP server, context engine, chatbots, agents, and native support for a variety of providers such as OpenAI, Bedrock, and LangChain
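
To show what "enriched at ingestion time" can mean in practice, here's a minimal sketch that parses a raw log line into a structured record and attaches operational knowledge, such as time since the last deploy, as the data flows in. The schema, regex, and deploy registry are hypothetical, not Mezmo's actual pipeline.

```python
# Minimal sketch of enrichment at ingestion time: parse a raw log line into a
# structured record and attach operational knowledge (e.g. time since the last
# deploy) as data flows through the pipeline. The schema, regex, and deploy
# registry are hypothetical, not Mezmo's actual pipeline.
import re
from datetime import datetime

# Hypothetical deploy registry the pipeline consults at ingestion.
RECENT_DEPLOYS = {"payment-service": datetime(2025, 9, 9, 14, 15)}

LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<level>\w+) "
    r"\[(?P<service>[\w-]+)\] (?P<msg>.*)"
)

def enrich(raw_line: str) -> dict:
    """Turn a raw log line into an AI-ready record: parsed, tagged, correlated."""
    match = LOG_PATTERN.match(raw_line)
    if not match:
        return {"raw": raw_line, "parsed": False}
    ts = datetime.fromisoformat(match["ts"])
    service = match["service"]
    deploy = RECENT_DEPLOYS.get(service)
    return {
        "timestamp": ts.isoformat(),
        "service": service,
        "severity": match["level"].lower(),
        "message": match["msg"],
        # Operational context attached up front, not guessed at query time.
        "minutes_since_last_deploy": (
            round((ts - deploy).total_seconds() / 60) if deploy else None
        ),
    }

print(enrich(
    "2025-09-09T14:22:03 WARN [payment-service] "
    "slow query on payments.transactions (410ms)"
))
```

When an incident hits, the agent receives records that already carry service, severity, and deploy correlation, so no tokens are spent reverse-engineering log formats.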

Delivering the Needle, Not More Haystack

Mezmo acts like an expert detective who knows exactly where to look and what evidence matters:

  • Curated Intelligence: When your payment service is slow, we don't send every log line from every service. We send specific database query patterns, deployment timing, and error correlations that actually relate to payment processing performance.
  • Focused Context: Your AI agent receives a targeted briefing about the specific system and timeframe that matters, not a documentary about your entire infrastructure.
  • Pattern Recognition: Instead of asking AI to find patterns across millions of events, we surface the patterns that matter and let AI focus on interpretation and recommendations.

The result? Your AI agent spends its intelligence solving problems, not searching through irrelevant data.

Transform Your SRE Operations

So if your AI agent is underperforming, the issue likely isn't your model or your agent — it's your context engine (or lack thereof).

Our benchmark results show that with proper context engineering, even simple prompts can deliver accurate root cause analysis at 90% lower cost and 10X faster than conventional approaches.

Ready to Experience Context Engineering?

Whether you want to empower your existing agents with our context engineering platform or use our complete observability solution, built on the same technology and including agents of our own, we can help you achieve breakthrough performance.

For Teams Building Their Own Agents:

  • Integrate our context engineering platform via MCP (a minimal client sketch follows this list)
  • Transform your agent's performance with curated, intent-based context
  • Reduce costs while improving accuracy and response times
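
For a sense of what integrating via MCP can look like from the agent side, here's a minimal sketch using the open-source MCP Python SDK. The server command and tool name are placeholders for illustration; consult the platform documentation for the real endpoints and tool schema.

```python
# Minimal sketch of connecting an agent to an MCP server using the open-source
# MCP Python SDK ("mcp" on PyPI). The server command and tool name below are
# placeholders for illustration; see the platform's documentation for the real
# endpoints and tool schema.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Placeholder: launch whatever MCP server your context platform provides.
    server = StdioServerParameters(command="your-context-mcp-server", args=[])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            # Hypothetical tool name: ask the server for scoped incident context.
            result = await session.call_tool(
                "get_incident_context",
                arguments={"service": "payment-service", "window_minutes": 30},
            )
            print(result)

asyncio.run(main())
```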

For Teams Wanting a Complete Solution:

  • Access our observability agent powered by advanced context engineering
  • Get instant root cause analysis without building or maintaining AI infrastructure
  • Focus on resolving incidents, not managing AI systems

Book a demo and see how context engineering transforms AI-powered observability from an expensive experiment into a reliable operational advantage.

Mezmo's context engineering platform transforms raw telemetry into AI-ready insights, enabling intelligent agents that deliver accurate analysis at scale. Learn more at mezmo.com.
