Mezmo Launches Fast & Precise AI SRE for Kubernetes Ahead of KubeCon

The observability market stands at an inflection point as AI-powered site reliability engineering moves from theoretical promise to practical reality. Yet separating genuine capability from vendor hype remains challenging, particularly as organizations grapple with spiraling telemetry costs and question whether today's AI models can truly deliver on their transformative potential. Tucker Callaway, CEO of Mezmo, believes the answer lies not in waiting for better models, but in fundamentally rethinking how telemetry data is processed and delivered to AI agents.

In this wide-ranging VMblog Q&A, Callaway makes the case that agentic root cause analysis is achievable today—provided organizations shift their approach from traditional prompt engineering to what Mezmo calls "context engineering." The company's recently rebranded Active Telemetry platform claims to deliver root cause analysis outcomes a standard deviation faster than industry benchmarks, reducing typical troubleshooting time from 50 minutes to just 5 minutes. Callaway also shares his provocative vision for observability's future, where dashboards transition from operational necessities to mere trust verification tools, and where the economics of the entire category face disruption as AI agents replace human analysis.

++

VMblog:  There's a lot of energy around AI SRE right now. How can folks separate what's reality from hype?

Tucker Callaway:  I think "energy" is the right word - there is excitement for the potential, skepticism for the reality, concerns about cost, debates on approaches, and expectations to get deterministic outcomes from probabilistic systems. Some feel the models aren't enough today and need to be trained, and others (myself included) feel that we have proven that they are more than sufficient to deliver outcomes today that are both performant and accurate.

In terms of breaking down the hype and realities, it's further complicated by the diversity of tasks and the expectations of the role of the SRE before we even get into what components of that role can be delivered through AI.

So I will answer that question through the specific lens of Root Cause Analysis, which is effectively why Observability exists, and say that we have proven that this can be affordably and repeatedly delivered today.

The reason we focus on RCA is because it is the critical gate to a fully agentic future - which I believe is closer than many think - if you cannot confidently identify and diagnose anomalous behavior, then all of the downstream potential can never be realized.

So in my view - Agentic RCA is a reality today, but when we conflate the diversity of the tasks human SREs perform and we don't break down the critical workflows and tasks - we quickly leave reality and drift into hype.

VMblog:  Mezmo has recently re-branded with a heavy emphasis on Active Telemetry. Can you explain what Active Telemetry is and what value it delivers?

Callaway:  At the risk of overgeneralizing, Observability today is largely driven by a single purpose supported by a single approach.

The purpose is to make complex data consumable by humans with the intent of identifying and diagnosing issues and the approach is to store all of the data and ask questions of that data later.

We recognized 3 years ago, even before AI amplified both the problem and the opportunity, that the physics behind the growth of data (and the corresponding cost) and the efficiencies behind the value derived from that data were fundamentally broken. Our hypothesis was and still is that the processing of data has to shift left, closer to the point of conception and we needed to create a platform that could handle the dynamic analysis of that data in motion. The linearity of collection, ingestion, storage and analysis has brought us to a point where the cost of Observability is rapidly approaching the cost to deliver what we are observing in the first place.

Active Telemetry is our answer to that problem. We have incorporated the processing, retention and analysis of telemetry data into a platform that provides developers with instant access to the data they need, Agents with curated context to deliver performant accuracy, and platform teams with the governance, control and data orchestration they require.

VMblog:  In a recent press release, Mezmo states that the new AI SRE is a standard deviation faster than the industry standard when it comes to resolving issues in Kubernetes. How is that possible and what data do you have to back that up?

Callaway:  The short answer is Active Telemetry and Yes. :) That obviously requires more explanation so I will first cover the approach and then we can discuss the results.

As I alluded to earlier, a benefit of Active Telemetry is the ability to direct real-time curated context to an agent. This is another way of saying it enables Context Engineering of Telemetry data. Context engineering is the future of agentic outcomes and performance - Anthropic recently published a great post on the theory behind the concept that allows us to deliver these outcomes called "Effective context engineering for AI agents" - it's definitely worth a read for everyone thinking about driving more cost-effective and deterministic outcomes with AI.

The general premise is that the models today are sufficient to deliver cost-effective RCA 90% faster than a human-based approach. The reason this is not pervasive today lies in the inefficiencies of prompt-based approaches and their inability to provide the context needed to the models.

ClickHouse recently published a benchmark leveraging a prompt engineering approach identifying root causes on the OTel demo application - we repeated the exercise with a context engineering approach powered by Active Telemetry. The difference in the results was striking - we saw 90% fewer tokens consumed and positive identification on the first try - no prompts needed - just the right context.

Beyond the benchmarks, our customers are experiencing the same outcomes. Long-running issues that have been undiagnosed for years are resolved in minutes. Typical troubleshooting time to identification is reduced from 50 minutes to 5 minutes, and the biggest surprise is always that the less we prompt, the better the outcome.

It's incredibly exciting and we are just getting started - we have some really exciting enhancements coming before the end of the year.

VMblog:  How do you think the rise of agents will impact observability? Should we expect humans to still be looking at dashboards in 2026?

Callaway:  Going back to my previous statement that the foundation of Observability today is to make complex data consumable by humans - the impact and opportunity can't be overstated. Driven by proper context, agentic RCA is possible today. This, combined with the ability to better manage and orchestrate the retention of data behind the scenes, will turn RCA and Observability into an AI-driven outcome, and SREs can go back to focusing on what they really love, which is designing and architecting the systems.

When the analysis is performed by agents, there is no need for charts and graphs, the analysis is commoditized by the models and the curation and management of data is the driver of success. I don't believe the players in the space today have the ability to respond and shift at the speed this shift will happen. So yeah - I think there is going to be an impact - a huge benefit for the consumers and a massive shift in the providers.

Now ... you put a timeframe on the end of dashboards. I wouldn't take 2026 in the dashboard death pool. It will start to happen but as always the typical enterprise will need time. I do think in 2026 there will be a shift from a dashboard as an operational tool to a source of trust and confidence. There is also an element of risk management, compliance and audit that underlies a lot of my hypothesis. Trust and auditability will become a more embedded capability as we remove the humans from the loop - but that's a conversation for another time.
