The 2026 AI SRE Market Map: Agents, Harnesses, and the Data Layer
TL;DR
- More than half of SRE teams plan to deploy agentic AI in production within 12 months, more than double last year's confidence, per the SRE Report 2026.
- The agentic AI market is projected to reach $93 billion by 2030 at a 65.5% CAGR, and every major observability incumbent has now shipped an AI SRE agent.
- Two structural choices shape every buying decision: which agent harness runs your reasoning, and which data layer feeds it.
- The market has split into four categories: closed agents, incumbents, telemetry substrates, and open-source harnesses.
- Among the vendors covered, one cell stays empty in the comparison matrix: no closed agent or incumbent combines an open harness with an optimized data layer.
Why 2026 is the inflection year for AI SRE
The agentic AI market is projected to reach $93 billion by 2030 at a 65.5% compound annual growth rate, which analysts at Metavert call the fastest-growing enterprise software segment ever recorded. That number alone reads like a forecast you can safely ignore until the next budget cycle. The adoption data says otherwise.
More than half of SRE professionals plan to deploy agentic AI systems in production within the next 12 months, according to the SRE Report 2026 from LogicMonitor, which surveyed 418 practitioners during July and August 2025. The report describes that figure as more than double the confidence reported a year earlier. What shifted is intent to deploy in production, not curiosity or pilot interest.
The economics behind that shift are concrete. AI inference costs fell 92% in three years, from roughly $30 per million tokens in early 2023 to between $0.10 and $2.50 by February 2026, per the same Metavert compilation. Running an agent through thousands of investigation steps stopped being a research-budget line item and became something a platform team can justify against on-call hours saved. Median toil still consumes 34% of engineers' time per the SRE Report, and 49% of respondents say AI has already cut into it.
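As a quick check on the arithmetic, the sketch below computes the decline implied by the price points quoted above. The function and constant names are illustrative; only the three prices come from the text.

```python
def percent_decline(old_price: float, new_price: float) -> float:
    """Return the percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


# Prices per million tokens, as quoted in the paragraph above.
EARLY_2023_USD = 30.00
FEB_2026_HIGH_USD = 2.50
FEB_2026_LOW_USD = 0.10

decline_at_high_end = percent_decline(EARLY_2023_USD, FEB_2026_HIGH_USD)
decline_at_low_end = percent_decline(EARLY_2023_USD, FEB_2026_LOW_USD)

print(f"{decline_at_high_end:.1f}% to {decline_at_low_end:.1f}%")
```

The 92% headline corresponds to the top of the February 2026 price range. At the bottom of that range, the decline is closer to 99.7%.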
The established vendors have already shipped, which is the clearest signal that this is no longer a futures story. All three observability incumbents put AI SRE agents into the market in the last six months. Datadog moved Bits AI to general availability, Dynatrace released its determinism-first Intelligence layer, and New Relic announced its SRE Agent. Both major hyperscalers followed, with AWS taking its DevOps Agent to general availability in April 2026 and Azure shipping its own.
When five of the largest infrastructure and monitoring companies all commit to the same capability within two quarters, the open question stops being whether agentic SRE works in production. The question becomes which architecture you build on, and that decision is the subject of the rest of this map.
How the market has split: a four-category taxonomy
The AI SRE market has split into four architectural categories, and each one answers a different version of the same question: where does your data live, and who controls the reasoning that runs on top of it? Sorting any vendor into one of these buckets tells you more about the tradeoff you are accepting than any benchmark a vendor publishes.
Closed AI SRE agents
The first category covers purpose-built agents that own the investigation workflow end to end. Resolve AI, Traversal, Cleric, and Neubird belong here. These platforms deliver the strongest investigation quality because the vendor controls the model, the orchestration, and the feedback loop. The cost is that your telemetry flows into the vendor's cloud, and the agent's logic stays opaque. You buy capability and accept lock-in.
Incumbents shipping AI on existing data planes
The second category covers observability and cloud platforms that bolt an agent onto data they already collect. Datadog Bits AI, Dynatrace Intelligence, New Relic's SRE Agent, and the AWS DevOps Agent all follow this pattern. For teams already standardized on one of these platforms, the agent is the lowest-friction option because it reasons over data that is already there. The constraint is the data wall. Datadog's cross-platform reasoning over Splunk and Grafana sits in preview, not GA, and Dynatrace's deterministic grounding architecture gets its reported 12x success-rate edge over LLM-only approaches precisely because it stays inside the Dynatrace platform. Depth inside one data plane comes at the price of reach across others.
Telemetry and data substrate platforms
The third category sits below the agents. Cribl, Apica, Metoro, and Mezmo collect, route, reduce, and shape telemetry before any reasoning happens. These platforms make no claim to investigate incidents themselves. They control the data layer that every agent depends on, which means they can feed multiple agents without forcing you to commit your telemetry to one vendor's cloud. The tradeoff runs the other direction from the closed agents. You keep control of your data and stay vendor-neutral, but you supply the reasoning layer separately.
OSS and DIY harnesses
The fourth category covers open-source agent harnesses such as HolmesGPT, K8sGPT, and OpenSRE. You assemble the agent yourself, pick your own model, and wire it to your own telemetry. Nothing leaves your control, and no vendor owns your roadmap. The price is ownership of everything, including integration, evaluation, and ongoing maintenance. Platform teams with strong engineering capacity and a mandate to avoid proprietary tooling start here.
Two axes run underneath these four categories. The harness axis asks who owns the agent logic, ranging from a closed vendor to your own engineers. The data axis asks who controls the telemetry, ranging from the vendor's cloud to your own infrastructure. Every vendor in this market makes one bet on each axis. The evaluation framework later in this article gives you the questions to make those bets deliberately rather than by default.
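One way to make the two axes concrete is to write the taxonomy down as data. The sketch below is a simplification under stated assumptions: the `Owner` enum, the `Category` class, and the single value per axis are illustrative choices, and the placements are a coarse reading of the category descriptions above, not vendor-supplied classifications.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Owner(Enum):
    VENDOR = "vendor"
    CUSTOMER = "customer"


@dataclass(frozen=True)
class Category:
    name: str
    harness_owner: Optional[Owner]  # None: the category ships no agent logic
    data_owner: Owner


# Placements follow the category descriptions above, reduced to one value
# per axis. Real vendors sit at many points between the two extremes.
TAXONOMY = [
    Category("Closed AI SRE agents", Owner.VENDOR, Owner.VENDOR),
    Category("Incumbents", Owner.VENDOR, Owner.VENDOR),
    Category("Telemetry and data substrate platforms", None, Owner.CUSTOMER),
    Category("OSS and DIY harnesses", Owner.CUSTOMER, Owner.CUSTOMER),
]


def categories_where(harness_owner: Optional[Owner], data_owner: Owner) -> List[str]:
    """Return the names of categories that match both axis values."""
    return [
        c.name
        for c in TAXONOMY
        if c.harness_owner == harness_owner and c.data_owner == data_owner
    ]


print(categories_where(Owner.VENDOR, Owner.VENDOR))
print(categories_where(Owner.CUSTOMER, Owner.CUSTOMER))
```

Writing the bets down this way makes it obvious when two shortlisted vendors occupy the same cell and therefore carry the same structural tradeoff.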
Category 1: Closed AI SRE agents
Closed AI SRE agents run as managed cloud platforms. You connect your monitoring tools, code repositories, and chat systems to a vendor's hosted reasoning engine, and the agent investigates incidents inside that vendor's environment. The category bets that investigation quality improves when one company controls the entire stack, from data ingestion to model fine-tuning to the remediation workflow. You get the strongest out-of-the-box accuracy in exchange for routing your operational data through someone else's cloud and tying your incident response to their roadmap.
Resolve AI
Resolve AI is the best-funded entrant in the closed-agent category and the one with the most enterprise validation. The company raised a $125M Series A led by Lightspeed Venture Partners at a $1B valuation, then followed with a $40M extension at a $1.5B valuation. Co-founder Spiros Xanthos previously ran observability at Splunk and co-created OpenTelemetry, which explains why the integration coverage is broad and the data plumbing is credible.
The published benchmarks come from named customers, which is rarer in this market than it should be. DoorDash reported up to an 87% reduction in time to root cause, and Coinbase reported 73% faster time to root cause. Other named accounts include MSCI, Salesforce, Zscaler, MongoDB, Toast, and Pinecone. The May 2026 platform update added always-on background agents that audit alert hygiene, flag configuration drift, and triage on-call alerts within five minutes before an engineer engages.
The architecture leans into multi-agent investigation. A coordinated team of specialized agents pursues multiple hypotheses in parallel, which Resolve says delivers a more than 2x improvement in root cause accuracy on internal evaluation sets versus earlier versions. The platform now exposes a REST API and an MCP server, so you can wire Resolve into broader agentic workflows rather than treating it as a closed endpoint.
The constraint is the deployment model. Resolve runs an on-premise satellite agent as a secure gateway, but the core platform and the per-customer fine-tuned models live in Resolve's cloud. The company holds SOC 2 Type II certification and states it does not persist raw data or train across customers, which addresses the most common objection. Even so, top-tier investigation quality here requires sending operational telemetry to a third party, and that creates data residency questions for regulated teams and lock-in for everyone. Pricing is custom enterprise only, with no public page, free tier, or self-service signup. If your data governance allows the cloud-hosted model, Resolve sets the accuracy bar. If it does not, the rest of this map matters more.
Traversal
Traversal stakes its differentiation on architecture rather than integrations, building two proprietary components that map and traverse a production system in real time. The Production World Model keeps a continuously updated picture of how services, infrastructure, and networking connect. The Causal Search Engine walks that map across more than 10 hops to find a root cause, which Traversal claims cuts investigation from hours to minutes (LinkedIn). Founded by AI researchers from MIT, Columbia, Berkeley, and Cornell, and backed by Sequoia Capital and Kleiner Perkins, the company positions itself as the AI SRE for the enterprise.
The enterprise framing holds up in the customer list. Traversal names American Express, Capital One, Kraken, Pepsi, and DigitalOcean as deployments on its own profile (LinkedIn). Those are large, regulated, high-traffic environments, and landing them with a Seed-stage product signals real technical credibility. Scott Gorman, who joined go-to-market from Cribl and Splunk, brings a track record of scaling telemetry companies from early revenue to billions, which explains some of the early enterprise traction.
A caveat on sourcing matters here. The figures circulating in the market for the American Express deployment (82% RCA accuracy, 32% MTTR reduction, and 250 billion logs processed daily) do not appear in any source available for this map. Traversal's public profile confirms American Express as a customer but publishes no performance benchmarks, log volumes, or accuracy percentages (LinkedIn). A TechCrunch URL that supposedly carried these numbers returned a 404 with no usable content. Treat the 82% and 32% claims as vendor-attributed and unverified until independent measurement or a primary source confirms them.
Funding tells the same incomplete story. Public records show only a Seed round closed in July 2025, with no disclosed dollar amount, so reports of a larger raise lack a traceable source (LinkedIn).
Traversal fits the same closed-agent constraint as the rest of Category 1. The causal architecture works because Traversal ingests and models your telemetry inside its own platform, which means routing production data through a vendor cloud and accepting its system understanding as the source of truth. For teams that want a proprietary causal engine and have the data-residency latitude to feed it, Traversal is a serious contender. For teams that need to verify claims before they commit, the missing benchmarks are a reason to run your own evaluation.
Cleric
Cleric earned a Gartner Cool Vendor 2025 designation in AI for SRE and Observability, and that recognition tracks with how the company positions itself in the mid-market. It publishes three headline numbers on its homepage: five minutes to root cause, 92% actionable findings, and more than 200,000 production-grade investigations completed. Those figures are vendor-stated rather than independently audited, so read them as the company's own benchmark rather than a third-party result.
The real differentiator is operational memory, not raw speed. Most closed agents investigate each incident from a cold start, treating every alert as if the system had never seen it before. Cleric instead builds what it calls institutional knowledge. Every resolution it produces becomes context that the whole team can reuse on the next incident. Three subsystems carry this. One automatically maps services, dependencies, and ownership. A second verifies proposed fixes against the live environment before recommending them. A third accumulates context over time, so patterns from past investigations sharpen future ones through a positive and negative feedback loop.
That self-learning model shows up in how customers describe daily use. Maxime Fouilleul, Head of Infrastructure and Operations at BlaBlaCar, framed it as an SRE companion for software engineers, noting that when one team solves an alert, the knowledge transfers to others. Affaf Ahtisham, a Head of Engineering quoted on the same page, put it more bluntly, saying nobody opens the Cloudflare or Datadog dashboards anymore because the team's first stop is Cleric. The widely circulated figure of 20 to 30 percent engineering capacity freed at BlaBlaCar does not appear in Cleric's published source material, so treat it as unverified.
On safety, Cleric runs read-only by default, with write access available only when a customer explicitly opts in. The platform is SOC 2 Type II compliant, encrypts data in transit and at rest, and states that customer data never trains its models. The read-only posture matters for mid-market teams that want investigation automation without handing an agent permission to modify production on its own.
Cleric has raised $9.8M in seed funding to date, a smaller war chest than Resolve AI's Series A. Its compare pages target AWS DevOps Agent, Azure SRE Agent, Bits AI, Grafana Assistant, and PagerDuty's SRE Agent, which signals where it expects to win deals. The closed-platform tradeoff still applies. Investigation quality depends on routing your telemetry through Cleric, and the operational memory it builds lives inside Cleric rather than in infrastructure you control.
Neubird
Neubird sits squarely in the closed-agent category alongside Resolve AI, Traversal, and Cleric. Its product, Hawkeye, markets itself as an autonomous SRE agent that investigates incidents and proposes resolutions without human prompting at each step.
Public benchmark data on Neubird is thin. Independent verification of its accuracy or MTTR figures does not appear in the sources reviewed for this map, so place it by architecture rather than by measured outcome. The company describes a system that ingests telemetry, builds an understanding of the environment, and runs investigation workflows inside its own platform.
You route data and reasoning through a single vendor's stack, which buys fast time-to-value at the cost of portability and model choice. Teams evaluating Neubird should treat its claims as vendor-stated until they run their own incident replays against it, and should weigh whether routing telemetry through a proprietary cloud fits their data-residency and lock-in tolerance. Against better-funded and more publicly benchmarked rivals like Resolve AI, Neubird competes on category fit, not on demonstrated metrics.
Category 2: Incumbents shipping AI agents
The observability vendors already holding your telemetry have an obvious advantage in the AI SRE race. Their agents reason over data they already collect, so deployment means flipping a switch rather than wiring a new platform into your stack. Datadog, Dynatrace, and New Relic each shipped an agent in the last two quarters, and both major hyperscalers followed. Across all five, the agent works well inside the vendor's own data plane and reaches less reliably outside it.
Datadog Bits AI
Bits AI SRE is the lowest-friction option for any team already running Datadog, because the agent investigates against the telemetry Datadog already holds. It reached general availability on December 2, 2025, as Datadog's first generally available AI agent, after testing across more than 2,000 customer environments and tens of thousands of investigations. For existing customers, turning it on requires no new data pipeline.
A March 2026 update made Bits roughly twice as fast, completing investigations in about three to four minutes depending on complexity. The same release widened the data the agent reasons over to include source code, profiling, Database Monitoring, Real User Monitoring, and Network Path. It also added an Agent Trace view that exposes tool calls and intermediate reasoning steps, so you can audit how Bits reached a conclusion.
You can encode team-specific knowledge through a bits.md file, which lets Bits fold runbook references and organizational context into its investigation process. Once Bits finishes, it can act. It posts to Slack or Teams, creates Datadog incidents, pages engineers, opens cases, and files Jira tickets pre-filled with investigation context. Three named Action Catalog actions, Trigger Investigation, Get Investigation, and List Investigation, let you chain a Bits investigation into automated remediation workflows you build.
The data wall is the real constraint. At GA, Bits reasons over Datadog-collected data only. Cross-platform reasoning over GitHub, ServiceNow, Grafana, Splunk, Dynatrace, and Sentry remained in preview as of June 2026, not generally available. If your telemetry lives partly outside Datadog, the agent investigates with an incomplete picture until those integrations ship.
Pricing stacks on top of existing Datadog licensing, at $25 per investigation on an annual plan and $36 on demand, with only conclusive investigations billed. Bits suits teams already standardized on Datadog who want fast wins inside that data plane. Teams with telemetry spread across several tools should treat the preview status of cross-platform reach as the deciding factor.
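To see what the two rates mean for a budget, the sketch below prices a month of investigations under each plan. The two per-investigation rates come from the paragraph above. The investigation count is a hypothetical input, and the function name is illustrative. Because only conclusive investigations are billed, the input is the conclusive count, not the total number triggered.

```python
ANNUAL_RATE_USD = 25     # per conclusive investigation, annual plan
ON_DEMAND_RATE_USD = 36  # per conclusive investigation, on demand


def investigation_cost(conclusive_investigations: int, rate_usd: int) -> int:
    """Cost of the billed (conclusive) investigations at a given rate."""
    return conclusive_investigations * rate_usd


# Hypothetical workload: 160 conclusive investigations in a month.
conclusive = 160

annual = investigation_cost(conclusive, ANNUAL_RATE_USD)
on_demand = investigation_cost(conclusive, ON_DEMAND_RATE_USD)
premium = (ON_DEMAND_RATE_USD - ANNUAL_RATE_USD) / ANNUAL_RATE_USD

print(f"annual plan: ${annual}, on demand: ${on_demand}, premium: {premium:.0%}")
```

The on-demand rate carries a 44% premium over the annual rate at any volume, so the comparison that matters is how predictable your incident load is, not how large it is.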
Dynatrace Intelligence
Dynatrace builds the most defensible technical argument of the three observability incumbents, and it starts with a math problem that most agentic vendors avoid. At Dynatrace Perform 2026, CTO Bernd Greifeneder walked through the accuracy-compounding trap. An LLM that is 95% accurate on a single call drops to roughly 60% end-to-end success across 10 sequential agentic calls, because errors accumulate at each step. "This is not acceptable," he said. His second point matters just as much. You cannot load a petabyte of logs into an agent without bursting the context window, and more data raises the hallucination risk rather than lowering it.
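The compounding figure is easy to reproduce. The sketch below assumes each call succeeds independently with the same probability, which is a simplification for illustration and not a description of how Dynatrace or any vendor measures accuracy. The function names are illustrative.

```python
def end_to_end_success(per_call_accuracy: float, calls: int) -> float:
    """Probability that every call in a sequential chain succeeds,
    assuming independent calls with identical accuracy."""
    return per_call_accuracy ** calls


def required_per_call_accuracy(target: float, calls: int) -> float:
    """Per-call accuracy needed to hit a target end-to-end success rate."""
    return target ** (1 / calls)


chain_success = end_to_end_success(0.95, 10)
needed = required_per_call_accuracy(0.95, 10)

print(f"10 calls at 95% each: {chain_success:.1%} end to end")
print(f"per-call accuracy needed for 95% end to end: {needed:.2%}")
```

Under these assumptions, ten calls at 95% each succeed end to end about 59.9% of the time, which matches the roughly 60% figure cited. Reaching 95% end to end would require about 99.5% accuracy on every individual call.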
Dynatrace answers both problems by running deterministic AI before generative AI touches the workflow. A root cause agent walks millions of causal dependencies through the Smartscape topology graph, an analytics engine transforms exabyte-scale data in the Grail lakehouse into AI-ready context, and a forecasting agent scales prediction across millions of metrics. Generative reasoning only enters once those agents have established factual grounding. Greifeneder put numbers behind the design. Dynatrace Intelligence reports 12x higher success rates, 3x faster resolution, and half the token cost versus LLM-only approaches, and he noted the gap widens as environments grow more complex.
United Airlines supplies the enterprise-scale proof. Ramiro Zavala, Head of IT Operations, Observability and Quality, described an environment where a single boarding pass triggers up to 500 unique services across more than 2,000 application services running on mainframes, on-premises systems, and the cloud. Diagnosing a major incident once pulled in upwards of 250 people. United consolidated roughly 800 applications in nine months and tied the work to two of its best operating years on record, including the number one ranking for on-time departures and a 2.6-point gain in customer satisfaction. The next phase pairs agentic automation with ServiceNow workflows.
Deterministic grounding only works because the root cause agent, the analytics engine, and the forecasting agent all read from Smartscape and Grail, which means the accuracy advantage is inseparable from running your telemetry inside Dynatrace. The platform does expose agent framework paths through Model Context Protocol, CrewAI, and Bedrock Agents in its instrumentation examples, so the agent layer is more open than the data layer. Of the incumbents, Dynatrace offers the strongest reasoning quality and the deepest lock-in, because the two are the same engineering decision.
New Relic SRE Agent
New Relic completes the observability incumbent pattern with its SRE Agent, announced at Advance 2026 in March as part of the Intelligent Observability Platform. The product follows the same playbook as Datadog Bits AI and Dynatrace Intelligence. New Relic ships agentic capability on top of an existing data plane, and the agent reasons over telemetry already collected inside the platform. CPO Brian Emerson framed the move around scale, arguing that AI-driven development creates a surge of system changes and telemetry volume that engineering teams can no longer handle on their own.
The agent's differentiation lives in workflow rather than raw investigation. New Relic positions the SRE Agent around a digital war room model, deploying always-on teammates that diagnose incidents and recommend next steps before an engineer acknowledges a page. The Slack and Zoom integrations carry the user experience. Responders query New Relic directly from triage rooms, and the agent captures human context to drive fact-finding, impact assessment, and automated post-incident reports. For teams whose incident response already runs through chat and video, that surface reduces context-switching during an active page.
Intelligent root cause analysis, which New Relic calls iRCA, is the technical anchor. It searches the affected entity's topology graph, scores that graph using probabilistic causal models, and applies a path-based ranking algorithm to narrow the problem space in seconds rather than hours. The approach resembles Dynatrace's determinism-first argument, pairing generative model flexibility with deterministic causal structure to separate signal from noise. New Relic chose to build on fast-improving general models rather than train niche ones, with Emerson reasoning that specialization stays useful for a short time given the pace of model improvement.
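The general idea of scoring paths through a dependency graph can be illustrated with a toy example. This is not New Relic's iRCA algorithm, whose internals are not described in the sources for this map. The graph, the edge probabilities, and the function name below are all hypothetical, and the scoring rule (multiply edge probabilities along the best path from the affected entity) is a stand-in chosen for simplicity.

```python
from typing import Dict, List, Tuple

# Hypothetical topology: entity -> {upstream dependency: probability that a
# fault in the dependency explains the symptom observed on the entity}.
GRAPH: Dict[str, Dict[str, float]] = {
    "checkout-api": {"payments-svc": 0.7, "cart-svc": 0.2},
    "payments-svc": {"payments-db": 0.8, "fraud-svc": 0.1},
    "cart-svc": {"redis-cache": 0.5},
}


def rank_candidates(
    graph: Dict[str, Dict[str, float]], affected: str, max_hops: int = 3
) -> List[Tuple[str, float]]:
    """Rank upstream entities by the best path score from the affected entity."""
    best: Dict[str, float] = {}
    # Each stack entry: (node, score of path so far, hops used, nodes on path).
    stack = [(affected, 1.0, 0, frozenset([affected]))]
    while stack:
        node, score, hops, seen = stack.pop()
        if hops == max_hops:
            continue
        for dep, prob in graph.get(node, {}).items():
            if dep in seen:  # skip cycles
                continue
            path_score = score * prob
            if path_score > best.get(dep, 0.0):
                best[dep] = path_score
            stack.append((dep, path_score, hops + 1, seen | {dep}))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


ranked = rank_candidates(GRAPH, "checkout-api")
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A pure path product always favors entities close to the symptom, so a production system would also need per-entity evidence, such as anomaly signals, before naming a root cause. The sketch only shows how a topology graph narrows the candidate set.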
Most of the announced capabilities sat in preview as of the March 2026 reveal, not general availability. Workflow Automation reached GA and supports multi-step processes with conditional logic, human approvals, and bidirectional agent invocation. The Performance Risks Inbox and Smart Alerts remained in preview. No independent benchmarks, pricing, or quantified customer outcomes were published alongside the launch.
The SRE Agent reasons over data inside New Relic, so its reach depends on what you already send there. Teams with a large New Relic install base get the lowest-friction path. Teams running telemetry across multiple platforms hit the same data wall.
AWS DevOps Agent
AWS DevOps Agent reached general availability in April 2026, after debuting in preview at re:Invent 2025, and it brings the strongest public benchmark evidence of any entrant in this category. AWS built it on Amazon Bedrock AgentCore. Preview-period figures cited by AWS point to up to 75% lower MTTR, 94% root cause accuracy, and 80% faster investigations. Two named customers back the claims. Western Governors University cut resolution time from two hours to 28 minutes, and Zenchef resolved an API integration issue in 20 to 30 minutes instead of one to two hours.
The agent organizes its behavior into a three-tier skills hierarchy that explains how it improves over time. AWS-provided skills cover common investigation patterns out of the box. Custom skills let you extend the agent with your own runbooks and procedures. The third tier, Learned Skills, runs a background sub-agent that builds reusable patterns from prior investigations and team behavior, which AWS describes as turning tribal knowledge into operational memory. That self-learning layer is the closest any incumbent comes to Cleric's operational-memory approach.
Governance sits inside Agent Spaces, isolated containers that handle cross-account telemetry correlation, granular IAM permissions, immutable audit journals, identity integration with Okta or Entra ID, and customer-managed keys. For regulated teams, the immutable audit trail and per-space permission boundaries answer the question of what the agent did and who authorized it. Pricing follows AWS conventions, billed per second of active agent work with no per-seat licensing.
MCP-based extensibility is AWS's partial answer to the lock-in concern that shadows every closed agent. Through Model Context Protocol servers, the agent can reach internal observability platforms, CMDB datasets, runbook libraries, and private operational APIs. Native integrations already cover CloudWatch, Datadog, Dynatrace, New Relic, Splunk, Grafana, PagerDuty, ServiceNow, GitHub, GitLab, and Slack. The GA release added investigation of Azure and on-prem environments, which widens the agent beyond a single cloud.
AWS DevOps Agent delivers the most value when your workloads, identity, and telemetry already live in AWS. MCP and the multi-cloud additions soften the boundary, but the deepest integration, the cleanest governance story, and the per-second economics all assume AWS as the center of gravity. If you run a hybrid estate with a significant Azure or on-prem footprint, you inherit the agent's bias toward its home environment, and the extensibility layer becomes the workaround rather than the design center.
Category 3: Telemetry and data substrate platforms
The first two categories assume the data an agent reasons over is already clean, complete, and routed to the right place. In practice, it rarely is. The telemetry and data substrate platforms sit underneath the agents, controlling how logs, metrics, and traces get collected, reduced, and stored before any reasoning happens. Cribl, Apica, Metoro, and Mezmo compete here, and their architectural decisions shape what every agent above them can actually do.
An agent is only as good as the signal it receives. A closed agent with state-of-the-art reasoning still fails when half the relevant traces never reach it or when retention costs force teams to drop the data that would have explained the incident. The vendors below solve the supply problem rather than the investigation problem, and the strongest architectures treat the two as one continuous system.
Cribl
Cribl runs the data-in-motion layer that sits between your sources and your analysis tools, and it operates at Fortune 100 scale. Its core product routes, filters, and reshapes telemetry in flight, so teams can drop low-value logs, reduce volume before it hits expensive storage, and send the same stream to multiple destinations. That position matters for AI SRE because every agent downstream depends on what Cribl decides to forward. The pipeline runs before the reasoning, and what it discards an agent can never investigate.
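The filter, reshape, and fan-out pattern described above can be sketched in a few lines. This is generic Python for illustration, not Cribl's configuration language or API. The event schema, the rule that treats debug logs as low-value, and the in-memory lists standing in for destinations are all assumptions.

```python
from typing import Dict, Iterable, List

Event = Dict[str, str]

KEEP_FIELDS = ("ts", "level", "service", "msg")  # hypothetical schema


def is_worth_keeping(event: Event) -> bool:
    """Filter step: treat debug-level logs as low-value and drop them."""
    return event.get("level") != "debug"


def reshape(event: Event) -> Event:
    """Reshape step: keep only the fields downstream tools need."""
    return {key: event[key] for key in KEEP_FIELDS if key in event}


def run_pipeline(events: Iterable[Event], destinations: List[List[Event]]) -> int:
    """Filter and reshape each event, then send it to every destination.
    Returns the number of events dropped."""
    dropped = 0
    for event in events:
        if not is_worth_keeping(event):
            dropped += 1
            continue
        shaped = reshape(event)
        for destination in destinations:
            destination.append(shaped)
    return dropped


events = [
    {"ts": "1", "level": "error", "service": "api", "msg": "timeout", "host": "a"},
    {"ts": "2", "level": "debug", "service": "api", "msg": "cache probe"},
    {"ts": "3", "level": "info", "service": "db", "msg": "failover"},
]
agent_store: List[Event] = []
archive: List[Event] = []
dropped = run_pipeline(events, [agent_store, archive])

print(dropped, len(agent_store), len(archive))
```

The dropped debug event never reaches either destination. If that event had held the clue to an incident, no agent reading from `agent_store` could recover it, which is why filter rules deserve the same review as alert rules.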
Cribl markets an "AI Platform for Telemetry" repositioning that frames the routing layer as the foundation for agent-driven operations. Cribl's architectural role is clear. It controls the substrate, not the analysis, which makes it a strong fit for teams whose first problem is telemetry cost and routing across many tools. It does not perform investigation on its own, so a team buying Cribl still needs an agent or platform above it to turn the routed data into root cause.
Apica
Apica enters the data substrate category as a newer challenger to Cribl, built around a storage architecture it calls InstaStore. The product separates compute from storage so teams can retain telemetry in low-cost object storage and query it on demand, which matters most when log and metric volume outpaces what a traditional indexed store can hold affordably. Apica frames its platform as agentic-ready, meaning the data sits in a form that AI agents can query without first rehydrating it into an expensive hot tier.
The company markets a 40% reduction in total cost of ownership against conventional observability stacks. We could not verify that figure against an independent source or a published methodology, so treat it as a vendor claim rather than a benchmark. Any TCO number depends entirely on the baseline you compare against and the retention period you assume, and Apica has not published those inputs in material available here.
Where Apica fits is narrower than the agentic SRE pitch suggests. The platform solves a storage and cost problem, not an investigation problem. If your primary pain is that telemetry retention has become unaffordable and your agents stall when they need historical data, Apica's separation of storage from query gives you a way to keep more signal at lower cost. If your primary need is autonomous root-cause analysis, Apica supplies the substrate an agent reasons over, not the reasoning itself.
Apica belongs alongside Cribl rather than alongside Resolve AI or Cleric. Both companies operate at the data-in-motion and data-at-rest layers that precede agent reasoning, and neither ships the agent. For a team managing high telemetry volume under budget pressure, Apica is worth evaluating on its storage economics. For a team whose roadmap centers on closing incidents faster, it is one component of a larger architecture, not the whole answer.
Metoro
Metoro collects deep system telemetry with eBPF, the kernel-level instrumentation that watches pods and services without requiring engineers to manually add code. That distinction matters because most observability gaps come from missing instrumentation, not missing dashboards. The Grafana 2026 Observability Survey of more than 1,300 practitioners found that 47% of teams expanded OpenTelemetry usage last year, yet only 41% run it in production. Incident reviews still end with "we didn't have the right signals," and Metoro's eBPF approach attacks that problem directly.
Pricing is straightforward and Kubernetes-scoped. A free tier covers one cluster and two nodes, and the Scale plan runs $20 per node per month as verified in May 2026. For a Kubernetes-heavy team that keeps losing time to memory pressure or syscall failures it never captured, that price buys continuous deep-system data inside every pod.
Metoro fixes data quality, but it does not run the investigation. In a pod-crash scenario, Metoro hands the engineer rich telemetry about what was happening inside the container before it failed, and the engineer still interprets that data and decides what to do. The Sherlocks.ai analysis places Metoro at the analyze-to-enrich stage of its maturity model, describing the product as "a strong component in a broader stack rather than a standalone solution." Outside Kubernetes, its value falls off, since the eBPF collection targets containerized workloads.
Read Metoro as a signal-completeness layer, not an autonomous SRE agent. It belongs in the data substrate category alongside Cribl and Apica because it improves what feeds the investigation rather than automating the investigation itself. A team that pairs Metoro's eBPF feed with an open agent harness gets the deep signals plus the reasoning workflow Metoro deliberately leaves to the engineer.
Mezmo
Mezmo occupies the one position in this map that no other vendor holds. It is open at the agent layer and intelligent at the data layer, so a team can run the AI SRE workflow it wants on telemetry it controls. The four-category taxonomy splits the market because most vendors solve one axis and surrender the other. Mezmo addresses both with two distinct products that work together. For a broader look at the tools in this category, see Mezmo's AI SRE tools overview.
The first layer is AURA, an open-source agent harness. AURA gives you the orchestration scaffolding that closed agents like Resolve AI and Cleric keep proprietary, so you decide which models run, which tools they can call, and how investigation steps chain together. A closed agent forces you to accept one vendor's reasoning architecture and roadmap. AURA lets you swap models, run several agents against the same incident, and inspect every decision the system makes, because the harness itself is code you can read and modify. For teams that have watched a single AI vendor change pricing or deprecate a feature, that openness is the difference between owning a workflow and renting one.
The second layer is PRISM, an intelligent telemetry pipeline that sits in the path of your data before any agent reasons over it. PRISM solves the constraint that defines the incumbent category. Datadog Bits AI and Dynatrace Intelligence reason well, but only over data already inside their platforms, which is the data-wall problem every incumbent profile in this map runs into. PRISM routes, shapes, and reduces telemetry across sources, so an agent gets a complete and clean view of an incident regardless of which cloud or tool produced the underlying signal. With more than half of SRE teams planning agentic deployments within twelve months (apmdigest.com) and the average enterprise now running 144 non-human identities per human employee (meditations.metavert.io), the volume those agents will query makes a pipeline that controls cost and completeness a practical requirement, not a nice-to-have.
Together, AURA and PRISM answer the two-axis problem that splits the rest of the market. Closed agents give you capability and take your control. Incumbents give you depth inside their walls and stop at the edge. Data substrate platforms like Cribl and Apica handle telemetry in motion but leave the agent layer to someone else. Mezmo runs an open harness on top of a pipeline it manages, so neither axis is borrowed from a vendor whose interests may diverge from yours.
That combination fits a specific kind of team. If you run a single cloud and a single observability platform, an incumbent agent is the lower-friction choice, and this map says so plainly. Mezmo earns its place when your telemetry lives across AWS, Azure, and on-prem systems, when you want to run more than one agent and compare how they reason, and when you are unwilling to bet your incident response on one AI vendor's roadmap surviving the next eighteen months. OpenTelemetry is the cross-stack standard that makes portability real (ibm.com), and an open harness on a controlled pipeline is how you take advantage of it.
Category 4: OSS and DIY harnesses
The open-source AI SRE harnesses give you total control over the model, the prompts, and the integration surface, but they hand you the full burden of running them in production. HolmesGPT, K8sGPT, and OpenSRE each package an LLM-driven investigation loop around Kubernetes and observability data, and each leaves the harder work to you. You wire up the data sources, write the evaluation suites that catch hallucinated root causes, and own every upgrade when the underlying model or API changes.
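To make the evaluation burden concrete, here is a minimal sketch of a regression suite that replays labeled incidents through an agent and counts two things: how often the diagnosed cause matches the postmortem label, and how often the agent cites evidence that never appeared in the telemetry it was given. Everything here is an assumption for illustration. `Case`, `Diagnosis`, and `evaluate` are hypothetical names, not part of HolmesGPT, K8sGPT, or OpenSRE, and exact string matching on the cause is a deliberately crude stand-in for whatever scoring a real suite would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One labeled incident (hypothetical structure)."""
    telemetry: list[str]   # log lines handed to the agent
    expected_cause: str    # root cause recorded in the postmortem


@dataclass
class Diagnosis:
    """What the agent returns (hypothetical structure)."""
    cause: str
    evidence: list[str]    # lines the agent cites as support


def evaluate(agent: Callable[[list[str]], Diagnosis], cases: list[Case]) -> dict:
    """Replay every case and report accuracy plus ungrounded diagnoses."""
    correct = 0
    ungrounded = 0
    for case in cases:
        diagnosis = agent(case.telemetry)
        if diagnosis.cause == case.expected_cause:
            correct += 1
        # A diagnosis is ungrounded if it cites nothing, or cites a line
        # that was not in the telemetry the agent actually received.
        cites_unknown = any(line not in case.telemetry for line in diagnosis.evidence)
        if not diagnosis.evidence or cites_unknown:
            ungrounded += 1
    total = len(cases)
    return {
        "total": total,
        "accuracy": correct / total if total else 0.0,
        "ungrounded": ungrounded,
    }
```

Running a suite like this on every model or prompt change is the kind of maintenance work this category leaves to you. The grounding check only catches fabricated citations, so a plausible but wrong cause backed by real log lines still needs the labeled comparison.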
K8sGPT focuses on scanning Kubernetes clusters and translating raw errors into plain-language diagnoses, which makes it a natural starting point for platform teams already living in kubectl. HolmesGPT extends that idea toward broader incident investigation, pulling in alerts, logs, and runbooks to draft a root-cause narrative. OpenSRE aims at the full agentic workflow, closer in ambition to the closed agents in Category 1 but without the vendor running it for you. Public adoption data and licensing details for all three are thinner than for the funded vendors, so treat capability claims as project-dependent rather than benchmarked.
You avoid every form of vendor lock-in and keep your telemetry inside your own boundary, which matters when data residency or model choice is a hard requirement. In exchange, you absorb the integration work, the prompt tuning, the regression testing, and the on-call burden of an agent that can confidently produce a wrong answer. The SRE Report 2026 found that only 6% of teams have protected learning time and most spend three to four hours a month on upskilling, which is the capacity these projects quietly assume you have (apmdigest.com).
Pick this category if you run a strong platform engineering team with a mandate to avoid proprietary tooling and the headcount to maintain an agent the way you maintain any other internal service.
Vendor comparison: architecture, openness, and fit
Two columns carry most of the decision weight. Whether the data layer is included tells you if the vendor owns your telemetry or sits on top of what you already run. Whether the agent framework is flexible tells you if you can swap models and orchestration logic or inherit the vendor's roadmap. Read those two together, and every other column falls into place.
The closed-agent rows share one pattern. Each delivers strong investigation quality and asks you to route data through the vendor's cloud, and none lets you bring your own orchestration layer. The incumbent rows reverse the tradeoff. They include the data layer because they already own it, which is why their reach stops at their own data plane. Datadog Bits AI reasons over Datadog-collected data at GA, with cross-platform integrations still in preview, and Dynatrace grounds its agents in Grail before any model runs.
Only one row reads open across both axes. Mezmo pairs an open-source harness with an included pipeline, which means you control the data layer and the agent framework at the same time. Every other vendor forces a choice between owning your telemetry and owning your agents.
How to evaluate an AI SRE platform: three questions
Three questions sort vendors into categories faster than any feature comparison. Answer them honestly about your own environment, and the right architectural fit becomes obvious before you sit through a single demo.
Q1: Where does your data live, and who controls it?
Start by tracing where your telemetry already sits and what it would cost to move it. If your logs, metrics, and traces span multiple clouds, a mainframe, and on-premises systems, an agent that only reasons over one vendor's data plane will see a fraction of your stack. Closed agents like Resolve AI route data through their own cloud, which raises data residency questions for regulated teams. A data substrate layer keeps telemetry in motion under your control and feeds whichever agent you choose. The answer to this question tells you whether you need a vendor that owns your data or a layer that sits in front of many.
Q2: What can the agent actually touch and modify?
Read-only investigation and closed-loop remediation are different products, and the gap between them defines your risk tolerance. An agent that surfaces a root cause and stops there carries little blast radius. An agent that restarts services, reroutes traffic, or rolls back deployments needs scoped permissions, audit trails, and a human approval gate. AWS DevOps Agent addresses this with Agent Spaces, granular IAM, and immutable audit journals. Decide how much authority you will delegate before you evaluate, because that decision narrows the field more sharply than accuracy benchmarks do.
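The split between read-only and mutating authority can be sketched as a small gate that sits between the agent and its tools. This is an illustration under stated assumptions, not any vendor's implementation: `ActionGate`, the action names, and the `approver` callback are all hypothetical, and the in-memory list stands in for a real audit store, which would need to be tamper-resistant.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical action names, for illustration only.
READ_ONLY_ACTIONS = frozenset({"get_logs", "query_metrics", "describe_pod"})
MUTATING_ACTIONS = frozenset({"restart_service", "reroute_traffic", "rollback_deploy"})


@dataclass
class ActionGate:
    """Decides whether an agent may run an action and records every decision."""

    # Callback that asks a human; should return True only on explicit approval.
    approver: Callable[[str, str], bool]
    audit_log: list = field(default_factory=list)

    def authorize(self, action: str, target: str) -> bool:
        if action in READ_ONLY_ACTIONS:
            allowed, reason = True, "read-only"
        elif action in MUTATING_ACTIONS:
            allowed = bool(self.approver(action, target))
            reason = "human approved" if allowed else "human denied"
        else:
            # Deny by default: anything not explicitly listed is out of scope.
            allowed, reason = False, "unknown action"
        # Record every decision, allowed or not, with a UTC timestamp.
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), action, target, allowed, reason)
        )
        return allowed


if __name__ == "__main__":
    # Stand-in approver that always denies, so only read-only actions pass.
    gate = ActionGate(approver=lambda action, target: False)
    print(gate.authorize("get_logs", "checkout"))
    print(gate.authorize("restart_service", "checkout"))
```

The design choice worth noting is the default-deny branch. An agent that invents a new action name gets refused and logged rather than silently allowed, which keeps the delegated authority equal to the list you wrote down.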
Q3: Who owns the model and the workflow long-term?
The model and orchestration logic you adopt now will be expensive to unwind in eighteen months, so weigh the durability of that choice carefully. Incumbents like Dynatrace build deep, deterministic architecture that delivers real accuracy gains and binds you tightly to one roadmap. Dynatrace's CTO Bernd Greifeneder makes a sound point: chaining ten LLM calls at 95% accuracy each yields roughly a 60% end result, because per-step errors multiply (0.95 raised to the tenth power is about 0.60), which is why grounding matters. The harder question is whether that grounding lives inside a vendor you cannot swap or a harness you operate. Open harnesses let you change models as inference costs fall and capabilities shift. Closed platforms make that switch a migration.
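The chained-accuracy arithmetic is easy to check for your own pipeline depth. The sketch below assumes each step succeeds or fails independently of the others, which is a simplification, and `chained_accuracy` is a hypothetical helper name.

```python
def chained_accuracy(per_step: float, steps: int) -> float:
    """Probability that every step in a chain is correct, assuming independence."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    for steps in (1, 5, 10, 20):
        result = chained_accuracy(0.95, steps)
        print(f"{steps:>2} steps at 95% each -> {result:.1%} end to end")
```

At ten steps the result is about 59.9%, which matches the figure quoted above. Doubling the chain to twenty steps drops it to roughly 36%, so longer investigations lean even harder on grounding.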
Teams that answer these three questions converge on a category on their own.
Conclusion
Two structural choices will define how your team operates for the next several years. The first is the harness layer, the framework that decides what an agent can reach, modify, and learn from. The second is the data layer, the substrate that feeds every agent the signal it reasons over. Both decisions compound. A closed agent wired to a single vendor's data plane works well until you add a second cloud or a second tool, and unwinding that commitment after eighteen months of operational dependency costs far more than the original purchase.
The timing is concrete. More than half of SRE teams plan to deploy agentic AI in production within twelve months, according to the SRE Report 2026, and all three observability incumbents plus both major hyperscalers shipped agents in the last six months. The teams choosing now are setting defaults that the rest of their stack will inherit.
Pick your architecture deliberately. An open harness over an open data layer keeps both choices reversible, which is the only hedge that survives a market changing this fast. Decide where your data lives and what your agents can touch before you decide whose agent to run.
How we built this market map
We built this map from primary sources. Vendor announcements, press releases, analyst recognition, and published customer case studies form the backbone of every profile. The market context draws on the SRE Report 2026 from LogicMonitor and the agentic AI sizing figures compiled in Metavert's State of AI Agents.
Several benchmark figures circulating in the market did not survive verification, and we flagged them rather than repeat them. The Coinbase root-cause figure appears as 73% in the sources we could check, not the other values sometimes cited. Traversal's widely quoted 82% RCA accuracy and 32% MTTR reduction at American Express do not appear in any source we could confirm, so we name the customer and omit the unverified metrics.
Inclusion reflects architectural relevance, not payment. No vendor paid for placement, and none reviewed their profile before publication. A tool earns a spot by representing a distinct approach to the harness or data layer.
Frequently asked questions
What is the difference between an AI SRE agent and an AI observability platform?
An AI SRE agent investigates incidents and proposes or executes fixes, while an AI observability platform collects, stores, and analyzes telemetry. Mezmo treats these as separate layers, pairing the AURA agent harness with the PRISM telemetry pipeline so the agent can reason over your data without locking it into one vendor's platform. You get investigation automation without surrendering control of where your data lives.
Are open-source AI SRE agents production-ready?
Open-source harnesses like HolmesGPT and K8sGPT run in production today, but they hand you full ownership of integration, evaluation, and maintenance. Mezmo's AURA gives you an open-source harness without forcing you to build the surrounding pipeline and connectors yourself. Teams with strong platform engineering capacity gain control and avoid betting on a single vendor's roadmap.
How should I assess a vendor's benchmark claims?
Check whether a figure is independently verified or vendor-reported, and ask which baseline it compares against. Resolve AI publishes DoorDash and Coinbase root-cause numbers tied to named customers, while AWS reports 94% root cause accuracy from preview-period results, so the source and scope matter as much as the number. A benchmark you can trace to a real customer and a defined test set beats a headline percentage with no context.
Where should I start if I have no existing agentic infrastructure?
Answer three questions first. Decide where your data lives and who controls it, what an agent can actually touch in your stack, and who owns the model and workflow over the long term. Mezmo's open, full-stack architecture lets you start with a telemetry pipeline and add the harness as your team builds confidence, rather than committing to one closed platform before you understand your own requirements.