The road to production AI starts wherever you are.
Whether you're deploying your first AI agent or migrating off legacy observability, there's a path to production-grade agentic operations.
Pick your starting point
- One system, two products, your context.
- Curate → Compress → Deliver
- Plan → Execute → Evaluate
- Any LLM, any MCP tools, any observability
Four stages to production AI
1. Build your first agent
Clone → configure → run. AURA handles MCP wiring, streaming, provider abstraction, and RAG. You define what the agent does in TOML.
- AURA OpenAI-compatible with streaming SSE: Point LibreChat, OpenWebUI, or any existing frontend at it — zero adapter code.
- AURA 5 LLM providers: OpenAI, Anthropic, Bedrock, Gemini, Ollama. One-line swap. Automatic tool schema sanitization per provider.
- AURA MCP tool discovery at runtime: Datadog, PagerDuty, Slack, internal APIs — dynamic discovery, no code changes.
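Because the endpoint is OpenAI-compatible, any existing client payload works unchanged. A minimal sketch of the request shape — the port and the /v1/chat/completions path are assumptions here, not confirmed AURA defaults:

```shell
# Hypothetical request shape -- endpoint path and port are assumed,
# based only on AURA's stated OpenAI compatibility.
AURA_URL="http://localhost:8080/v1/chat/completions"

# A standard OpenAI-style chat payload; nothing AURA-specific in it.
PAYLOAD='{"model":"aura","stream":true,"messages":[{"role":"user","content":"Summarize open incidents"}]}'
echo "$PAYLOAD"

# With a running aura-web-server, stream the reply over SSE:
#   curl -N -H "Content-Type: application/json" -d "$PAYLOAD" "$AURA_URL"
```

Streaming over SSE is what lets frontends like LibreChat or OpenWebUI consume the endpoint directly, with no adapter code.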
Minimal agent config
[llm]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
[agent]
name = "Ops Assistant"
system_prompt = "You're an SRE assistant"
turn_depth = 3
Connect an MCP tool server
[mcp.servers.observability]
transport = "http_streamable"
url = "http://mcp-server:8081/mcp"
headers = { "Auth" = "Bearer {{ env.MCP_TOKEN }}" }
# Or connect to Mezmo's MCP server
[mcp.servers.mezmo]
transport = "http_streamable"
url = "http://mezmo-mcp:8081/mcp"
Quick start (3 commands)
# Clone and configure
git clone https://github.com/mezmo/aura
cp examples/reference.toml config.toml
export ANTHROPIC_API_KEY="sk-..."
# Run
cargo run --bin aura-web-server
# Or use Docker
docker compose up --build
Swap to Ollama (local/air-gapped)
[llm]
provider = "ollama"
model = "llama3.1:latest"
url = "http://localhost:11434"
# AURA parses tool calls even from
# models that emit them as plain text
2. Deploy a use case
Pick one workflow — incident triage, runbook-grounded RCA, post-mortem generation. Harden it with guardrails and observability before real traffic.
- AURA Runbook-grounded RAG: Load docs with context_prefix so every recommendation cites its source — not the model's training data.
- Mezmo Pre-built agentic SRE workflows for triage, RCA, remediation: Reactive mode: alert fires → agent responds → sub-5 min MTTR.
- AURA Automated post-mortem generation: 4+ hour process → structured output with timeline, root cause, and action items.
- AURA Safety controls: turn_depth, streaming timeouts, graceful shutdown, backpressure. Human-in-the-loop approval gates for sensitive ops.
- AURA OpenTelemetry + OpenInference tracing: Every plan, prompt, tool call → Arize Phoenix, Jaeger, Tempo, Datadog, Mezmo. Full audit trail to disk.
- AURA Per-tenant isolation via headers_from_req: Auth tokens forwarded per-request to MCP servers — no session affinity needed.
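The post-mortem workflow above can be sketched with the same config keys used in the examples on this page; the agent name and prompt below are illustrative assumptions, not a shipped template:

```toml
# Hypothetical post-mortem agent. The keys (name, turn_depth,
# system_prompt) match those documented on this page;
# the values are illustrative only.
[agent]
name = "Post-Mortem Writer"
turn_depth = 5
system_prompt = """
Produce a structured post-mortem with a
timeline, root cause, and action items.
Ground every claim in the loaded runbooks
and cite sources.
"""
```

Pairing this with a [[vector_stores]] block like the "RAG with provenance" example keeps citations grounded in internal docs rather than training data.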
RAG with provenance
[[vector_stores]]
name = "runbooks"
source = "./runbooks/"
context_prefix = "Internal runbook: "
[[vector_stores]]
name = "architecture_docs"
source = "./docs/architecture/"
context_prefix = "Architecture ref: "
Production guardrails
[agent]
name = "Incident Responder"
turn_depth = 5
system_prompt = """
Ground recommendations in runbooks.
Cite sources. If no guidance found,
say so explicitly — don't guess.
"""
[server]
first_chunk_timeout_secs = 30
streaming_timeout_secs = 120
Multi-tenant isolation
[mcp.servers.customer_tools]
transport = "http_streamable"
url = "http://tools:8081/mcp"
headers_from_req = [
"Authorization",
"X-Tenant-ID"
]
# Auth flows through to every tool call
# No sticky routing, stateless scaling
Enable tracing (env vars)
# OpenTelemetry — just set the endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=\
"http://jaeger:4317"
# Or send to Arize Phoenix
export OTEL_EXPORTER_OTLP_ENDPOINT=\
"http://phoenix:6006/v1/traces"
# AURA emits OpenInference spans:
# llm.*, tool.*, input.*, output.*
3. Engineer your context
The bottleneck isn't model intelligence — it's data quality. Curate, compress, and deliver just-in-time context. Cut hallucinations and token costs.
- Mezmo Active Telemetry Pipeline: Deduplicate, cluster, enrich before agents see data. Up to 99.98% compression — every removed token saves inference cost.
- Mezmo Agent-optimized MCP server: Returns curated, task-scoped data — not raw firehose. Other vendors' MCP tools are blunt instruments. Mezmo's are surgical.
- Both Just-in-time context delivery: Each workflow step gets precisely scoped data. Dynamic assembly as investigations unfold — not a dump of everything.
- AURA Plan → Execute → Synthesize → Evaluate loop: Built-in self-checks catch mismatches mid-investigation and trigger replanning.
AURA + Mezmo MCP (curated context)
[mcp.servers.mezmo]
transport = "http_streamable"
url = "http://mezmo-mcp:8081/mcp"
headers = { "Auth" = "Bearer {{ env.MEZMO_TOKEN }}" }
# Mezmo MCP returns curated signals,
# not raw API firehose.
# ~$1/investigation vs $25-36 elsewhere
With vs without Mezmo
# WITHOUT Mezmo (raw vendor MCP)
# → 2.4M tokens per investigation
# → 88% noise in context window
# → $30-36 per investigation
# → 14+ min MTTR
# WITH Mezmo pipeline + MCP
# → <1K curated signals
# → noise removed before agent sees it
# → <$1 per investigation
# → <5 min MTTR
4. Control your data
Consolidate flows, adopt OTel, reduce cost, migrate off legacy systems. Shift from reactive firefighting to agents that prevent incidents.
- Mezmo Native OTel ingestion and routing: Migrate incrementally — dual-write to old and new backends during transition. No rip-and-replace.
- Mezmo Vendor-agnostic telemetry routing to Mezmo, Datadog, Grafana, Elastic, S3: AURA agents work against any of them via MCP.
- Mezmo Cost profiling: Identify high-volume, low-value streams. Cut observability spend up to 70%.
- Both Proactive anomaly detection: Continuous monitoring for degraded signals and drift. Surface issues before they become incidents.
- Both Multi-agent orchestration: Specialized workers coordinated by an orchestrator for complex investigations.
- Mezmo Compounding system memory: Incident patterns stored and reused. The 100th occurrence costs a fraction of the first.
Dual-write during migration
# Mezmo pipeline routes telemetry
# to multiple backends simultaneously
# Old backend (keep during transition)
# Datadog → existing dashboards still work
# New backend (AURA agents read from here)
# Mezmo → curated, agent-optimized data
# When ready: cut over, decommission old.
# No big-bang migration required.
LLM failover (one-line swap)
# Primary provider
[llm]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
# Provider down? Swap in 10 seconds:
# provider = "openai"
# model = "gpt-4o"
# Or go local:
# provider = "ollama"
# model = "llama3.1:latest"
# Tool integrations don't break.
# AURA sanitizes schemas per provider.
Get to production fast. Start wherever you are.
Add Mezmo when you're ready for curated context.
