The road to production AI starts wherever you are.

Whether you're deploying your first AI agent or migrating off legacy observability, there's a path to production-grade agentic operations. Pick your starting point.

Where are you today?

Pick your starting point

Select the scenario that fits. We'll highlight where you join — and what comes next.
1. Build your first agent: No AI agents yet. Want to evaluate the approach.

2. Deploy a use case: Need production guardrails and real SRE use cases.

3. Engineer your context: Hallucinations, noisy data. Need better context.

4. Control your data: Costs unsustainable. Need to modernize and go proactive.

One system, two products, your context.

Mezmo platform
Data intelligence layer

Curate → Compress → Deliver

AURA (open source)
Agent control plane

Plan → Execute → Evaluate

Your stack
Securely connect to your tools

Any LLM, any MCP tools, any observability

The journey

Four stages to production AI

1. Build your first agent

Clone → configure → run. AURA handles MCP wiring, streaming, provider abstraction, and RAG. You define what the agent does in TOML.

  • [AURA] OpenAI-compatible API with SSE streaming: point LibreChat, OpenWebUI, or any existing frontend at it with zero adapter code.
  • [AURA] Five LLM providers (OpenAI, Anthropic, Bedrock, Gemini, Ollama): one-line swap, with automatic tool-schema sanitization per provider.
  • [AURA] MCP tool discovery at runtime: Datadog, PagerDuty, Slack, internal APIs discovered dynamically, with no code changes.
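Because the API is OpenAI-compatible, any client that speaks the standard chat-completions protocol can stream from it. A minimal sketch with curl, assuming a local AURA instance; the host, port, and path follow OpenAI conventions here and may differ in your deployment:

```shell
# Stream a chat completion from a local AURA server (endpoint assumed,
# not a documented default). -N disables buffering so SSE chunks print
# as they arrive.
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "aura",
        "stream": true,
        "messages": [
          {"role": "user", "content": "Summarize open alerts"}
        ]
      }'
```

The same request body works unchanged from any OpenAI SDK by pointing its base URL at the server.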
Minimal agent config

[llm]
provider = "anthropic"
model    = "claude-sonnet-4-20250514"

[agent]
name          = "Ops Assistant"
system_prompt = "You're an SRE assistant"
turn_depth    = 3
Connect an MCP tool server

[mcp.servers.observability]
transport = "http_streamable"
url       = "http://mcp-server:8081/mcp"
headers   = { "Auth" = "Bearer {{ env.MCP_TOKEN }}" }

# Or connect to Mezmo's MCP server

[mcp.servers.mezmo]
transport = "http_streamable"
url       = "http://mezmo-mcp:8081/mcp"
Quick start (3 commands)

# Clone and configure
git clone https://github.com/mezmo/aura
cp examples/reference.toml config.toml
export ANTHROPIC_API_KEY="sk-..."

# Run
cargo run --bin aura-web-server

# Or use Docker
docker compose up --build
Swap to Ollama (local/air-gapped)

[llm]
provider = "ollama"
model    = "llama3.1:latest"
url      = "http://localhost:11434"

# AURA parses tool calls even from
# models that emit them as plain text

2. Deploy a use case

Pick one workflow — incident triage, runbook-grounded RCA, post-mortem generation. Harden it with guardrails and observability before real traffic.

  • [AURA] Runbook-grounded RAG: load docs with context_prefix so every recommendation cites its source, not the model's training data.
  • [Mezmo] Pre-built agentic SRE workflows for triage, RCA, and remediation: reactive mode means alert fires → agent responds → sub-5-min MTTR.
  • [AURA] Automated post-mortem generation: a 4+ hour process becomes structured output with timeline, root cause, and action items.
  • [AURA] Safety controls: turn_depth, streaming timeouts, graceful shutdown, backpressure, plus human-in-the-loop approval gates for sensitive ops.
  • [AURA] OpenTelemetry + OpenInference tracing: every plan, prompt, and tool call → Arize Phoenix, Jaeger, Tempo, Datadog, or Mezmo, with a full audit trail to disk.
  • [AURA] Per-tenant isolation via headers_from_request: auth tokens forwarded per request to MCP servers, with no session affinity needed.
RAG with provenance

[[vector_stores]]
name           = "runbooks"
source         = "./runbooks/"
context_prefix = "Internal runbook: "

[[vector_stores]]
name           = "architecture_docs"
source         = "./docs/architecture/"
context_prefix = "Architecture ref: "
Production guardrails

[agent]
name          = "Incident Responder"
turn_depth    = 5
system_prompt = """
Ground recommendations in runbooks.
Cite sources. If no guidance found,
say so explicitly — don't guess.
"""

[server]
first_chunk_timeout_secs = 30
streaming_timeout_secs   = 120
Multi-tenant isolation

[mcp.servers.customer_tools]
transport        = "http_streamable"
url              = "http://tools:8081/mcp"
headers_from_request = [
  "Authorization",
  "X-Tenant-ID"
]
# Auth flows through to every tool call
# No sticky routing, stateless scaling
Enable tracing (env vars)
# OpenTelemetry — just set the endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=\
  "http://jaeger:4317"

# Or send to Arize Phoenix
export OTEL_EXPORTER_OTLP_ENDPOINT=\
  "http://phoenix:6006/v1/traces"

# AURA emits OpenInference spans:
# llm.*, tool.*, input.*, output.*

3. Engineer your context

The bottleneck isn't model intelligence — it's data quality. Curate, compress, and deliver just-in-time context. Cut hallucinations and token costs.

  • [Mezmo] Active Telemetry Pipeline: deduplicate, cluster, and enrich before agents see data. Up to 99.98% compression, and every removed token saves inference cost.
  • [Mezmo] Agent-optimized MCP server: returns curated, task-scoped data, not a raw firehose. Other vendors' MCP tools are blunt instruments; Mezmo's are surgical.
  • [Both] Just-in-time context delivery: each workflow step gets precisely scoped data, assembled dynamically as investigations unfold rather than dumped all at once.
  • [AURA] Plan → Execute → Synthesize → Evaluate loop: built-in self-checks catch mismatches mid-investigation and trigger replanning.
AURA + Mezmo MCP (curated context)

[mcp.servers.mezmo]
transport = "http_streamable"
url       = "http://mezmo-mcp:8081/mcp"
headers   = { "Auth" = "Bearer {{ env.MEZMO_TOKEN }}" }

# Mezmo MCP returns curated signals,
# not raw API firehose.
# ~$1/investigation vs $30-36 elsewhere
With vs without Mezmo

# WITHOUT Mezmo (raw vendor MCP)
# → 2.4M tokens per investigation
# → 88% noise in context window
# → $30-36 per investigation
# → 14+ min MTTR

# WITH Mezmo pipeline + MCP
# → <1K curated signals
# → noise removed before agent sees it
# → <$1 per investigation
# → <5 min MTTR

4. Control your data

Consolidate flows, adopt OTel, reduce cost, migrate off legacy systems. Shift from reactive firefighting to agents that prevent incidents.

  • [Mezmo] Native OTel ingestion and routing: migrate incrementally, dual-writing to old and new backends during the transition. No rip-and-replace.
  • [Mezmo] Vendor-agnostic telemetry routing to Mezmo, Datadog, Grafana, Elastic, and S3: AURA agents work against any of them via MCP.
  • [Mezmo] Cost profiling: identify high-volume, low-value streams and cut observability spend by up to 70%.
  • [Both] Proactive anomaly detection: continuous monitoring for degraded signals and drift surfaces issues before they become incidents.
  • [Both] Multi-agent orchestration: specialized workers coordinated by an orchestrator for complex investigations.
  • [Mezmo] Compounding system memory: incident patterns are stored and reused, so the 100th occurrence costs a fraction of the first.
Dual-write during migration

# Mezmo pipeline routes telemetry
# to multiple backends simultaneously

# Old backend (keep during transition)
# Datadog → existing dashboards still work

# New backend (AURA agents read from here)
# Mezmo → curated, agent-optimized data

# When ready: cut over, decommission old.
# No big-bang migration required.
LLM failover (one-line swap)
# Primary provider
[llm]
provider = "anthropic"
model    = "claude-sonnet-4-20250514"

# Provider down? Swap in 10 seconds:
# provider = "openai"
# model    = "gpt-4o"

# Or go local:
# provider = "ollama"
# model    = "llama3.1:latest"

# Tool integrations don't break.
# AURA sanitizes schemas per provider.

Sample configurations

DevOps assistant
GitHub

Reviews PRs, explores repos, and manages code workflows.
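A sketch of what this assistant's config might look like, reusing the TOML patterns shown earlier; the GitHub MCP endpoint, agent name, and prompt wording are illustrative, not shipped defaults:

```toml
[llm]
provider = "anthropic"
model    = "claude-sonnet-4-20250514"

[agent]
name          = "DevOps Assistant"
system_prompt = "You review PRs and explore repositories. Cite file paths."
turn_depth    = 4

# Hypothetical GitHub MCP server endpoint
[mcp.servers.github]
transport = "http_streamable"
url       = "http://github-mcp:8081/mcp"
headers   = { "Auth" = "Bearer {{ env.GITHUB_TOKEN }}" }
```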

Incident response agent
PagerDuty + Datadog

Triages alerts, pulls metrics, and correlates monitoring data.
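This agent could wire both tools in as MCP servers. A sketch under assumed endpoints (the PagerDuty and Datadog MCP URLs below are placeholders for whatever servers you run):

```toml
[llm]
provider = "anthropic"
model    = "claude-sonnet-4-20250514"

[agent]
name          = "Incident Response Agent"
system_prompt = "Triage alerts, pull metrics, correlate signals. Cite sources."
turn_depth    = 5

# Hypothetical endpoints for the two tool servers
[mcp.servers.pagerduty]
transport = "http_streamable"
url       = "http://pagerduty-mcp:8081/mcp"

[mcp.servers.datadog]
transport = "http_streamable"
url       = "http://datadog-mcp:8082/mcp"
```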

Kubernetes SRE agent
K8s cluster operations + monitoring

Inspects workloads, queries metrics, and assists with cluster troubleshooting.
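A sketch combining a local Ollama model with a cluster tool server and runbook RAG, all using config keys shown earlier; the K8s MCP endpoint and runbook path are assumptions for illustration:

```toml
[llm]
provider = "ollama"            # local model for in-cluster use
model    = "llama3.1:latest"
url      = "http://localhost:11434"

[agent]
name          = "Kubernetes SRE Agent"
system_prompt = "Inspect workloads and query metrics before proposing fixes."
turn_depth    = 5

# Hypothetical Kubernetes MCP server endpoint
[mcp.servers.kubernetes]
transport = "http_streamable"
url       = "http://k8s-mcp:8081/mcp"

# Ground answers in cluster runbooks with cited provenance
[[vector_stores]]
name           = "cluster_runbooks"
source         = "./runbooks/k8s/"
context_prefix = "Cluster runbook: "
```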

Get to production fast. Start wherever you are.

AURA is free and open-source. Clone, configure, run.
Add Mezmo when you're ready for curated context.