The road to production AI starts wherever you are.
Whether you're deploying your first AI agent or migrating off legacy observability, there's a path to production-grade agentic operations.
Pick your starting point
- One system, two products, your context.
- Curate → Compress → Deliver
- Plan → Execute → Evaluate
- Any LLM, any MCP tools, any observability
Four stages to production AI
1. Build your first agent
Clone → configure → run. AURA handles MCP wiring, streaming, provider abstraction, and RAG. You define what the agent does in TOML.
- AURA OpenAI-compatible with streaming SSE: Point LibreChat, OpenWebUI, or any existing frontend at it — zero adapter code.
- AURA 5 LLM providers: OpenAI, Anthropic, Bedrock, Gemini, Ollama. One-line swap. Automatic tool schema sanitization per provider.
- AURA MCP tool discovery at runtime: Datadog, PagerDuty, Slack, internal APIs — dynamic discovery, no code changes.
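Because the endpoint is OpenAI-compatible, any existing client payload works unchanged. A minimal sketch of the request shape — the port and the /v1/chat/completions path are assumptions here, not confirmed AURA defaults:

```shell
# Hypothetical request shape -- endpoint path and port are assumed,
# based only on AURA's stated OpenAI compatibility.
AURA_URL="http://localhost:8080/v1/chat/completions"

# A standard OpenAI-style chat payload; nothing AURA-specific in it.
PAYLOAD='{"model":"aura","stream":true,"messages":[{"role":"user","content":"Summarize open incidents"}]}'
echo "$PAYLOAD"

# With a running aura-web-server, stream the reply over SSE:
#   curl -N -H "Content-Type: application/json" -d "$PAYLOAD" "$AURA_URL"
```

Streaming over SSE is what lets frontends like LibreChat or OpenWebUI consume the endpoint directly, with no adapter code.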
Minimal agent config
[llm]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
[agent]
name = "Ops Assistant"
system_prompt = "You're an SRE assistant"
turn_depth = 3
Connect an MCP tool server
[mcp.servers.observability]
transport = "http_streamable"
url = "http://mcp-server:8081/mcp"
headers = { "Auth" = "Bearer {{ env.MCP_TOKEN }}" }
# Or connect to Mezmo's MCP server
[mcp.servers.mezmo]
transport = "http_streamable"
url = "http://mezmo-mcp:8081/mcp"
Quick start (3 commands)
# Clone and configure
git clone https://github.com/mezmo/aura
cp examples/reference.toml config.toml
export ANTHROPIC_API_KEY="sk-..."
# Run
cargo run --bin aura-web-server
# Or use Docker
docker compose up --build
Swap to Ollama (local/air-gapped)
[llm]
provider = "ollama"
model = "llama3.1:latest"
url = "http://localhost:11434"
# AURA parses tool calls even from
# models that emit them as plain text
2. Deploy a use case
Pick one workflow — incident triage, runbook-grounded RCA, post-mortem generation. Harden it with guardrails and observability before real traffic.
- AURA Runbook-grounded RAG: Load docs with context_prefix so every recommendation cites its source — not the model's training data.
- Mezmo Pre-built agentic SRE workflows for triage, RCA, remediation: Reactive mode: alert fires → agent responds → sub-5 min MTTR.
- AURA Automated post-mortem generation: 4+ hour process → structured output with timeline, root cause, and action items.
- AURA Safety controls: turn_depth, streaming timeouts, graceful shutdown, backpressure. Human-in-the-loop approval gates for sensitive ops.
- AURA OpenTelemetry + OpenInference tracing: Every plan, prompt, tool call → Arize Phoenix, Jaeger, Tempo, Datadog, Mezmo. Full audit trail to disk.
- AURA Per-tenant isolation via headers_from_req: Auth tokens forwarded per-request to MCP servers — no session affinity needed.
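The post-mortem workflow above can be sketched with the same config keys used in the examples on this page; the agent name and prompt below are illustrative assumptions, not a shipped template:

```toml
# Hypothetical post-mortem agent. The keys (name, turn_depth,
# system_prompt) match those documented on this page;
# the values are illustrative only.
[agent]
name = "Post-Mortem Writer"
turn_depth = 5
system_prompt = """
Produce a structured post-mortem with a
timeline, root cause, and action items.
Ground every claim in the loaded runbooks
and cite sources.
"""
```

Pairing this with a [[vector_stores]] block like the "RAG with provenance" example keeps citations grounded in internal docs rather than training data.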
RAG with provenance
[[vector_stores]]
name = "runbooks"
source = "./runbooks/"
context_prefix = "Internal runbook: "
[[vector_stores]]
name = "architecture_docs"
source = "./docs/architecture/"
context_prefix = "Architecture ref: "
Production guardrails
[agent]
name = "Incident Responder"
turn_depth = 5
system_prompt = """
Ground recommendations in runbooks.
Cite sources. If no guidance found,
say so explicitly — don't guess.
"""
[server]
first_chunk_timeout_secs = 30
streaming_timeout_secs = 120
Multi-tenant isolation
[mcp.servers.customer_tools]
transport = "http_streamable"
url = "http://tools:8081/mcp"
headers_from_req = [
"Authorization",
"X-Tenant-ID"
]
# Auth flows through to every tool call
# No sticky routing, stateless scaling
Enable tracing (env vars)
# OpenTelemetry — just set the endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=\
"http://jaeger:4317"
# Or send to Arize Phoenix
export OTEL_EXPORTER_OTLP_ENDPOINT=\
"http://phoenix:6006/v1/traces"
# AURA emits OpenInference spans:
# llm.*, tool.*, input.*, output.*
3. Engineer your context
The bottleneck isn't model intelligence — it's data quality. Curate, compress, and deliver just-in-time context. Cut hallucinations and token costs.
- Mezmo Active Telemetry Pipeline: Deduplicate, cluster, enrich before agents see data. Up to 99.98% compression — every removed token saves inference cost.
- Mezmo Agent-optimized MCP server: Returns curated, task-scoped data — not raw firehose. Other vendors' MCP tools are blunt instruments. Mezmo's are surgical.
- Both Just-in-time context delivery: Each workflow step gets precisely scoped data. Dynamic assembly as investigations unfold — not a dump of everything.
- AURA Plan → Execute → Synthesize → Evaluate loop: Built-in self-checks catch mismatches mid-investigation and trigger replanning.
AURA + Mezmo MCP (curated context)
[mcp.servers.mezmo]
transport = "http_streamable"
url = "http://mezmo-mcp:8081/mcp"
headers = { "Auth" = "Bearer {{ env.MEZMO_TOKEN }}" }
# Mezmo MCP returns curated signals,
# not raw API firehose.
# ~$1/investigation vs $25-36 elsewhere
With vs without Mezmo
# WITHOUT Mezmo (raw vendor MCP)
# → 2.4M tokens per investigation
# → 88% noise in context window
# → $30-36 per investigation
# → 14+ min MTTR
# WITH Mezmo pipeline + MCP
# → <1K curated signals
# → noise removed before agent sees it
# → <$1 per investigation
# → <5 min MTTR
4. Control your data
Consolidate flows, adopt OTel, reduce cost, migrate off legacy systems. Shift from reactive firefighting to agents that prevent incidents.
- Mezmo Native OTel ingestion and routing: Migrate incrementally — dual-write to old and new backends during transition. No rip-and-replace.
- Mezmo Vendor-agnostic telemetry routing to Mezmo, Datadog, Grafana, Elastic, S3: AURA agents work against any of them via MCP.
- Mezmo Cost profiling: Identify high-volume, low-value streams. Cut observability spend up to 70%.
- Both Proactive anomaly detection: Continuous monitoring for degraded signals and drift. Surface issues before they become incidents.
- Both Multi-agent orchestration: Specialized workers coordinated by an orchestrator for complex investigations.
- Mezmo Compounding system memory: Incident patterns stored and reused. The 100th occurrence costs a fraction of the first.
Dual-write during migration
# Mezmo pipeline routes telemetry
# to multiple backends simultaneously
# Old backend (keep during transition)
# Datadog → existing dashboards still work
# New backend (AURA agents read from here)
# Mezmo → curated, agent-optimized data
# When ready: cut over, decommission old.
# No big-bang migration required.
LLM failover (one-line swap)
# Primary provider
[llm]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
# Provider down? Swap in 10 seconds:
# provider = "openai"
# model = "gpt-4o"
# Or go local:
# provider = "ollama"
# model = "llama3.1:latest"
# Tool integrations don't break.
# AURA sanitizes schemas per provider.
Get to production fast. Start wherever you are.
Add Mezmo when you're ready for curated context.
