
A Guide to OpenTelemetry: Architecture, Logs, and Implementation Best Practices

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework that provides a standardized way to collect, process, and export telemetry data from applications and infrastructure. It combines the efforts of the OpenTracing and OpenCensus projects under the Cloud Native Computing Foundation (CNCF) umbrella and aims to simplify observability across distributed systems. 

Also known as OTel, OpenTelemetry has a number of key features:

It’s vendor-neutral: OTel works with many backends (such as Prometheus, Jaeger, Zipkin, Grafana, and Datadog), so you’re not locked into a single vendor.

Users get unified telemetry: Instead of using different tools for traces, metrics, and logs, OTel brings them together under one specification.

There are many instrumentation choices available: It provides SDKs and auto-instrumentation for many programming languages (e.g., Java, Python, Go) to gather telemetry data.

OpenTelemetry has three basic components: the Collector, a service that receives, processes, and exports telemetry data; SDKs, language-specific libraries that collect telemetry data; and OTLP (the OpenTelemetry Protocol), the standard protocol for transmitting telemetry data.

Companies typically use OTel to monitor distributed systems (like microservices), improve visibility into performance and behavior, and simplify their observability infrastructure.

What is telemetry data?

Telemetry is the automatic recording and transmission of data from remote or distributed systems to a central location for monitoring and analysis. In software systems, telemetry data includes metrics, logs, and traces that help developers and operations teams understand the internal state and behavior of applications.

There are three main types of telemetry data.

Traces, the basis of distributed tracing, show the flow of a request as it moves through different parts of a system. Traces can help identify bottlenecks or failures in complex workflows.

Metrics are numerical data collected over time to monitor system health and performance. Examples of metrics are CPU usage, request counts, error rates, or latency. Metrics are ideal for creating dashboards and setting alerts.

Logs are textual records of system events. They are often used to debug issues or understand application behavior.
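
To make the three signal types concrete, here is a minimal sketch of what one record of each kind might carry. The class names, field names, and sample values are illustrative assumptions, not the OpenTelemetry data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    """One timed operation inside a trace (illustrative shape only)."""
    trace_id: str              # shared by every span in the same request
    span_id: str               # unique to this operation
    parent_id: Optional[str]   # None for the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class MetricPoint:
    """One numeric measurement at a point in time."""
    name: str
    value: float
    timestamp_ms: int
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class LogRecord:
    """One textual event, optionally tied to a trace."""
    timestamp_ms: int
    severity: str
    body: str
    trace_id: Optional[str] = None


# A hypothetical checkout request seen through all three signals.
root = Span("t1", "s1", None, "POST /checkout", start_ms=0, end_ms=120)
db = Span("t1", "s2", "s1", "SELECT orders", start_ms=10, end_ms=95)
latency = MetricPoint("checkout.latency_ms", 120.0, 120, {"route": "/checkout"})
log = LogRecord(95, "WARN", "slow query on orders table", trace_id="t1")

# The trace shows where the time went inside the request.
print(f"{db.name} took {db.duration_ms} of {root.duration_ms} ms")
```

The metric says the request was slow, the trace says which step was slow, and the log says what happened at that step; the shared trace ID is what ties them together.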

Telemetry data is a critical component of observability and monitoring for several reasons. First, it speeds up troubleshooting, letting teams quickly find where and why a problem occurred. It also makes performance tuning easier by exposing slow or inefficient parts of the system. Alerting on failures or unusual activity depends on it. Finally, access to telemetry makes auditing far easier.

How does OpenTelemetry work?

OpenTelemetry works by instrumenting code to collect telemetry data (metrics, logs, and traces). The OpenTelemetry SDKs then process this data, and exporters send it to observability backends such as Prometheus, Jaeger, or commercial platforms like Mezmo. The framework supports automatic instrumentation for many popular libraries and frameworks, reducing the need for manual code changes.

Broadly speaking, there are five parts to implementing OTel.

1. Instrumentation

Teams can either instrument their code manually, using the OTel API to trace operations, record metrics, or log structured data, or rely on auto-instrumentation, which injects telemetry into supported libraries and frameworks (such as HTTP clients and database drivers) without modifying application code.
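
The difference between the two approaches can be sketched with the standard library alone. This is a toy model, not the OpenTelemetry API: `start_span`, `auto_instrument`, `FINISHED_SPANS`, and `fetch_user` are hypothetical names invented for illustration.

```python
import functools
import time
from contextlib import contextmanager
from typing import Callable, Dict, List

# Finished spans land here; a real SDK would hand them to an exporter.
FINISHED_SPANS: List[Dict[str, object]] = []


@contextmanager
def start_span(name: str, **attributes: object):
    """Manual instrumentation: the developer wraps a block of code."""
    started = time.perf_counter()
    try:
        yield attributes  # caller may add attributes while the span is open
    finally:
        FINISHED_SPANS.append({
            "name": name,
            "attributes": dict(attributes),
            "duration_s": time.perf_counter() - started,
        })


def auto_instrument(func: Callable) -> Callable:
    """Automatic instrumentation: wrap a function without editing its body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with start_span(func.__name__):
            return func(*args, **kwargs)
    return wrapper


def fetch_user(user_id: int) -> Dict[str, object]:
    return {"id": user_id, "name": "example"}


# Manual: an explicit span carrying business-specific attributes.
with start_span("load-profile", user_id=42) as attrs:
    attrs["cache_hit"] = False
    fetch_user(42)

# Automatic: the function is patched from the outside, as an agent might do.
fetch_user = auto_instrument(fetch_user)
fetch_user(7)

print([span["name"] for span in FINISHED_SPANS])
```

Manual spans can carry domain details such as `user_id`, while the automatic wrapper only knows the function name. In practice teams often combine both.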

2. SDKs

OTel provides language-specific SDKs (e.g., for Java, Python, Go, Node.js) that:

  • Create spans for tracing.
  • Record metrics.
  • Add attributes to spans/logs (e.g., user ID, query, etc.).
  • Handle context propagation across services or threads.

3. Context Propagation

When a request moves through multiple services (e.g., A → B → C), OTel uses trace context headers (by default, the W3C Trace Context `traceparent` header) to pass along metadata such as the trace ID and span ID, so you can reconstruct the full journey of a request across services.
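
As a rough illustration of how that metadata travels, the sketch below writes and reads a header in the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`). The `inject`, `extract`, and `new_span_id` helpers are hypothetical, handle only version `00`, and ignore the companion `tracestate` header.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# traceparent layout: 2 hex version, 32 hex trace ID, 16 hex parent ID, 2 hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_span_id() -> str:
    return secrets.token_hex(8)  # 8 bytes -> 16 hex characters


def inject(headers: Dict[str, str], trace_id: str, span_id: str,
           sampled: bool = True) -> None:
    """Write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Read trace context from incoming headers; None if absent or malformed."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# Service A starts a trace and calls service B.
trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex characters
span_a = new_span_id()
outgoing: Dict[str, str] = {}
inject(outgoing, trace_id, span_a)

# Service B continues the same trace with its own span, parented to A's span.
incoming = extract(outgoing)
assert incoming is not None
b_trace_id, b_parent_id, b_sampled = incoming
span_b = new_span_id()
print(b_trace_id == trace_id, b_parent_id == span_a, span_b != span_a)
```

Because every service reuses the trace ID and records the caller's span ID as its parent, a backend can later stitch the spans from A, B, and C into one tree.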

4. Collector (Optional but Powerful)

The OpenTelemetry Collector is a standalone service that:

  • Receives telemetry data from your apps.
  • Processes it (e.g., adds metadata, filters, batches).
  • Exports it to backends.

Using the collector decouples your app from direct vendor integration.
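
The receive, process, export flow can be modeled in a few lines. This is a conceptual sketch only: the real Collector is a separate service driven by configuration, and the function names, record fields, and `staging` value below are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Processor = Callable[[List[Record]], List[Record]]


def drop_debug(records: List[Record]) -> List[Record]:
    """Filter: discard low-value records before they cost anything downstream."""
    return [r for r in records if r.get("severity") != "DEBUG"]


def add_environment(records: List[Record]) -> List[Record]:
    """Enrich: attach metadata the application did not know about."""
    return [{**r, "deployment.environment": "staging"} for r in records]


def batches(records: List[Record], size: int) -> Iterable[List[Record]]:
    """Batch: group records so the exporter makes fewer, larger requests."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def run_pipeline(received: List[Record], processors: List[Processor],
                 export: Callable[[List[Record]], None], batch_size: int = 2) -> int:
    """Apply each processor in order, then export in batches; return batch count."""
    for process in processors:
        received = process(received)
    sent = 0
    for batch in batches(received, batch_size):
        export(batch)
        sent += 1
    return sent


exported: List[List[Record]] = []  # stands in for a backend
incoming = [
    {"severity": "INFO", "body": "order created"},
    {"severity": "DEBUG", "body": "cache lookup"},
    {"severity": "ERROR", "body": "payment declined"},
    {"severity": "INFO", "body": "order shipped"},
]
batch_count = run_pipeline(incoming, [drop_debug, add_environment], exported.append)
print(batch_count, [len(batch) for batch in exported])
```

The application only emits records. Filtering, enrichment, and batching all happen in the pipeline, so they can change without redeploying the app.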

5. Exporters

These send data to observability backends. Exporters are included in both SDKs and the Collector.

Who contributes to OpenTelemetry?

OpenTelemetry is a CNCF project with contributions from major technology companies, including Google, Microsoft, Amazon, Splunk, and many others. The project's community-driven nature ensures rapid development and widespread adoption, with over 500 contributors actively maintaining and enhancing the ecosystem.

What are the benefits of OpenTelemetry?

Observability and monitoring are critical to successful modern software development, but that doesn’t mean they’re necessarily easy to do. Enter open source OTel and its “observability for everyone” principles. OpenTelemetry significantly benefits developers, SREs, and organizations aiming to improve observability across distributed systems. Here’s a breakdown of its key advantages:

OTel is a unified standard for observability, offering one framework to collect traces, metrics, and logs. It reduces the need for multiple tools or custom integrations and simplifies observability setup across your stack.

The standard is vendor-neutral and open source. OTel is backed by the Cloud Native Computing Foundation (CNCF), meaning there is no vendor lock-in. Companies can export telemetry to any backend and are free to choose and switch observability platforms.

OTel offers broad language and platform support. It supports major languages like Java, Python, Go, .NET, JavaScript, and more. OpenTelemetry works across cloud, containerized, and on-premise environments, and auto-instrumentation is available for common libraries and frameworks.

Deep visibility into distributed systems is possible. OTel enables distributed tracing, helping you understand how requests flow across microservices. Now, teams can identify performance bottlenecks, latency, or failures in complex environments.

OpenTelemetry has rich telemetry for monitoring and alerting. Combine metrics (like request latency) with traces (request paths) and logs (event details) for context-rich debugging. It improves incident response time and root cause analysis.

Use the powerful Collector. The OpenTelemetry Collector acts as a central hub for telemetry. It enables filtering, enriching, batching, and exporting telemetry data. The Collector helps reduce overhead on application services.

OTel is extensible and customizable. Create custom instrumentation tailored to your business logic. The standard supports custom exporters, processors, and pipelines to fit your needs.

It helps drive SLOs and SLIs. OTel provides the data foundation for service level objectives (SLOs) and service level indicators (SLIs). This functionality is essential for SRE practices and high-availability goals.
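
As a worked example of how telemetry feeds an SLO, the sketch below computes an availability SLI from request counts and the share of error budget left. The request numbers and the 99.9% target are made up, and the function names are hypothetical.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """SLI: fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing failed
    return (total_requests - failed_requests) / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # e.g. 0.1% of requests may fail
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Hypothetical month: 1,000,000 requests, 400 failures, 99.9% availability SLO.
sli = availability_sli(1_000_000, 400)
remaining = error_budget_remaining(sli, slo_target=0.999)
print(f"SLI={sli:.4%}, error budget remaining={remaining:.0%}")
```

Here the SLO allows 1,000 failed requests and 400 occurred, so about 60% of the budget is left. The request and failure counts are exactly the kind of metric that instrumented services emit.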

In short: OpenTelemetry standardizes and simplifies observability, enabling better system understanding, faster debugging, and more flexibility in how you monitor your services.

What are the drawbacks of OpenTelemetry?

While OpenTelemetry (OTel) is powerful and widely adopted, it has drawbacks and limitations, especially for teams just getting started. Here's a breakdown of the key challenges:

OTel is complex to set up and configure, with many moving parts including SDKs, the Collector, exporters, receivers, and processors. The learning curve can be steep for new users, especially when customizing pipelines. It is also YAML-heavy: Collector config files can get complicated quickly.
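
One way to see why these files grow complicated: every pipeline must reference components defined elsewhere in the same file. The sketch below mirrors the top-level layout of a Collector config (receivers, processors, exporters, and service pipelines) as a Python dict and checks those references. The `otlp`, `batch`, and `debug` component names are common Collector components, but `find_undefined_components` is a hypothetical helper written for this example, not part of any Collector tooling.

```python
from typing import Dict, List

# Mirrors the top-level layout of a Collector YAML file, written as a dict.
config: Dict[str, dict] = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "processors": ["batch"],
                       "exporters": ["debug"]},
            "logs": {"receivers": ["otlp"], "processors": ["batch"],
                     "exporters": ["debug"]},
        }
    },
}


def find_undefined_components(cfg: Dict[str, dict]) -> List[str]:
    """Return one message per pipeline entry that names an undefined component."""
    problems: List[str] = []
    for pipeline, stages in cfg["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for name in stages.get(kind, []):
                if name not in cfg.get(kind, {}):
                    problems.append(f"{pipeline}: {kind[:-1]} '{name}' is not defined")
    return problems


print(find_undefined_components(config))

# A single typo in the exporters section breaks every pipeline that uses it.
broken = {**config, "exporters": {"debugg": {}}}
print(find_undefined_components(broken))
```

Even this two-pipeline example has four sections that must agree with each other, which hints at how quickly a production config can become hard to review by eye.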

OpenTelemetry isn’t fully mature: some SDKs or features are still marked as experimental or in development, especially for logs and certain metrics APIs.

While observability is a critical practice, it’s not without performance overhead. Instrumentation (especially auto-instrumentation) can introduce issues if not tuned carefully. Improper configuration can cause excessive telemetry output, bloating logs or traces, and increasing storage/ingestion costs.
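
Sampling is one common way to keep trace volume in check. The sketch below makes a keep-or-drop decision from the trace ID alone, in the spirit of ratio-based samplers. It is an illustration under stated assumptions rather than OTel's implementation, and `should_sample` is a hypothetical name.

```python
import secrets

_MAX_64 = 1 << 64


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, decided only by the trace ID.

    Every service that sees the same trace ID reaches the same decision,
    so a trace is kept whole or dropped whole.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex, 16) & (_MAX_64 - 1)
    return low_64_bits < int(ratio * _MAX_64)


# Simulate 10,000 random trace IDs with a 10% sampling ratio.
trace_ids = [secrets.token_hex(16) for _ in range(10_000)]
kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With random trace IDs, roughly one trace in ten survives, which cuts storage and ingestion volume by about 90% at the cost of losing detail on the dropped requests.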

Distributed tracing can be challenging to master. Fully understanding traces and context propagation requires a shift in thinking, especially for developers unfamiliar with distributed systems. It also requires thoughtful design to create meaningful spans and metrics.

OTel doesn’t eliminate the need for a backend: it doesn’t include storage or visualization, so teams still need a separate platform for both.

Not every language is supported with auto-instrumentation, meaning teams may still need to do manual instrumentation.

Correlating metrics, logs, and traces takes practice. Although OTel aims to unify these data types, correlating them meaningfully (e.g., tying a log entry to a specific trace or span) requires thoughtful integration and context propagation.

What are the main components of OpenTelemetry?

OTel has several basic components, including:

APIs: Language-specific interfaces for recording telemetry data.

SDKs: Implementations of the APIs that handle data collection and export.

Instrumentation Libraries: Prebuilt code that adds observability to popular frameworks.

Collectors: Agents receiving, processing, and exporting telemetry data from multiple sources.

Exporters: Components that send data to observability backends.

Monitoring and Observability tools and OpenTelemetry

OpenTelemetry acts as a bridge between applications and monitoring platforms. By using standard protocols and data formats, it allows seamless integration with tools such as Prometheus (metrics), Jaeger (traces), Fluentd (logs), and commercial observability suites. This flexibility enables organizations to centralize their observability strategy without vendor lock-in.

Monitoring and observability tools are critical for understanding the performance, reliability, and behavior of modern software systems, especially distributed microservices. OpenTelemetry (OTel) plays a foundational role in these ecosystems by acting as a data collection layer that integrates with various tools.

(A reminder: monitoring answers “Is it working?” by using predefined metrics and thresholds to detect known issues. Observability answers “Why isn’t it working?” by providing deep insight from logs, metrics, and traces into system behavior, especially unknown unknowns.)

Think of OpenTelemetry as the plumbing:

[ Your App ]

   ↓ Instrumentation (OTel SDKs / Auto-Instrumentation)

[ Telemetry Data: Traces | Metrics | Logs ]

   ↓

[ OpenTelemetry Collector ]

   ↓

[ Backend: ]

OTel collects and formats data consistently across languages and services. Teams can switch backends without re-instrumenting their apps.
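
The reason a backend switch does not touch application code is that instrumentation talks to an interface and the destination is chosen by configuration. A minimal sketch of that separation follows. The `Tracer`, `ConsoleExporter`, and `InMemoryExporter` classes are hypothetical stand-ins, not OTel classes.

```python
from typing import Dict, List, Protocol


class Exporter(Protocol):
    def export(self, spans: List[Dict[str, object]]) -> None: ...


class ConsoleExporter:
    """Stand-in for one backend: prints each span."""

    def export(self, spans: List[Dict[str, object]]) -> None:
        for span in spans:
            print(f"[console] {span['name']}")


class InMemoryExporter:
    """Stand-in for another backend: keeps spans in a list."""

    def __init__(self) -> None:
        self.received: List[Dict[str, object]] = []

    def export(self, spans: List[Dict[str, object]]) -> None:
        self.received.extend(spans)


class Tracer:
    """Application code talks only to this; the exporter is configuration."""

    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter

    def record(self, name: str, **attributes: object) -> None:
        self._exporter.export([{"name": name, "attributes": attributes}])


def handle_request(tracer: Tracer) -> None:
    # Instrumented application code: identical regardless of backend.
    tracer.record("GET /orders", status=200)


handle_request(Tracer(ConsoleExporter()))  # first backend
memory = InMemoryExporter()
handle_request(Tracer(memory))             # second backend, same app code
print(len(memory.received))
```

`handle_request` never changes between the two calls. Only the exporter passed to the tracer does, which is the property that lets teams move between backends.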

Using OTel with monitoring tools has several benefits, including standardization, flexibility, efficiency, and scalability.

How Mezmo and OpenTelemetry deliver value & impact

Mezmo (formerly LogDNA) leverages OpenTelemetry to enhance its observability platform. By integrating OpenTelemetry collectors and exporters, Mezmo enables users to ingest logs, metrics, and traces from across their infrastructure with minimal setup. This unified view of telemetry data empowers DevOps and SRE teams to diagnose issues faster, optimize performance, and ensure reliability.

When used together, OpenTelemetry collects and standardizes observability data, while Mezmo ingests, enriches, and routes that data to optimize performance, cost, and insights.

Together, that leads to:

  • An end-to-end observability pipeline: From source to destination with flexibility and control.
  • Better incident response: Faster troubleshooting using structured and enriched logs and traces.
  • Optimized telemetry costs: Collect broadly, route selectively, and store strategically.
  • Enhanced developer workflows: Faster debugging and visibility without reinventing tooling.

Telemetry data in action: Three case studies

Employment Hero: Scaling Microservices with Unified Logging

Employment Hero transitioned to a microservices architecture and faced challenges in managing logs across diverse services. By implementing Mezmo, they achieved unified log management, language-agnostic aggregation, and real-time visibility. This approach enabled Employment Hero to efficiently manage their microservices and improve system observability.

Better Mortgage: Accelerating Kubernetes Troubleshooting

Better Mortgage integrated Mezmo to enhance their Kubernetes observability, resulting in faster incident response and enhanced log analysis. This integration streamlined their DevOps processes and reduced downtime.

Sysdig: Achieving 80% Improvement in Log Data Access

Sysdig partnered with Mezmo to optimize their log management and realized an 80% improvement in the time it takes to access and use log data, as well as enhanced troubleshooting. This collaboration enhanced Sysdig's operational efficiency and service reliability. 
