Telemetry Tracing: Best Practices & Use Cases
Learning Objectives
This learn article dives deep into the key aspects of telemetry and OpenTelemetry. It covers definitions for traces and spans, offers best practices for OpenTelemetry tracing, and shares an example of how to set it up.
What is telemetry tracing?
Telemetry tracing - often referred to as distributed tracing - is a method for tracking and visualizing the journey of a request or transaction as it flows through the components of a distributed system. It provides end-to-end visibility into system behavior, performance, and dependencies.
Telemetry tracing is made up of traces, spans, context propagation and instrumentation. A trace represents the full lifecycle of a request as it moves through various services/components in a system, and a span represents a single operation or unit of work within that trace. Context propagation is what happens as requests travel through different services - the tracing context (like trace ID and span ID) is passed along to maintain linkage between operations. And instrumentation involves inserting code or using libraries (e.g., OpenTelemetry) to collect trace data.
Telemetry tracing can be used to diagnose bottlenecks across microservices, identify latency sources in request paths, understand system dependencies and failure points, correlate traces with logs and metrics for full observability, and improve performance optimization and incident response.
What is OpenTelemetry?
OpenTelemetry (often abbreviated as OTel) is an open-source observability framework designed to collect, generate, and export telemetry data (metrics, logs, and traces) from applications and infrastructure. It provides vendor-neutral, standardized instrumentation so developers and operators can understand system behavior and performance across distributed systems.
OTel has a number of key components including:
- APIs: Provide a language-specific interface for creating telemetry data (traces, metrics, logs).
- SDKs: Offer the implementation for the API, including sampling, batching, and exporting.
- Instrumentation Libraries: Prebuilt or custom libraries that auto-instrument common frameworks (HTTP, gRPC, database clients).
- Collectors: The OpenTelemetry Collector is a vendor-agnostic agent/service that receives, processes, and exports telemetry data to backends like Jaeger, Prometheus, Mezmo, etc.
- Exporters: Translate telemetry data into formats compatible with external observability platforms (OTLP, Zipkin, or Prometheus formats).
OTel primarily handles three types of data including traces (which track request paths across services), metrics (which measure system behavior), and logs (which capture structured or unstructured application/system events). OTel works in three steps: first, code is instrumented using OTel libraries or auto-instrumentation; then, traces, metrics, and logs are collected at runtime; and finally, collected data is sent to an observability backend via the OpenTelemetry Collector or exporters.
Organizations report a wide variety of benefits from adopting OTel, including a unified standard for all telemetry types, vendor-neutral open-source governance, wide ecosystem support, reduced vendor lock-in, and support for both manual and automatic instrumentation.
Traces: Definitions
TracerProvider
A TracerProvider in OpenTelemetry is the central component responsible for creating and managing tracers, which in turn generate and record spans (units of work in a trace). It acts as the entry point to tracing within an application.
In the OpenTelemetry tracing pipeline, the TracerProvider is the top-level object that configures tracing. A tracer is created by the TracerProvider and is used in code to start spans, and a span represents an individual operation or step in a trace.
A tracer provider creates tracers, controls span processors and exporters, manages configuration, and routes spans to the right backends or collectors.
Tracer
In OpenTelemetry, a Tracer is the component used to create spans, which are the individual units of work in a distributed trace. It's the primary interface developers use in application code to generate and record tracing data.
The Tracer starts and ends spans, connects spans into traces, attaches attributes, events, and status to spans, and helps instrument code to observe distributed workflows. Tracers are critical because they enable distributed tracing, provide context for telemetry data, improve debugging, and support root cause analysis.
Tracer vs. Tracer Provider
In short, the TracerProvider is the configuration and factory layer: it owns the span processors, exporters, and sampling settings, and it hands out tracers. The Tracer is the working interface used in application code to start and end spans. You typically configure one TracerProvider per process and obtain a named Tracer for each library or component.
Trace Exporters
Trace exporters in OpenTelemetry are components responsible for sending collected trace data (spans) to an external backend system or observability platform for storage, visualization, and analysis.
They act as the final step in the telemetry pipeline: after spans are created and processed, exporters send them to observability tools or custom destinations.
A trace exporter converts span data into the correct format for the destination, transmits spans to external systems via protocols, and works with Span Processors to handle batch delivery, retries, and error handling.
Teams find trace exporters useful for a number of reasons. They decouple instrumentation from backends, support multiple observability platforms, enable centralized tracing and analysis, optimize performance through batching and asynchronous export, and facilitate vendor flexibility and observability portability.
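To make the pipeline concrete, here is a minimal Python sketch that wires an exporter to the tracer provider through a batch span processor. It uses the SDK's console exporter so spans simply print locally; the tracer and span names are made up, and in practice you would swap in an OTLP or vendor-specific exporter to ship spans to your backend.
python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# The exporter decides where spans go; the span processor decides how they are delivered.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("exporter-demo")
with tracer.start_as_current_span("example-operation"):
    pass  # spans created here are batched and handed to the exporter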
Context Propagation
Context propagation in OpenTelemetry is the mechanism that allows telemetry data - especially trace context - to be passed across service boundaries and asynchronous operations, so that spans can be linked into a complete distributed trace.
In a distributed system, a request may travel through many services. Each service creates a span, but without context propagation, those spans would appear as unrelated traces. With context propagation, all spans can be connected into one coherent trace, showing the full lifecycle of a request.
Four items are typically propagated: the trace ID, the span ID, sampling decisions and sometimes baggage.
The process of context propagation kicks off with a tracer starting a span and attaching its context to the current execution thread. Then the context is injected into headers before making a remote call. In the final step, the receiving service extracts the context from incoming headers and uses it to create a child span linked to the original trace.
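Here is a minimal Python sketch of that inject/extract flow, assuming a tracer is already configured. The plain dictionary stands in for real HTTP headers, which is an assumption made for the example.
python
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer("propagation-demo")

# Caller side: copy the current trace context into the outgoing headers.
with tracer.start_as_current_span("client-call"):
    headers = {}
    inject(headers)  # with the default propagator, this adds the W3C "traceparent" header

# Callee side: rebuild the context from the incoming headers and continue the trace.
ctx = extract(headers)  # in a real service, these headers come from the incoming request
with tracer.start_as_current_span("server-handle", context=ctx):
    pass  # this span becomes a child of "client-call" in the same trace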
Context propagation has a number of benefits. It enables end-to-end trace visibility across distributed systems, maintains parent-child relationships between spans, supports both synchronous and asynchronous operations, and facilitates root cause analysis and latency breakdown.
Spans: Definitions
Span Context
In OpenTelemetry, a span context is a lightweight, immutable object that carries the identity and metadata of a span, allowing it to be linked to other spans and propagated across services or threads. It is crucial for enabling distributed tracing and context propagation.
A SpanContext includes the following key fields:
- Trace ID: identifies the trace the span belongs to
- Span ID: identifies the span itself
- Trace flags: carry the sampling decision
- Trace state: optional vendor-specific key-value pairs
A span context links spans together in a trace (parent-child relationships), carries tracing information across service boundaries, and enables context propagation so all spans stay part of the same trace. Span context is important because it maintains trace continuity across services and threads, enables correlation of spans into a coherent trace, powers trace exporters and visualization tools, and allows tools to apply sampling and filtering decisions.
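As a rough illustration, the Python sketch below reads those fields off the current span; the span name is invented for the example.
python
from opentelemetry import trace

tracer = trace.get_tracer("span-context-demo")

with tracer.start_as_current_span("checkout") as span:
    ctx = span.get_span_context()
    # These are the same identifiers that context propagation carries across services.
    print(format(ctx.trace_id, "032x"))  # 16-byte trace ID, hex-encoded
    print(format(ctx.span_id, "016x"))   # 8-byte span ID, hex-encoded
    print(ctx.trace_flags.sampled)       # sampling decision
    print(ctx.trace_state)               # vendor-specific key-value pairs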
Span Attributes
Span attributes in OpenTelemetry are key-value pairs attached to a span to provide additional context and metadata about the operation it represents. They help describe what happened, where it happened, and how, making trace data more meaningful, searchable, and actionable.
Span attributes describe details like the HTTP request method, database query, user ID, cloud region, or hostname. This metadata helps filter and search traces, group spans by common tags, and diagnose performance issues and trace root causes. Span attributes add context to each span, while enabling fine-grained filtering in observability tools. Span attributes power dashboards, alerts, and analysis so teams can improve troubleshooting and root cause identification.
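For example, here is a short Python sketch with illustrative, semantic-convention-style keys; the exact names and values are assumptions for the example.
python
from opentelemetry import trace

tracer = trace.get_tracer("attributes-demo")

with tracer.start_as_current_span("GET /orders/{id}") as span:
    # Attributes describe the span as a whole and make it searchable later.
    span.set_attribute("http.request.method", "GET")
    span.set_attribute("order.id", "1234")                  # business-relevant metadata
    span.set_attribute("deployment.environment", "staging")
    span.set_attribute("retry.count", 2)                    # values can be strings, numbers, or booleans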
Span Events
Span events in OpenTelemetry are timestamped annotations added to a span to represent notable moments or intermediate steps during the span's execution. They help enrich span data with in-line context about what happened within the span’s lifetime, without creating separate spans.
A Span event consists of a name (usually a short label for the event), a timestamp, and sometimes attributes. Teams use Span events to mark significant occurrences like errors or exceptions, retries or fallbacks, time of external calls, or state transitions. Span events provide granular detail without needing additional spans and help with debugging, performance profiling, and understanding behavior.
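A brief Python sketch of adding events to a span; the payment scenario and event names are invented purely for illustration.
python
from opentelemetry import trace

tracer = trace.get_tracer("events-demo")

with tracer.start_as_current_span("charge-card") as span:
    span.add_event("payment.gateway.call.started")
    try:
        raise TimeoutError("gateway did not respond")  # simulate a failure
    except TimeoutError as exc:
        # record_exception is a specialized event that captures type, message, and stack trace
        span.record_exception(exc)
        span.add_event("retrying", attributes={"retry.count": 1})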
Span Events vs. Spans
A span represents an operation with its own start time, end time, and attributes, while a span event marks a single point in time inside an existing span. If a step is long-running or significant enough to need its own duration and child operations, model it as a child span; if it is a momentary occurrence, record it as an event.
When to use Span Events vs. Span Attributes
Choosing between span events and span attributes depends on what you're capturing and when it happens during the execution of a span.
Here's a detailed comparison to help you decide:
Use Span Attributes When:
- The information describes the operation as a whole (for example, the HTTP method, user ID, or region) and is known when the span starts or finishes.
Use Span Events When:
- You need to record something that happened at a specific moment during the span's execution (for example, an exception, a retry, or a state transition).
Overall, use attributes for describing the span and use events for what happened during the span.
Span Links
Span links in OpenTelemetry are references to other spans that are related to the current span but are not its direct parent. They allow you to connect spans across traces or branches that are logically related but do not follow the traditional parent-child hierarchy.
A span link contains a reference to a SpanContext, optional attributes describing the relationship, and no timing or causal relationship like parent-child spans have. Span links are useful when a span has multiple parents, depends on multiple inputs, or needs to preserve context across asynchronous or concurrent operations. They are also important when you’re sampling traces but want to maintain relationships with unsampled spans.
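The Python sketch below shows one way to attach a link, assuming the producing span's context is available when the later work runs; the span and attribute names are illustrative.
python
from opentelemetry import trace
from opentelemetry.trace import Link

tracer = trace.get_tracer("links-demo")

# Capture the context of the span that produced the work (e.g., enqueued a message).
with tracer.start_as_current_span("enqueue-message") as producer_span:
    producer_context = producer_span.get_span_context()

# The batch-processing span is related to the producer span but not its child,
# so it references the producer's context as a link rather than a parent.
with tracer.start_as_current_span(
    "process-batch",
    links=[Link(producer_context, attributes={"messaging.source": "orders-queue"})],
):
    pass  # work that consumes the previously enqueued message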
Span Status
In OpenTelemetry, a span status indicates the outcome of the operation represented by a span—whether it was successful, failed, or encountered an error. It is a semantic signal used to describe how the operation ended, which is essential for troubleshooting, alerting, and trace analysis.
A span’s status consists of two parts: a status code (Unset, Ok, or Error) and an optional description that adds detail, most commonly for errors.
Use span status for error tracking, alerting, performance debugging, and filtering.
Span Kind
In OpenTelemetry, SpanKind specifies the role a span plays in a distributed system interaction, such as whether it represents a client request, a server response, or an internal operation. It helps observability tools interpret the meaning of a span and its relationship to the spans around it.
By setting the correct SpanKind, you provide semantic meaning about how the span participates in a system’s architecture. This is critical for trace correlation across services, accurate dependency mapping, and meaningful visualization in observability tools. A short example of setting the span kind follows the list of kinds below.
Client
A client span represents an outbound remote call. It is typically the parent of a corresponding server span; a common example is “service A calls service B.”
Server
A server span represents the handling of an inbound request and is usually the child of a client span.
Internal
An internal span represents a local, in-process operation; it is the default kind when none is specified.
Producer
A producer span represents sending a message to a queue or broker (outbound). It may act as the parent of a corresponding consumer span.
Consumer
A consumer span represents receiving or processing an inbound message.
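A minimal Python sketch of assigning span kinds, with invented span names; context propagation (covered earlier) is what actually ties the client and server spans into one trace.
python
from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("span-kind-demo")

# Service A: the outbound call is a CLIENT span.
with tracer.start_as_current_span("call-service-b", kind=SpanKind.CLIENT):
    pass  # make the remote request here

# Service B: handling the inbound request is a SERVER span.
with tracer.start_as_current_span("handle-request", kind=SpanKind.SERVER):
    # Purely local work defaults to INTERNAL.
    with tracer.start_as_current_span("validate-order", kind=SpanKind.INTERNAL):
        pass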
Best practices for OpenTelemetry Tracing
Data consistency
Data consistency is a critical best practice in OpenTelemetry tracing because it ensures that trace data collected from distributed systems is reliable, coherent, and useful for performance monitoring, debugging, and root cause analysis.
Inconsistent data - such as mismatched trace IDs, incorrect span kinds, or irregular attribute naming - can break trace continuity, mislead analysis, or make it impossible to correlate telemetry across services.
Data consistency refers to the uniformity and correctness of the tracing data across spans, services, and systems. It covers trace structure consistency, attribute and naming conventions, context propagation integrity, time and clock synchronization, and span semantic correctness.
To ensure data consistency in OTel tracing, experts recommend seven best practices.
1. Maintain consistent trace context across services
2. Follow semantic conventions for attributes
3. Use span kinds accurately
4. Synchronize timestamps
5. Apply consistent naming for spans and services
6. Ensure sampling decisions are honored across services
7. Keep span status accurate
Attribute Selection
Attribute selection in OpenTelemetry tracing refers to the intentional and consistent choice of attributes (i.e., key-value metadata) that are attached to spans to describe the who, what, where, and why of an operation.
Get the most out of OTel tracing by following these six best practices for attribute selection (a short sketch follows the list):
1. Use semantic conventions
2. Avoid high-cardinality attributes
3. Include business-relevant metadata
4. Limit attribute volume per span
5. Tag spans with environment context
6. Ensure consistency across services
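As a rough illustration of these practices, the Python sketch below combines semantic-convention-style keys with a few low-cardinality business attributes; the specific names and values are assumptions for the example.
python
from opentelemetry import trace

tracer = trace.get_tracer("attribute-selection-demo")

with tracer.start_as_current_span("GET /api/orders") as span:
    # Semantic-convention-style keys keep naming consistent across services.
    span.set_attribute("http.request.method", "GET")
    span.set_attribute("http.response.status_code", 200)
    # Business-relevant, low-cardinality metadata is safe and useful.
    span.set_attribute("order.type", "subscription")
    span.set_attribute("deployment.environment", "production")
    # Avoid high-cardinality or sensitive values, for example:
    # span.set_attribute("session.token", token)   # sensitive - never
    # span.set_attribute("request.body", body)     # unbounded cardinality and size - avoid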
Naming Conventions
In OpenTelemetry tracing, naming conventions are a critical best practice that ensure your telemetry data is consistent, interpretable, and useful across all teams, services, and tools.
Well-defined naming conventions apply to span names, attribute keys, service names, and instrumentation libraries.
By following these five naming best practices, you make trace data easier to search, visualize, analyze, and correlate across distributed systems.
1. Use clear, consistent span names
2. Follow standardized attribute keys
3. Standardize service names
4. Use consistent naming for custom attributes
5. Include versioning where relevant
Context Propagation
Context propagation is a foundational best practice in OpenTelemetry tracing that ensures trace data remains coherent and connected as requests flow through distributed systems across services, threads, processes, and network boundaries.
Without consistent context propagation, your traces become fragmented, making it impossible to accurately reconstruct the end-to-end journey of a request.
Experts suggest the following six practices to get the most out of context propagation (a configuration sketch follows the list).
1. Always inject and extract trace context
2. Use standard propagation formats
3. Propagate context across async and threaded work
4. Handle context in messaging systems
5. Use a global propagator across your application
6. Respect and continue incoming trace context
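The Python sketch below explicitly sets the global propagator to W3C Trace Context plus Baggage, which mirrors the SDK's default behavior; it is shown here only to make the choice of propagation format explicit and version-controlled.
python
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from opentelemetry.baggage.propagation import W3CBaggagePropagator

# One global propagator means every instrumented library injects and extracts
# the same headers (W3C traceparent/tracestate plus baggage).
set_global_textmap(
    CompositePropagator([TraceContextTextMapPropagator(), W3CBaggagePropagator()])
)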
Resource Management
Batching and compression
The goal of batching and compression is to shrink outbound traffic and reduce CPU context switching. This is critical because network egress costs and per‑span export overhead can dwarf application work in large estates.
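As a rough Python sketch of what this looks like, the example below batches spans and asks the OTLP exporter to gzip its payloads via the standard environment variable; the endpoint and tuning values are illustrative rather than recommendations.
python
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Ask the OTLP exporter to gzip payloads before sending them.
os.environ.setdefault("OTEL_EXPORTER_OTLP_COMPRESSION", "gzip")

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"),
        max_queue_size=2048,          # buffer size before spans are dropped
        schedule_delay_millis=5000,   # how often a batch is flushed
        max_export_batch_size=512,    # spans per outbound request
    )
)
trace.set_tracer_provider(provider)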
Best-practice checklist
Sampling
With sampling, teams want to control how many spans are actually stored or exported. This process removes noise, keeps retention affordable, and avoids UI overload.
Choosing a policy
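A minimal head-sampling sketch in Python, using an illustrative 10% ratio; tail-based sampling is typically applied later in the pipeline (for example, in the Collector) rather than in application code.
python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep roughly 10% of new traces, but always honor the sampling decision
# already made by an upstream service (ParentBased).
sampler = ParentBased(root=TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))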
Automatic and manual instrumentation
Teams need to balance coverage, accuracy, and engineering effort, which is why it’s important to have a strategy around automatic and manual instrumentation. Over‑instrumenting hurts performance; under‑instrumenting leaves blind spots.
Best‑practice blend
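A short Python sketch of that blend, using the requests auto-instrumentation for breadth and one manual span for depth; the service, span, and endpoint names are invented for the example.
python
import requests
from opentelemetry import trace
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrumentation for breadth: every outbound requests call gets a CLIENT span.
RequestsInstrumentor().instrument()

tracer = trace.get_tracer("checkout-service")

# Manual instrumentation for depth: wrap the business logic you actually care about.
def process_order(order_id: str) -> None:
    with tracer.start_as_current_span("process-order") as span:
        span.set_attribute("order.id", order_id)
        # The HTTP call below is traced automatically and becomes a child span.
        requests.get("http://localhost:8080/inventory", timeout=5)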
To sum up: Batch first, then compress to slash network chatter without losing data. Sample deterministically and consistently - edge or tail, but make the policy explicit and version‑controlled. Mix auto and manual instrumentation: auto for breadth, manual for depth. Provide shared libraries and CI checks so every service follows the same rules, otherwise “resource management” becomes “resource chaos.”
Security and Configuration
In OpenTelemetry tracing, Security and Configuration best practices are essential to ensure that your observability stack is safe, compliant, and performant. Since tracing data often includes sensitive information (user IDs, API keys, request headers, etc.), poor security or misconfiguration can lead to:
- Data leaks
- Regulatory violations (e.g., GDPR, HIPAA)
- Attack surface expansion
Secure Configuration
The key goals of security configuration are to prevent unauthorized access, protect data in transit and at rest, and ensure trace context can't be spoofed.
Suggested Best Practices:
Minimizing components
The key goals of minimizing components are to reduce the attack surface, simplify auditing and maintenance, and improve traceability and compliance. Every extra component adds operational overhead and potential vulnerabilities.
Suggested Best Practices:
Data Scrubbing
The key goals of data scrubbing are to avoid sending sensitive or PII data, maintain compliance, and reduce telemetry noise. Even a single exposed token or ID in a span can result in a serious breach or compliance violation.
Suggested Best Practices:
Collector Security
The key goals of collector security are to harden the OpenTelemetry Collector as a trusted system component and protect against data injection, misrouting, or unauthorized use.
Suggested Best Practices:
Error Handling
Error handling is a crucial best practice in OpenTelemetry tracing that ensures application errors are captured, classified, and traceable throughout a distributed system. Properly instrumented errors make it easier to:
- Identify failure points
- Diagnose root causes
- Improve system reliability
- Trigger alerts and observability workflows
Experts suggest following these six best practices (a short sketch follows the list):
1. Set span status explicitly on error
2. Capture and record exceptions as events
3. Use semantic attributes to enrich error context
4. Handle errors in both client and server spans
5. Don’t swallow or misclassify errors
6. Link logs to traces for full context
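A short Python sketch tying several of these practices together; the failure scenario and names are invented for illustration.
python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("error-handling-demo")

def charge_card(order_id: str) -> None:
    with tracer.start_as_current_span("charge-card") as span:
        try:
            raise ConnectionError("payment gateway unreachable")  # simulate a failure
        except ConnectionError as exc:
            span.record_exception(exc)                            # capture the exception as an event
            span.set_status(Status(StatusCode.ERROR, str(exc)))   # set the span status explicitly
            span.set_attribute("error.type", type(exc).__name__)  # enrich the error context
            raise                                                 # don't swallow the error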
End Spans
Ending spans properly is a fundamental best practice in OpenTelemetry tracing that ensures spans accurately represent the lifecycle of operations and reflect correct timing, relationships, and resource usage. If you don’t explicitly end spans (or end them incorrectly), you risk broken traces, inaccurate metrics, and misleading observability data.
Suggested best practices (a sketch follows the list):
1. Always end spans explicitly
2. End spans at the right time
3. Use context managers or try/finally blocks
4. Avoid ending a span multiple times
5. Ensure asynchronous work ends the original span
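A brief Python sketch of both patterns, with invented span names; the async case shows ending a manually started span in a finally block.
python
import asyncio
from opentelemetry import trace

tracer = trace.get_tracer("end-span-demo")

# The context manager ends the span exactly once, even if the body raises.
def handle_request() -> None:
    with tracer.start_as_current_span("handle-request"):
        pass  # the span ends automatically when the block exits

# For work that outlives the current function, end the span yourself when the work finishes.
async def background_job() -> None:
    span = tracer.start_span("background-job")
    try:
        await asyncio.sleep(0.1)  # the asynchronous work
    finally:
        span.end()  # always end the original span, exactly once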
Choosing the right backend for staging and analysis
Choosing the right backend for staging and analysis is a strategic best practice in OpenTelemetry tracing. Your backend determines how traces are stored, queried, visualized, and acted upon, and the right choice depends on your use case, team maturity, cost constraints, and compliance requirements.
Select a backend that balances observability depth, scalability, and operational fit for both staging (test) and production (analysis) environments.
Suggested best practices:
1. Define your environment-specific goals
2. Understand backend types
3. Evaluate key criteria
4. Match tooling to team maturity
5. Use separate backends for stage vs. prod (optional)
A great example of a backend for telemetry tracing would be Mezmo. Mezmo (formerly LogDNA) leverages OpenTelemetry to enhance its observability platform. By integrating OpenTelemetry collectors and exporters, Mezmo enables users to ingest logs, metrics, and traces from across their infrastructure with minimal setup. This unified view of telemetry data empowers DevOps and SRE teams to diagnose issues faster, optimize performance, and ensure reliability.
Used together, OpenTelemetry collects and standardizes observability data, while Mezmo ingests, enriches, and routes that data to optimize performance, cost, and insights.
Together, that leads to:
- An end-to-end observability pipeline: From source to destination with flexibility and control.
- Better incident response: Faster troubleshooting using structured and enriched logs and traces.
- Optimized telemetry costs: Collect broadly, route selectively, and store strategically.
- Enhanced developer workflows: Faster debugging and visibility without reinventing tooling.
Example on how to set up OpenTelemetry tracing
Here's a simple example to help you set up OpenTelemetry tracing in an application. We'll walk through the steps using Python, but the principles apply to any language supported by OpenTelemetry.
Step 1: Install Required Packages
bash
pip install opentelemetry-api
pip install opentelemetry-sdk
pip install opentelemetry-exporter-otlp
pip install opentelemetry-instrumentation
pip install opentelemetry-instrumentation-requests
Step 2: Initialize OpenTelemetry Tracer
python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
# Step 1: Set the global tracer provider
trace.set_tracer_provider(TracerProvider())
# Step 2: Create an OTLP exporter (can point to collector or backend)
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
# Step 3: Configure batch processor
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)
# Step 4: Get a tracer
tracer = trace.get_tracer("my-service-name")
Step 3: Create and End a Span
python
from opentelemetry.trace import Status, StatusCode

with tracer.start_as_current_span("process-order") as span:
    try:
        # Simulate work
        result = "Order processed"
        span.set_attribute("order.id", "1234")
        span.set_status(Status(StatusCode.OK))
    except Exception as e:
        span.record_exception(e)
        span.set_status(Status(StatusCode.ERROR, str(e)))
Step 4: Auto-Instrument Common Libraries (Optional)
python
from opentelemetry.instrumentation.requests import RequestsInstrumentor
RequestsInstrumentor().instrument()
This automatically traces outbound HTTP calls made with requests.
Step 5: Run an OpenTelemetry Collector (Optional)
Use the OpenTelemetry Collector if you want to buffer, transform, or export to multiple backends:
Sample Collector Config:
yaml
receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  logging:
    loglevel: debug
  otlphttp:
    endpoint: https://api.your-backend.com
    compression: gzip

processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, otlphttp]
Run the collector with this config to act as an intermediary.
Output Example in Logs (with logging exporter)
json
{
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "name": "process-order",
  "status": {
    "code": "OK"
  },
  "attributes": {
    "order.id": "1234"
  }
}
Summing it up
Telemetry tracing is a complex but critical component of observability. Industry-approved best practices can make the process easier, as can the right choice of observability tool. Our best advice? Take it step-by-step!