
Smarter Telemetry Pipelines: The Key to Cutting Datadog Costs and Observability Chaos

4 MIN READ

6.9.25

Log volume is exploding, costs are rising, and most teams are stuck duct-taping together short-term fixes.

During our webinar, "Optimizing Log Management in Datadog: Cut Costs Without Losing Insights," we discussed how DevOps and engineering leaders are navigating the growing pains of observability, especially in environments where tools like Datadog are mission-critical but challenging to manage.

Here’s a recap of the key takeaways:

Teams Are Drowning in Noise, Not Signal

One of the biggest pain points discussed was the signal-to-noise ratio. You get the alert, but what actually broke?

Finding the answer often involves bouncing between dashboards, parsing unnecessary logs, and hoping that someone has properly tagged the trace. The underlying issue is that your telemetry pipeline isn’t built to route the right data to the right destination in the right format.

"Eventually, it becomes such a pain that it needs a whole cleanup initiative."

This line from the session resonated with many attendees.

Most teams don’t proactively manage what logs are sent or stored. Things are left as-is until cost or performance becomes so painful that a major audit is the only option. Those cleanup efforts usually involve:

  • Revisiting the codebase to understand what’s being logged
  • Pruning useless logs (if anyone can agree on what’s “useless”)
  • Rewriting exclusion rules to patch over deeper issues

Spoiler: No one enjoys doing this. And it rarely gets prioritized.

Exclusion rules and built-in controls are ‘too rigid’

Datadog and other observability tools offer built-in volume controls, such as sampling, exclusion rules, and rehydration.

In practice, however, these controls are too rigid. They help with blunt-force filtering but don't provide the precision you need to make smarter decisions upstream. Teams end up logging everything “just in case,” then scramble to cut volume at the end of the pipeline after costs have already racked up.

AI Is Supercharging Telemetry Noise

Another sharp insight: AI is driving up log volume even faster than expected.

Every time you add AI-powered features to your app, you’re likely generating more logs—more inference results, more structured data, more metadata. It’s a firehose.

This trend is making it harder than ever to predict observability costs, especially if your telemetry pipeline can’t intelligently filter, sample, or route based on what’s actually valuable.

Smarter Routing and Tiering Help Control the Chaos

Not all logs need to be retained forever. However, many teams lack a strategy for determining what stays hot, what goes cold, and what gets dropped altogether.

The solution is to adopt smarter tiering and routing, keeping real-time data accessible for debugging, while archiving the long tail for audit and compliance purposes in a more cost-effective storage solution.
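A tiering policy like this can be expressed as a small routing function. The sketch below is illustrative only; the tier names, log levels, and thresholds are assumptions, not any specific product's API:

```python
# A minimal sketch of a retention-tiering policy: decide per event
# whether it stays hot (searchable for debugging), goes to cheap
# archive storage (audit/compliance), or gets dropped entirely.
# All names and levels here are illustrative assumptions.

HOT_LEVELS = {"error", "warn"}       # keep queryable for live debugging
ARCHIVE_LEVELS = {"info", "audit"}   # cost-effective cold storage

def choose_tier(event: dict) -> str:
    """Return 'hot', 'archive', or 'drop' for a log event."""
    level = event.get("level", "info").lower()
    if level in HOT_LEVELS:
        return "hot"
    # Anything flagged for compliance is archived regardless of level.
    if level in ARCHIVE_LEVELS or event.get("compliance"):
        return "archive"
    return "drop"                    # e.g. routine debug chatter
```

The point is that the decision happens per event, upstream, rather than as a blunt exclusion rule applied after everything has already been ingested.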

So, what is working?

While every team’s setup is different, there was clear agreement on what’s working:

  • Telemetry pipelines that sit between source and destination. These let teams filter, redact, sample, and enrich data before it hits expensive tools like Datadog.
  • Profiling data before routing it. Understanding what you’re sending—and why—helps cut volume without cutting visibility.
  • Separating critical vs. non-critical data paths. Not all data needs to hit your most expensive observability tier.

The bottom line

Logging everything “just in case” isn’t sustainable. Exclusion rules and rehydration help, but they’re no substitute for a real telemetry pipeline.

If your team is feeling the strain, whether it’s alert fatigue, sky-high bills, or just an overwhelming amount of noise, you’re not alone.

But you do have options. Smarter telemetry starts upstream.

Watch the full webinar recording here.

Try Mezmo’s Telemetry Pipeline for Datadog

