What the Cloud Native Revolution Means for Log Management

4 MIN READ

This was originally posted on The New Stack.

Once upon a time, log management was relatively straightforward. The volume, types, and structures of logs were simple and manageable.

However, over the past few years, all of this simplicity has gone out the window. Thanks to the shift toward cloud native technologies and practices, such as loosely coupled services, microservices architectures, containers, and Kubernetes, the log management strategies of the past no longer suffice. Managing logs successfully in a cloud native world requires fundamental changes to the way logs are aggregated, analyzed, and more.

Here’s how the cloud native revolution has changed the nature of log management and what IT and DevOps teams can do to continue managing logs effectively.

What Makes Cloud Native Logging Different

At first glance, log management in a cloud native world may not seem that different from conventional logging. Cloud native infrastructure and applications still generate logs, and the fundamental steps of the log management process—collection, aggregation, analysis, rotation—still apply.

But if you start trying to monitor a cloud native environment, it quickly becomes clear that managing logs efficiently and effectively is much more difficult. There are four main reasons why.

More logs

First and foremost, there are simply more logs to contend with.

Before the cloud native era, most applications were monoliths that ran on individual servers. Each application typically generated only one log (if it even created its own log at all; sometimes, applications logged data to syslog instead). Each server also typically generated only a handful of logs, with syslog and auth being the main ones. Thus, managing logs for the entire environment meant contending with only a handful of files.

In cloud native environments, by contrast, you typically work with microservices architectures, in which a dozen or more services may be running, each providing a different piece of the functionality that makes up the entire application. Every microservice may generate its own log.

Not only that, but there are more layers of infrastructure, and by extension, more logs. You have not only the underlying host servers and the logs they generate, but also logs created by the abstraction layer (such as Docker, Kubernetes, or both, depending on how you use them) that sits between the application and the underlying infrastructure.

In short, the shift to cloud native means that IT teams have gone from contending with a handful of separate logs for each application they support, to a dozen or more.

More types of logs

Not only are there more logs overall, but there are more types of logs. Instead of just having server logs and application logs, you have logs for your cloud infrastructure, logs for Kubernetes or Docker, authentication logs, logs for both Windows and Linux (because it’s more common now to be using both types of operating systems in the same shop), and more.

This variety adds complexity not only because there are more distinct types of log data to manage, but also because these various types of logs are often formatted in different ways. As a result, it is harder to parse all logs at once using regex matching or other types of generic queries.

Diverse logging architectures

Along with the increase in the number and types of logs, there is now more complexity and variation in the way log data is exposed within application environments.

Kubernetes is a prime example. Kubernetes provides some built-in functionality for collecting logs at the node level, but exactly how that collection works depends on how each node is configured. For example, on nodes running systemd, components log to journald; otherwise, they write directly to .log files inside /var/log.
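
To make the difference concrete, here is a minimal Python sketch, under stated assumptions, of how a collection script might account for both node-level paths. The unit name and log directory are illustrative defaults rather than guaranteed locations, and production log agents handle this detection far more robustly.

```python
import glob
import shutil
import subprocess

def read_node_logs(unit="kubelet", plain_log_dir="/var/log"):
    """Return recent node-level log lines, adapting to how the node logs.

    Assumption: on systemd nodes, components such as the kubelet log to
    journald; on other nodes, they write plain .log files under /var/log.
    """
    if shutil.which("journalctl"):
        # systemd node: pull the component's recent entries from the journal
        result = subprocess.run(
            ["journalctl", "-u", unit, "--no-pager", "-n", "200"],
            capture_output=True, text=True, check=False,
        )
        return result.stdout.splitlines()

    # Non-systemd node: fall back to plain log files on disk
    lines = []
    for path in glob.glob(f"{plain_log_dir}/{unit}*.log"):
        with open(path, errors="replace") as fh:
            lines.extend(fh.read().splitlines())
    return lines
```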

To make matters more complicated, Kubernetes has no native support for cluster-level logging, although, again, multiple approaches are possible. You could use a logging agent running on each Kubernetes node to generate log data for the cluster, or you could run a logging agent in a sidecar container. Alternatively, you could try to generate cluster-wide log data directly from the application, provided your cluster architecture and application make this practical.
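
As a rough sketch of the node-level agent approach, the Python snippet below tails container log files from a conventional location on each node and forwards each new line to a central collector. The glob path and the collector URL are assumptions for illustration; in practice you would deploy a purpose-built agent on every node rather than hand-rolling one.

```python
import glob
import json
import time
import urllib.request

LOG_GLOB = "/var/log/containers/*.log"          # common node location (assumption)
COLLECTOR = "http://logs.internal:8080/ingest"  # hypothetical aggregation endpoint

def follow(path, positions):
    """Yield lines appended to a log file since the last recorded offset."""
    with open(path, errors="replace") as fh:
        fh.seek(positions.get(path, 0))
        while True:
            line = fh.readline()
            if not line:
                break
            yield line.rstrip("\n")
        positions[path] = fh.tell()

def run_agent(interval=5):
    """Periodically sweep the node's container logs and ship new lines off-node."""
    positions = {}
    while True:
        for path in glob.glob(LOG_GLOB):
            for line in follow(path, positions):
                record = {"file": path, "line": line}
                request = urllib.request.Request(
                    COLLECTOR,
                    data=json.dumps(record).encode(),
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(request)  # forward the record to the collector
        time.sleep(interval)
```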

The bottom line here is that there is considerable variability in the way logging architectures are set up, even within the same platforms. As a result, it has become more difficult in cloud native environments to devise a uniform log management process that works consistently across all of the applications or platforms it needs to support.

Non-persistent log storage

A final challenge in cloud native logging arises from the fact that some cloud native applications lack persistent data storage. Containers are the prime example.

When a container instance stops running, all data stored inside the container is permanently destroyed. Thus, if log data was stored inside the container (which it often is, by default), it will disappear along with the container. Because containers are ephemeral, instances halt, get removed, and are replaced by new ones automatically; admins are never asked whether they want to save log data before a container shuts down. The container simply shuts down and is removed, taking your log data with it unless you have moved that data somewhere else beforehand.

This transience may be okay if you only care about working with log data in real time. However, if you need to keep historical logs available for a certain period of time, losing log data when containers stop running is unacceptable.

Best Practices for Cloud Native Log Management

To respond to these challenges of logging in a cloud native world, teams can use the following guidelines.

Unify log collection and aggregation

With so many different log formats and logging architectures to support, trying to manage the logs for each system separately is not feasible.

Instead, implement a unified, centralized log management solution that automatically collects data from all parts of your environment and aggregates it into a single location.
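
As a minimal sketch of what a single aggregation point can look like, the Python snippet below stands up a tiny ingest endpoint that accepts log records from any source and appends them to one shared store. The port, file path, and record shape are placeholders; a real deployment would rely on a dedicated log management platform rather than a hand-rolled server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

AGGREGATED_LOG = "/var/data/aggregated.ndjson"  # single central store (placeholder path)

class IngestHandler(BaseHTTPRequestHandler):
    """Accept JSON log records from any source and append them to one store."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record = json.loads(self.rfile.read(length))
        record["source_ip"] = self.client_address[0]  # tag where the record came from
        with open(AGGREGATED_LOG, "a") as fh:
            fh.write(json.dumps(record) + "\n")
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), IngestHandler).serve_forever()
```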

Adopt a flexible log management solution

Your log management tools and processes should be able to support any type of environment without you having to reconfigure the environment.

If you have, for example, one Kubernetes cluster that exposes log data in one way and a second cluster that logs in a different way, you should be able to collect and analyze logs from both clusters without having to change the way either cluster deals with logs. Likewise, if you have one application running on one public cloud and another one on a different cloud, you shouldn’t have to modify the default logging behavior of either cloud environment in order to manage its logs from a central location.

Collect logs in real time

One way to ensure that logs from environments without persistent storage don’t disappear is to collect log data in real time and aggregate it in an independent location. That way, log data is preserved in a persistent log manager as soon as it is born and will remain available even if the container shuts down.

This approach is preferable to trying to collect log data only at fixed periods from inside containers, which leaves you at risk of missing some logs if the containers shut down earlier than you expected.
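
To make the real-time approach concrete, here is a hedged sketch using the Docker SDK for Python that attaches to a running container's log stream and writes each chunk to persistent storage outside the container as soon as it is produced. The container name and sink path are examples only, and the same idea applies to any runtime that exposes a streaming log API.

```python
import docker  # Docker SDK for Python (pip install docker)

def stream_container_logs(name="web", sink_path="/var/data/web.log"):
    """Follow a container's log output in real time and persist it off-container.

    Assumptions: a container named "web" is running locally, and sink_path
    lives on storage that outlasts the container.
    """
    client = docker.from_env()
    container = client.containers.get(name)
    with open(sink_path, "a") as sink:
        # follow=True keeps the stream open and yields output as it appears
        for chunk in container.logs(stream=True, follow=True):
            sink.write(chunk.decode(errors="replace"))
            sink.flush()  # persist immediately rather than on a schedule
```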

Use custom log parsers

Instead of ignoring logs that are structured in ways that conventional analytics tools can’t support, take advantage of custom log parsers to work with data in any format. That way, you don’t risk missing out on important insights from non-standard logs.
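
As a simple illustration, the sketch below registers a different parser for each log source (a JSON-lines service log and an access-log format in this hypothetical setup) and normalizes both into a common structure. The source names, regular expression, and field mappings are assumptions chosen for the example.

```python
import json
import re

# Matches a simplified combined-log-format access line (illustrative only)
ACCESS_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r"(?P<status>\d{3}) (?P<size>\d+|-)"
)

def parse_json_line(line):
    """Parser for services that already emit structured JSON log lines."""
    record = json.loads(line)
    return {"timestamp": record.get("time"), "message": record.get("msg"), "raw": line}

def parse_access_line(line):
    """Parser for an access-log line that JSON tooling cannot read directly."""
    match = ACCESS_RE.match(line)
    if not match:
        return None
    return {"timestamp": match.group("time"), "message": match.group("request"), "raw": line}

# Route each source to the parser that understands its format (names are hypothetical)
PARSERS = {"orders-service": parse_json_line, "edge-proxy": parse_access_line}

def normalize(source, line):
    """Normalize a raw line from any source into a common record shape."""
    parser = PARSERS.get(source)
    parsed = parser(line) if parser else None
    return parsed or {"timestamp": None, "message": line, "raw": line}
```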

Conclusion

Cloud native log management is fundamentally different from managing log data for conventional, monolithic applications. It’s not just that the scale of log data has increased (though it has), but also that there is much greater diversity when it comes to the way log data is recorded, structured, and exposed. Managing logs effectively in the face of these challenges requires a log management solution that fully centralizes and unifies log data from any and all systems that you support, while also providing the power to derive insights from non-standard log types.

This post is part of a larger series called Logging in the Age of DevOps: From Monolith to Microservices and Beyond. Download the full eBook here.
