Logging In An Era Of Containers

4 MIN READ

Log analysis will always be fundamental to any log monitoring strategy. It is the closest you can get to the source of what’s happening with your app or infrastructure. As application development has undergone drastic change over the past few years with the rise of DevOps, containerization, and microservices architecture, logs have not become less important; rather, they are now at the forefront of running stable, highly available, and performant applications. The practice of logging, however, has changed: what was once simple is now complex, a few hundred lines of logs have grown into millions, and data that used to sit in one place is now distributed. Yet as logging has become more challenging, a new breed of tools has arrived on the scene to manage and make sense of all this logging activity. In this post, we’ll look at the sea change that logging has undergone and how innovative, cloud-based solutions have sprung up to address these challenges.

Complexity of the stack

Traditional client-server applications were simple to build, understand, and manage. The front end was required to run on a few browsers or operating systems. The backend consisted of a single consolidated database, or at most a couple of databases on a single server. When something went wrong, you could jump into your system logs at /var/log and easily identify the source of the failure and how to fix it. With today’s cloud-native apps, the application stack has become tremendously complex. Your apps need to run on numerous combinations of mobile devices, browsers, operating systems, edge devices, and enterprise platforms. Cloud computing has made it possible to deliver apps consistently across the world using the internet, but it comes with its own management challenges. VMs (virtual machines) brought more flexibility and cost efficiencies than hardware servers, but organizations soon outgrew them and needed a faster way to deliver apps. Enter Docker. Containers bring consistency to the development pipeline by breaking down complex tasks and code into small, manageable chunks. This fragmentation lets organizations ship software faster, but it requires you to manage a completely new set of components. Container registries, the container runtime, an orchestration tool or a CaaS (Containers as a Service) platform: all make a container stack more complex than VMs.
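
To appreciate how approachable troubleshooting used to be, here is a minimal sketch of that old workflow, assuming a Linux host with plain-text logs under /var/log and an illustrative set of failure keywords:

```python
#!/usr/bin/env python3
"""Minimal sketch: scan a traditional host's logs under /var/log for failures.

Assumes a Linux host with plain-text *.log files; the keyword list is an
illustrative convention, not a standard.
"""
from pathlib import Path

LOG_DIR = Path("/var/log")
KEYWORDS = ("error", "failed", "panic")  # illustrative failure markers


def scan_logs(log_dir: Path = LOG_DIR) -> None:
    """Print any line in *.log files that contains a failure keyword."""
    for log_file in sorted(log_dir.glob("*.log")):
        try:
            lines = log_file.read_text(errors="replace").splitlines()
        except OSError:
            continue  # some system logs require root to read
        for line in lines:
            if any(keyword in line.lower() for keyword in KEYWORDS):
                print(f"{log_file.name}: {line}")


if __name__ == "__main__":
    scan_logs()
```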

Volume of data has spiked

Each component generates its own set of logs. Monolithic apps are decomposed into microservices, with each service powered by numerous containers. These containers have short life spans of a few hours, compared to VMs, which typically run for months or even years. Every request is routed across the service mesh and touches many services before being returned as a response to the end user. As a result, the total volume of logs has multiplied. Correlating the logs in one part of the system with those in another is difficult, and insights are hard won. Having more log data is an opportunity for better monitoring, but only if you’re able to glean insights out of the data efficiently.
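
To make the correlation problem concrete, the sketch below groups log lines from different services by a shared request ID; it assumes a hypothetical JSON log schema with request_id and service fields:

```python
import json
from collections import defaultdict

# Illustrative sketch: assumes each service emits JSON log lines that carry
# a shared "request_id" field and a "service" name (hypothetical schema).


def correlate(log_lines):
    """Group log entries emitted by many services by request ID."""
    by_request = defaultdict(list)
    for raw in log_lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip unstructured lines
        by_request[entry.get("request_id", "unknown")].append(entry)
    return by_request


sample = [
    '{"request_id": "abc123", "service": "gateway", "msg": "request received"}',
    '{"request_id": "abc123", "service": "orders", "msg": "timeout calling payments"}',
]
for request_id, entries in correlate(sample).items():
    print(request_id, [entry["service"] for entry in entries])
```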

Many logging mechanisms

Each layer has its own logging mechanism. For example, Docker has logging drivers for many log aggregators. Kubernetes, on the other hand, doesn’t support logging drivers; a common approach is to run a Fluentd agent on every node (more on Fluentd later in this post). Kubernetes also doesn’t have native log storage, so you need to configure log storage externally. If you use a CaaS platform like ECS, it comes with its own log collector and its own set of log data. With log collection so fragmented, it can be dizzying to jump from one tool to another to make sense of errors when troubleshooting. Containers require you to unify logging from all the various components for the logs to be useful.
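
As one example of how fragmented collection gets, the sketch below reads logs straight from the Docker engine using the Docker SDK for Python; it assumes the docker package is installed and the local daemon is reachable, and it covers only this single layer of the stack:

```python
import docker  # Docker SDK for Python: pip install docker

# Minimal sketch: read the last few log lines from every running container.
# Assumes the local Docker daemon is reachable via its default socket.
client = docker.from_env()

for container in client.containers.list():
    tail = container.logs(tail=5).decode("utf-8", errors="replace")
    print(f"--- {container.name} ---")
    print(tail)
```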

The rise of open source tools

As log data has become more complex, the solutions for logging have matured as well. Today, there are many open source tools available. The most popular open source logging tool is the ELK stack. It’s actually a collection of three different open source tools: Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed full-text search database, Logstash is a log aggregation tool, and Kibana is a visualization tool for time-series data. It’s easy to get started with the ELK stack when you’re dipping your toes into container logging, and it packs a lot of powerful features like high availability, near-real-time analysis, and great visualizations. However, once your log volume reaches the limits of the physical nodes that power the ELK stack, it becomes challenging to keep operations running smoothly; performance lags and resource consumption become an issue. Still, the ELK stack has sparked many other container logging solutions, such as Mezmo, formerly known as LogDNA, which have found innovative ways to deal with the problems that weigh down the ELK stack.

Fluentd is another tool commonly used alongside the ELK stack. It is a log collection tool that manages the flow of log data from the source app to any log analysis platform. Its strength is its wide range of plugins, which let it integrate with a wide variety of sources. However, in a Kubernetes setup, sending logs to Elasticsearch means running a Fluentd agent on every node, which becomes a drain on system resources.
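
For a taste of how the ELK stack is used in practice, the sketch below indexes a structured log event into Elasticsearch with the official Python client; the client version (8.x), URL, index name, and fields are all assumptions for illustration:

```python
from datetime import datetime, timezone

from elasticsearch import Elasticsearch  # pip install "elasticsearch>=8" (assumed)

# Minimal sketch: index one structured log event so it can be explored in Kibana.
# The URL, index name, and field names are illustrative assumptions.
es = Elasticsearch("http://localhost:9200")

log_event = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "level": "ERROR",
    "service": "checkout",
    "message": "payment provider timed out",
}

response = es.index(index="app-logs", document=log_event)
print(response["result"])  # e.g. "created"
```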

Machine learning is the future

While open source tools have led the way in making logging solutions available, they require a lot of maintenance overhead when monitoring real-world applications. Considering the complexity of the stack, the volume of data, and the variety of logging mechanisms, what’s needed is a modern log analysis platform that can intelligently analyze log data and derive insights. Analyzing log data by manual methods is a thing of the past. Instead, machine learning is opening up possibilities to let algorithms do the heavy lifting of crunching log data and extracting meaningful outcomes. Because algorithms can spot minute anomalies that would be invisible to humans, they can identify threats much sooner than a human would and, in doing so, can help prevent outages before they happen. Mezmo is one of the pioneers in this effort to use machine learning to analyze log data.

In conclusion, it is an exciting time to build and use log analysis solutions. The challenge is great, and the options are plenty. As you choose a logging solution for your organization, remember the differences between legacy applications and modern cloud-native ones, and choose a tool that supports the latter most comprehensively. And as you think about the future of log management, remember that the key words are ‘machine learning’.
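
To leave you with a concrete picture of what letting algorithms do the heavy lifting can look like, here is a minimal sketch that flags anomalous spikes in per-minute error counts with scikit-learn's IsolationForest; the data and technique are purely illustrative, not a description of any particular product:

```python
import numpy as np
from sklearn.ensemble import IsolationForest  # pip install scikit-learn

# Made-up data: error counts per minute, with one obvious spike at minute 8.
error_counts = np.array([3, 4, 2, 5, 3, 4, 2, 3, 97, 4, 3, 5]).reshape(-1, 1)

# Fit an unsupervised anomaly detector and flag outlying minutes (-1 = anomaly).
detector = IsolationForest(contamination=0.1, random_state=42)
labels = detector.fit_predict(error_counts)

for minute, (count, label) in enumerate(zip(error_counts.ravel(), labels)):
    if label == -1:
        print(f"minute {minute}: anomalous error count {count}")
```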
