Logging Fundamentals 1

4 MIN READ

Working at a company that lives and breathes logging, observability, and DevOps intelligence, we sometimes need to take a moment to step back and explain what we do to our friends and family.

The simplest way we explain what LogDNA solves for companies with IT systems and software is to compare it to the black box on a plane, which keeps a record of the flight data and cockpit audio. In the event something goes wrong or a system fails, the logs are readily accessible and quickly searchable, so problems can be rectified.

Beginners Series

In Logging Fundamentals 1, we will introduce the basics of logs including what they are, how they are created, and the role they play in developing and maintaining systems and applications. In this lesson, you will learn:

  • How logs are created, how they are structured, and the kinds of data they store
  • The role that logs play in maintaining, troubleshooting, and securing your infrastructure
  • The risks of not recording or retaining logs

Lesson 1: What are Logs?

Lesson 2: Why Should I Log?

Lesson 3: What are the Risks of Not Logging?

 

Lesson 1: What are Logs?

 

A log is a timestamped record of an event that took place on a computer or device. A log can be related to almost any component running on a system, such as the operating system, services, and applications. Logs store detailed information about specific events, including:

  • A description of the event
  • The date and time that it occurred
  • Contextual information, such as the component, user(s), or process(es) that caused the event

Logs provide a significant amount of information about how your infrastructure is operating. They can provide diagnostic information, configuration details, performance measurements, and any errors that occurred.

What is a Log File?

A log file is a collection of logs stored on a device. Log files provide a persistent chronological record of logs, making them effective tools for analyzing, troubleshooting, and optimizing your systems and applications.

The process of writing logs to a log file is called logging. Software applications and services typically create their own log file(s) to avoid mixing logs with other applications. Most log files also store their logs in plain text so you can easily review the log history. Over time, this can lead to log files taking up a significant amount of disk space, but the benefit is the ability to quickly view application-specific logs in a single file.
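For illustration, here's a minimal sketch (using Python's built-in logging module, purely as an example) of an application writing timestamped logs to its own plain-text log file. The application name "myapp" and the file name "myapp.log" are hypothetical:

```python
import logging

# Hypothetical application name and log file, used only for illustration.
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

handler = logging.FileHandler("myapp.log")
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(name)s %(levelname)s %(message)s"  # timestamp, source, severity, message
))
logger.addHandler(handler)

logger.info("Application started")
# myapp.log now contains a line along the lines of:
# 2024-03-22 14:56:21,103 myapp INFO Application started
```

Every run appends to the same file, so the log history accumulates in chronological order.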

Modern operating systems also include services that aggregate logs from other applications and services in a single location. For example, syslog is both a log format and a logging service installed on most Linux systems. Applications can send their logs to a syslog server, which formats each log into a uniform structure while preserving its original contents, then writes it to a common destination such as a file or another syslog server.

For example, the following log was generated by the systemd service on an Ubuntu server:

Mar 22 14:56:21 node1 systemd[1]: Starting Cleanup of Temporary Directories...

Let's break this log down into its individual fields:

  • Mar 22 14:56:21: the date and time of the event
  • node1: the name of the device that this event occurred on
  • systemd[1]: the name (and ID) of the process that created the log
  • Starting Cleanup of Temporary Directories...: the event's actual message

The syslog format includes additional contextual information about each log, such as its importance (severity level) and category (facility). In a future lesson, we'll explain these fields in more detail and how to define a custom log format.
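As a rough sketch of how an application hands its logs off to syslog, Python's standard SysLogHandler can send log records to the local syslog socket (the logger name and message below are illustrative):

```python
import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# /dev/log is the local syslog socket on most Linux systems; the syslog
# service decides where the log ultimately gets written.
handler = SysLogHandler(address="/dev/log", facility=SysLogHandler.LOG_USER)
logger.addHandler(handler)

# Python log levels map onto syslog severity levels (INFO -> info, ERROR -> err, ...)
logger.info("Starting cleanup of temporary directories")
```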

 

Lesson 2: Why Should I Log?

 

Logs provide a wealth of information about operations, security incidents, user activity, errors and warnings, and countless other events. They are one of your most important assets when it comes to troubleshooting problems, identifying changes and trends, and detecting suspicious or anomalous activity across your infrastructure.

Troubleshooting and Root Cause Analysis

Eventually, a problem will occur within your infrastructure. When something does go wrong, you need to be able to determine:

  • What the problem is
  • Where it occurred
  • How to fix it
  • How to prevent it from recurring

Logs can provide all of this information. For one, logs often contain a description of the event along with the name of the host, process, or application that the problem occurred in. Application logging frameworks such as Log4Net (.NET), Winston (Node.js), and Log4j (Java) can provide even more granular information, down to the line of code that caused the error. This saves you from having to track down the source of the error, while also providing a complete history of events leading up to the error. Having this amount of contextual information can help you identify the true source of the problem and implement a more effective solution.
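As a small example of that kind of granularity, Python's logging module can record the file and line number of every log call and attach full tracebacks to errors (the logger name and failing code here are made up):

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s",
)
logger = logging.getLogger("billing")  # illustrative name

try:
    total = 100 / 0
except ZeroDivisionError:
    # logger.exception() logs at ERROR level and appends the full traceback,
    # pointing at the exact line that raised the error.
    logger.exception("Failed to compute invoice total")
```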

In addition, each log file tells an ongoing story about the activities that took place leading up to the problem. With enough log data, you can follow the trail of an error over the course of an hour, a day, a week, or even longer. You can also use this long-term operational data to set baseline expectations for how your systems behave. This can help you catch sudden changes and deviations faster.

Centralization

Modern IT infrastructures are becoming increasingly distributed, especially as more teams adopt cloud computing. When you have applications running in different data centers or on completely different platforms, overseeing them can quickly become a problem.

Fortunately, logs are portable. Not only can you copy log files between hosts, but network-capable logging services such as syslog can send logs between hosts. A common strategy is to send logs from all of your infrastructure components to a dedicated log host where they can be viewed and managed from a single location. In essence, this is how log management solutions like LogDNA work. As software architectures become increasingly distributed and abstracted, the need for centralized logging will also increase.
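For example, here's a minimal sketch of shipping logs over the network to a dedicated log host instead of only writing them locally; the hostname "loghost.example.com" and port 514 are assumptions, so substitute your own collector's address:

```python
import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# Sends each log record to a central syslog collector (UDP port 514 by default).
remote = SysLogHandler(address=("loghost.example.com", 514))
logger.addHandler(remote)

logger.warning("Disk usage above 90 percent on node1")
```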

Persistence

Another challenge of modern infrastructures is ephemerality. Platforms like AWS Lambda and Kubernetes are designed for applications that only run for as long as necessary to complete their task. Once they are done, the platform deletes the entire application instance. Lambda functions run for anywhere from a few milliseconds to a few minutes at most, making manual oversight nearly impossible.

Logs allow you to record the execution flow of ephemeral workloads so you can audit them even after they've been deleted.
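As a rough sketch, an AWS Lambda handler might log the key steps of each invocation; anything it logs is shipped to a durable store such as CloudWatch Logs (or a log management service), so the record outlives the function instance itself. The handler below is illustrative only:

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # "event" and "context" are provided by the Lambda runtime.
    logger.info("Invocation started, request_id=%s", context.aws_request_id)
    result = {"status": "ok"}
    logger.info("Invocation finished, result=%s", result)
    return result
```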

 

Lesson 3: What are the Risks of Not Logging?

 

As we discussed in the previous lesson, Why Should I Log, logs can help you monitor your systems, troubleshoot applications, and keep a record of your applications' executions. However, logs also play an important role in security, auditing, and compliance. Not keeping comprehensive logs can lead to the following problems.

Limited Security Oversight

Logs are often the first step in auditing security incidents. Logging is such a crucial part of application security that it has become a widely accepted best practice among application developers. OWASP, a non-profit organization that promotes secure application development, lists insufficient logging and monitoring as one of its top 10 most critical web application security risks.

The consequences are real for all organizations. In a study by F5 Networks on data breaches, applications were the initial target for 53% of all data breaches between 2005 and 2017. This accounted for 47% of the nearly $3.29 billion in damages caused to organizations around the world.

Logging won't prevent these attacks, but it will help you detect and respond to them in a timely manner. Even basic security logs can help you track:

  • User-driven events such as logins and administrator actions
  • Security alerts and errors generated by applications
  • Operational logs that could identify application and system vulnerabilities
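As an illustrative sketch (the logger name and field layout are assumptions, not a prescribed format), an application might record login attempts like this:

```python
import logging

logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
security_log = logging.getLogger("security")
security_log.setLevel(logging.INFO)

def record_login(username, source_ip, success):
    if success:
        security_log.info("login succeeded user=%s src=%s", username, source_ip)
    else:
        # Failed attempts are logged at WARNING so they stand out during review.
        security_log.warning("login failed user=%s src=%s", username, source_ip)

record_login("admin", "203.0.113.7", success=False)
```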

No Audit Trail

Logs are a historical record of events stretching from the present moment back to the beginning of the log file. In general, logs are also considered immutable: once they've been written, they can't be changed. These two attributes make log files critical resources when auditing your systems.

For example, consider a log file that tracks all user-related security events on a computer. The file records login attempts, login successes, logouts, and login failures. Each log includes the name of the relevant user account, the host that the attempt occurred on, and a timestamp. With just a quick look through the logs, you can immediately determine:

  • Which users are currently logged in
  • The last time that a specific user logged in, and how long their session was active
  • How many failed login attempts occurred
  • How often a user forgets their password (indicated by the number of failed logins)

A common example is in detecting login attacks. Attackers often try to gain access to systems by randomly guessing different username and password combinations. On Linux systems, these types of events are stored in the /var/log/auth.log file. For example, these messages were generated after an attacker tried logging into an SSH server with the username admin:

Mar 27 15:09:10 client sshd[10800]: Invalid user admin from [attacker IP address] port 33239

Mar 27 15:09:10 client sshd[10800]: input_userauth_request: invalid user admin [preauth]

Mar 27 15:09:11 client sshd[10800]: Connection closed by [attacker IP address] port 33239 [preauth]

These three messages tell us when the event occurred, where the attack originated, what the result was (fortunately, the attacker was denied access), and whether this was a one-off or recurring attack. We can use this to develop strategies such as blocking the attacker's IP address, or stopping the SSH service if we're not using it.
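For instance, a rough audit script could count invalid SSH login attempts per source address. This sketch assumes the default Ubuntu auth.log layout shown above and keeps the parsing deliberately simple:

```python
from collections import Counter

failed = Counter()

with open("/var/log/auth.log") as auth_log:
    for line in auth_log:
        if "Invalid user" in line:
            fields = line.split()
            # "... Invalid user admin from <ip> port 33239"
            src_ip = fields[fields.index("from") + 1]
            failed[src_ip] += 1

for ip, count in failed.most_common(10):
    print(f"{count:5d} failed attempts from {ip}")
```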

Compliance

Certain laws and regulations—such as PCI DSS and HIPAA—have strict requirements for logging electronic systems. Your organization might be required to log certain kinds of information, retain those logs for a certain amount of time, and periodically audit them to ensure that your systems remain in compliance.

Not complying with certain regulations can result in hefty fines and even criminal penalties. At the same time, different regulations will have different requirements for logging. If you or your organization are bound by certain regulations, you should check their requirements before developing a logging strategy.
