5 Observability Metrics To Monitor In Logs

4 MIN READ

Which data sources do DevOps teams need in order to achieve observability? At a high level, that’s an easy question to answer. Concepts like the “three pillars of observability”—logs, metrics, and traces—may come to mind. Or, you may think in terms of techniques like the RED Method or Google’s Golden Signals, which are other popular frameworks for defining which types of data teams should collect for monitoring and observability purposes.

These concepts are great for helping to define an observability strategy at a high level. However, the problem with them is that they are not very specific about the precise types of data you need to enable observability, or where to find it. Which specific logs, traces, and metrics should you collect? Which parts of an application do you need to monitor to collect relevant request rate, error rate, and duration data?

To answer questions like these, you need to dive deeper into data sources for observability. This article does so by discussing five specific types of application-level insights that DevOps teams should collect as part of an observability strategy, along with tips on where to find the data and how to collect it from within a typical distributed application.

Total Application Requests

One of the most basic, but also most useful, metrics to track is the total number of requests that your application receives in a given period. Total requests reflect how much demand is placed on the app.

By correlating this data point with other metrics, like the time it takes to complete each request, teams can measure how application demand correlates with application performance. If you see a correlation between request rate and degradations in performance, for instance, it likely means that your application lacks sufficient resources to handle periods of peak demand.

Your method for tracking total application requests will depend on how your application is designed and deployed. In some cases, you could measure request rates through data exposed by your API gateway, if your application uses an API gateway to manage incoming requests. In other cases, application logs will record request rates, and you can use a logging agent to collect data from the logs and forward it to an external log analysis service.
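
For a concrete (if simplified) illustration, here is a minimal sketch that tallies requests per minute from a web server access log. It assumes one log line per request with a combined-log-format timestamp, and the log path and regex are placeholders; adapt them to whatever your gateway or application actually writes.

# Minimal sketch: count requests per minute from an access log.
# Assumes one line per request with a timestamp like [10/Oct/2023:13:55:36 +0000];
# the log path and regex are placeholders for your own environment.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path
TS_MINUTE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")  # date plus HH:MM

def requests_per_minute(path=LOG_PATH):
    counts = Counter()
    with open(path) as log:
        for line in log:
            match = TS_MINUTE.search(line)
            if match:
                counts[match.group(1)] += 1  # one request per matching line
    return counts

if __name__ == "__main__":
    for minute, total in sorted(requests_per_minute().items()):
        print(minute, total)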

Request Duration for Each Microservice

Request duration measures how long it takes to process a request.

You can measure request duration for the application as a whole, which tells you how long it takes to handle each user request end to end.

However, duration metrics tend to be more useful when you collect them at the level of microservices. After all, if your app is taking a long time to complete requests, the first question you’re likely to ask is which microservice (or microservices) is creating the bottleneck. In many cases, duration problems can be traced to a specific microservice that is taking longer than it should to do its part in handling a request.

There are two main ways to measure duration on a microservice-by-microservice basis. One is to collect this data from a service mesh, if you have one and it records information about request duration for each microservice. The other is to collect log data from the containers that host each microservice.
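
As a sketch of the second approach, suppose each container emits structured (JSON) log lines that include the service name and how long the request took; the "service" and "duration_ms" field names below are assumptions, not a standard. A few lines of Python can then report a rough 95th-percentile duration per microservice:

# Minimal sketch: per-service p95 request duration from structured container logs.
# Assumes JSON log lines with hypothetical "service" and "duration_ms" fields;
# field names vary by application and logging library.
import json
import math
from collections import defaultdict

def p95_by_service(log_lines):
    durations = defaultdict(list)
    for line in log_lines:
        try:
            entry = json.loads(line)
            durations[entry["service"]].append(float(entry["duration_ms"]))
        except (ValueError, KeyError):
            continue  # skip lines that are not request records
    results = {}
    for service, values in durations.items():
        values.sort()
        index = max(0, math.ceil(len(values) * 0.95) - 1)  # nearest-rank p95
        results[service] = values[index]
    return results

sample = [
    '{"service": "checkout", "duration_ms": 120}',
    '{"service": "checkout", "duration_ms": 480}',
    '{"service": "catalog", "duration_ms": 35}',
]
print(p95_by_service(sample))  # e.g. {'checkout': 480.0, 'catalog': 35.0}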

Microservice Instance Count

Knowing how many instances of each microservice you have running at a given time is useful for determining whether your application is under- or over-provisioned. It will also help you to determine whether a lack of available instances is responsible for performance problems that you may detect.

If you deploy each microservice in its own container, counting total microservice instances means counting how many containers you have running for each microservice. In most situations, the easiest way to get this data is from container orchestrator logs; Kubernetes audit logs, for example, can be configured to record when pod instances are created and deleted.
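
If you run on Kubernetes, the same count can also be pulled directly from the API server. The sketch below uses the official Kubernetes Python client and assumes pods carry a per-microservice "app" label, which is a common convention but not something Kubernetes enforces:

# Minimal sketch: count running pods per microservice on Kubernetes.
# Assumes a hypothetical "app" label identifies each microservice and that
# kubeconfig credentials are available where the script runs.
from collections import Counter
from kubernetes import client, config

def running_instances(namespace="default"):
    config.load_kube_config()  # use config.load_incluster_config() inside a cluster
    v1 = client.CoreV1Api()
    counts = Counter()
    for pod in v1.list_namespaced_pod(namespace).items:
        if pod.status.phase == "Running":
            labels = pod.metadata.labels or {}
            counts[labels.get("app", "unknown")] += 1
    return counts

if __name__ == "__main__":
    for service, total in running_instances().items():
        print(f"{service}: {total} instance(s)")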

Container Liveness and Readiness

Kubernetes uses the concepts of container “liveness” and “readiness” to assess whether containers are functioning properly. Containers that are not “live” or “ready” are typically not able to handle application requests.

Arguably, failure rates for liveness and readiness probes are an infrastructure metric rather than an application-level metric. However, because liveness and readiness problems are most often caused by an issue with the application, tracking these metrics is a very useful means of identifying application coding or configuration problems that are causing a container not to run properly.

(In some instances, liveness or readiness errors could be the result of a configuration issue with Kubernetes, not the app. But if other containers are working properly, chances are that the app is the source of the problem.)

To track readiness and liveness, you must first ensure that your Kubernetes pods are configured for liveness and readiness probes. (See the Kubernetes documentation for details.) Once they are, you can track the results of probes with a command like:

kubectl describe pod liveness-exec
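
If you would rather collect these signals continuously instead of describing pods by hand, a small script can watch the same status fields. The sketch below (again using the Kubernetes Python client) reports each pod's Ready condition, which reflects readiness probes, and its container restart count, which climbs when liveness probes fail:

# Minimal sketch: report pod readiness and container restart counts.
# The Ready condition reflects readiness probe results; restart counts climb
# when liveness probes fail. Assumes kubeconfig credentials are available.
from kubernetes import client, config

def probe_status(namespace="default"):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod(namespace).items:
        ready = any(
            cond.type == "Ready" and cond.status == "True"
            for cond in (pod.status.conditions or [])
        )
        restarts = sum(
            status.restart_count for status in (pod.status.container_statuses or [])
        )
        print(f"{pod.metadata.name}: ready={ready}, restarts={restarts}")

if __name__ == "__main__":
    probe_status()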

CI/CD Pipeline Metrics

Last but not least on our list of DevOps observability metrics are metrics from the CI/CD pipeline. CI/CD pipeline metrics measure the status and performance of CI/CD processes, such as how many new application releases you deploy per week or how many rollbacks you have to perform.

Here again, these aren’t technically application metrics, but they provide observability into the health of the application. A high rate of recurring rollbacks probably means that your application has a deep-seated coding problem that you should investigate, for example. Or, if you notice that application performance degrades as you increase your frequency of deployments, it may be a sign that you are trying to move too fast and are at risk of compromising application quality to maximize release velocity.

It’s easy to overlook the importance of CI/CD pipeline metrics, but they provide crucial context for observability that you can’t obtain from other parts of your stack.

Collecting CI/CD pipeline metrics can be tricky because logging and monitoring tools are not typically designed to track this category of data. However, it should be easy enough to use logs from your CI tools and/or release automation tools to track metrics like build frequency and deployment frequency.
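
As a rough sketch of what that can look like, assume your CI or release tooling can export one JSON record per pipeline event with a timestamp and an event type; the "deploy" and "rollback" values below are placeholders for whatever your tools actually emit. Counting those events per week yields deployment frequency and rollback rate:

# Minimal sketch: weekly deployment and rollback counts from CI/CD event logs.
# Assumes hypothetical JSON records with an ISO 8601 "timestamp" and an "event"
# field of "deploy" or "rollback"; real CI tools name these fields differently.
import json
from collections import Counter
from datetime import datetime

def weekly_pipeline_metrics(log_lines):
    counts = Counter()
    for line in log_lines:
        entry = json.loads(line)
        week = datetime.fromisoformat(entry["timestamp"]).strftime("%Y-W%W")
        counts[(week, entry["event"])] += 1
    return counts

sample = [
    '{"timestamp": "2024-06-03T10:00:00", "event": "deploy"}',
    '{"timestamp": "2024-06-04T15:30:00", "event": "rollback"}',
    '{"timestamp": "2024-06-05T09:10:00", "event": "deploy"}',
]
for (week, event), total in sorted(weekly_pipeline_metrics(sample).items()):
    print(week, event, total)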

Conclusion

Observability success requires defining the right types of data to collect and knowing where and how to collect it. From request rate for the application as a whole, to request duration for individual microservices, to CI/CD pipeline metrics and beyond, DevOps teams need a host of disparate data sources to achieve the benefits that observability stands to offer.

