Log Metrics: What are they? How can they be used? What insights can be garnered at scale?
What are log-based metrics?
Log-based metrics are quantitative data points that are derived from log data. Instead of relying solely on traditional metrics (like CPU usage or request latency from monitoring agents), log-based metrics extract structured insights from the often unstructured or semi-structured content of logs.
In practice, log-based metrics are created by parsing log events to count, aggregate, or track specific values over time. They translate rich, detailed log data into concise, actionable numbers that can be visualized and alerted on. Examples of log-based metrics include error rates, request rates, latency, and user activity.
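For instance, an error count per endpoint can be derived by parsing raw log lines. Below is a minimal sketch in Python; the JSON log shape and the level and path fields are illustrative assumptions, not a particular product's schema.

```python
import json
from collections import Counter

# A minimal sketch of deriving a log-based metric: the log format and
# field names ("level", "path") are assumptions for illustration.
raw_logs = [
    '{"ts": "2024-05-01T12:00:01Z", "level": "ERROR", "path": "/checkout"}',
    '{"ts": "2024-05-01T12:00:02Z", "level": "INFO",  "path": "/home"}',
    '{"ts": "2024-05-01T12:00:03Z", "level": "ERROR", "path": "/checkout"}',
]

errors_by_path = Counter()
for line in raw_logs:
    event = json.loads(line)
    if event["level"] == "ERROR":
        errors_by_path[event["path"]] += 1  # count occurrences -> a counter metric

print(errors_by_path)  # Counter({'/checkout': 2})
```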
Log-based metrics offer a number of benefits including:
- Granularity: Metrics from logs can provide high-fidelity details, including edge cases missed by standard metrics.
- Historical retrospective: Generate metrics from archived logs to analyze past performance or incidents.
- Custom dimensions: Logs often contain rich context (e.g., user ID, region, transaction ID), allowing metric filtering or grouping by these fields.
- Flexible alerting: Define alerts on patterns in logs without writing complex monitoring rules.
What can log metrics allow you to do?
Log-based metrics support a range of powerful capabilities that enhance observability, monitoring, and incident response.
Real-Time Monitoring
Log metrics make it possible to monitor key performance indicators (KPIs) derived from logs, observe service health and behavior based on live application activity, and detect issues in near real-time, even for systems without built-in metric exporters.
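As a rough illustration of near-real-time monitoring, the sketch below computes an events-per-second KPI over a sliding 60-second window as log events arrive; the window size and the observe helper are hypothetical choices, not a prescribed implementation.

```python
import time
from collections import deque

# Hypothetical sketch: a near-real-time rate KPI computed from a live
# log stream using a 60-second sliding window.
WINDOW_SECONDS = 60
timestamps = deque()

def observe(log_event_time: float) -> float:
    """Record one log event and return the current events-per-second rate."""
    timestamps.append(log_event_time)
    cutoff = log_event_time - WINDOW_SECONDS
    while timestamps and timestamps[0] < cutoff:
        timestamps.popleft()  # drop events that fell outside the window
    return len(timestamps) / WINDOW_SECONDS

# e.g., call observe() with event times as lines arrive from a tailed log file
print(observe(time.time()))
```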
Alerting on Critical Events
Log metrics are at the heart of alerting, letting teams surface anomalies, set threshold-based alerts, and detect operational or security issues without additional instrumentation.
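A threshold-based alert on a log-derived metric can be as simple as comparing an error rate against a limit. The sketch below assumes error_count and total_count have already been extracted from parsed logs; the names and the 5% threshold are illustrative.

```python
# Hypothetical threshold alert on a log-derived error-rate metric.
ERROR_RATE_THRESHOLD = 0.05  # alert when more than 5% of requests fail

def check_error_rate(error_count: int, total_count: int) -> None:
    if total_count == 0:
        return
    error_rate = error_count / total_count
    if error_rate > ERROR_RATE_THRESHOLD:
        # In practice this would page a team or post to a webhook.
        print(f"ALERT: error rate {error_rate:.1%} exceeds {ERROR_RATE_THRESHOLD:.0%}")

check_error_rate(error_count=12, total_count=150)  # -> ALERT: error rate 8.0% ...
```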
Visualization of Trends
Teams can use log-based metrics to create dashboards that visualize log trends, graph historical log-based metrics to correlate changes over time, and aggregate log events by custom fields.
Data Enrichment and Custom Metrics
Log-based metrics make it easy to create custom metrics from any log field, group and filter metrics by structured log data, and use complex parsing and extraction to derive meaningful KPIs.
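For example, a custom KPI such as average operation duration per user can be derived by extracting fields from semi-structured log lines. In the sketch below, the line format and the duration_ms field are assumptions made for illustration.

```python
import re

# Sketch of a custom metric via parsing/extraction: pull a duration value
# out of semi-structured log lines. The format is an assumption.
LINE_RE = re.compile(r"user=(?P<user>\w+) .* duration_ms=(?P<duration>\d+)")

lines = [
    "user=alice action=export duration_ms=842",
    "user=bob action=export duration_ms=120",
]

durations_by_user: dict[str, list[int]] = {}
for line in lines:
    m = LINE_RE.search(line)
    if m:
        durations_by_user.setdefault(m["user"], []).append(int(m["duration"]))

# A custom KPI: average export duration per user
for user, ds in durations_by_user.items():
    print(user, sum(ds) / len(ds))
```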
Retrospective Analysis
Log metrics support backfilling metrics from historical logs for incident investigation, as well as post-incident analysis to understand what went wrong.
Reduce Noise While Gaining Insight
With log-based metrics, teams can summarize large volumes of logs into simple metric signals, avoid alert fatigue, and improve signal-to-noise ratio for complex environments.
Supplement Traditional Metrics
Log-based metrics fill gaps where traditional infrastructure or application metrics aren’t available and can provide business-context metrics that standard telemetry doesn't.
What categories of log metrics are there?
Log-based metrics can be categorized based on what they measure or how they're used. Below are the main categories of log metrics, each serving different operational, performance, or business monitoring needs:
1. Error metrics track failures and exceptions.
2. Performance metrics measure application or system performance from log data.
3. Traffic and usage metrics capture usage patterns and request volume.
4. Availability and health metrics reflect service uptime and readiness.
5. Security metrics indicate possible threats or the health of the security posture.
6. Audit and compliance metrics support governance and compliance tracking.
7. Business or custom metrics are derived from domain-specific fields in application logs.
8. Infrastructure metrics (from logs) are extracted from logs when infrastructure metrics are unavailable.
Project-based log metrics
Project-based log metrics are log-derived metrics that are specific to a particular project, application, or business initiative. They are custom-built to monitor and evaluate project-specific performance, usage, reliability, or business outcomes based on log data generated by that project.
Log-bucket based metrics
Log-bucket-based metrics refer to metrics that are aggregated based on logs stored in a log bucket—a logical or physical container that groups logs by source, purpose, or retention policy. These metrics are typically derived from the logs within that bucket and provide summarized insights about log activity, usage, or contents.
What are the different types of metrics?
In observability and monitoring, metrics are numerical values collected over time to reflect the performance, health, or behavior of systems. These metrics can be categorized based on how they behave, what they represent, or where they come from.
System-defined metrics
System-defined metrics are metrics that are automatically collected and reported by the system, platform, or infrastructure itself—without requiring custom instrumentation. These metrics provide critical insights into the health, performance, and resource utilization of hardware, operating systems, and cloud environments.
User-defined metrics
User-defined metrics (also known as custom metrics) are metrics that are explicitly created by users or developers to track specific aspects of their applications, systems, or business logic that are not captured by system-defined metrics.
Sources of log-based metrics
The sources of log-based metrics are the log-producing systems, services, and infrastructure components whose logs can be parsed and transformed into meaningful numeric metrics. These logs contain structured or semi-structured data that, when analyzed, yield valuable metrics about system health, performance, usage, or security.
Here’s a roundup of the major categories of sources that generate logs suitable for deriving metrics:
- Application logs: custom logs emitted by software applications, with common metrics like API request count, error rate by endpoint, and response time.
- Web server logs: logs from HTTP servers like Nginx, Apache, or IIS, with metrics like requests per second, 4xx/5xx error rates, and response time buckets (see the sketch after this list).
- Infrastructure logs: logs from servers, containers, and VMs, with metrics like disk usage warnings, system uptime, and out-of-memory errors.
- Security logs: logs that capture security events and incidents, with metrics like failed login attempts, unauthorized access counts, and malware detection events.
- Network logs: logs from routers, firewalls, load balancers, and switches, with metrics such as packet drop rate, connection count by IP or port, and bandwidth usage over time.
- Database logs: logs from database engines showing queries, connections, and errors, with metrics like query count and duration, deadlock or failure rate, and connection pool usage.
- Cloud platform logs: logs generated by cloud infrastructure and services, with metrics such as API usage volume, errors in managed services, and identity and access events.
- Container and orchestration logs: logs from container runtimes and orchestrators, with metrics that include container start/stop rate, pod failure count, and image pull errors.
- Audit logs: logs that capture changes in configurations, permissions, or admin activity, with metrics like privileged actions per user and configuration change rate.
- Business event logs: logs that capture domain-specific or business transactions, with metrics like orders placed per minute, payments failed vs. succeeded, and abandoned checkouts.
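To make the web server case concrete, the sketch below parses Apache/Nginx combined-style access-log lines into request volume by status class and a 5xx error rate; the sample lines are fabricated for illustration.

```python
import re
from collections import Counter

# Sketch: derive web-server metrics from combined-style access-log lines.
ACCESS_RE = re.compile(r'"\w+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

access_log = [
    '127.0.0.1 - - [01/May/2024:12:00:01 +0000] "GET /home HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/May/2024:12:00:02 +0000] "GET /api/items HTTP/1.1" 500 98',
    '127.0.0.1 - - [01/May/2024:12:00:03 +0000] "GET /missing HTTP/1.1" 404 74',
]

status_classes = Counter()
for line in access_log:
    m = ACCESS_RE.search(line)
    if m:
        status_classes[m["status"][0] + "xx"] += 1  # 2xx / 4xx / 5xx classes

total = sum(status_classes.values())
print(status_classes)                 # request volume by status class
print(status_classes["5xx"] / total)  # 5xx error rate
```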
Data types of log-based metrics
Log-based metrics can have different data types depending on what is extracted from the logs and how the metric is used. These data types define how the metric values are stored, visualized, and analyzed in observability platforms.
There are several data types of log-based metrics, including numeric, Boolean, categorical, histogram buckets, and derived metrics.
Numeric metrics have measurable numeric values and can be used for counting events, timing values, or tracking volumes.
Categorical metrics are non-numeric fields used as labels or groupings for metrics and are used for grouping by status or endpoint, or for filtering dashboards or alerts by these labels.
Histogram buckets are discrete buckets of numeric values representing the distribution of a variable. These can be used to visualize latency distribution or to compute percentiles.
Derived metrics are computed metrics based on other metrics, and are often used to show error or success rates.
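The sketch below walks through each of these data types against a handful of fabricated log events: a numeric count, categorical grouping by a status label, histogram buckets over latency, and a derived error rate.

```python
import bisect

# Fabricated, already-parsed log events for illustration.
events = [
    {"endpoint": "/home", "status": "200", "latency_ms": 35},
    {"endpoint": "/api",  "status": "500", "latency_ms": 910},
    {"endpoint": "/api",  "status": "200", "latency_ms": 120},
]

# Numeric: a simple count of events
total = len(events)

# Categorical: group counts by the "status" label
by_status = {}
for e in events:
    by_status[e["status"]] = by_status.get(e["status"], 0) + 1

# Histogram buckets: latency distribution over fixed boundaries (ms)
boundaries = [50, 100, 250, 500, 1000]
buckets = [0] * (len(boundaries) + 1)
for e in events:
    buckets[bisect.bisect_left(boundaries, e["latency_ms"])] += 1

# Derived: an error rate computed from two other metrics
error_rate = by_status.get("500", 0) / total

print(total, by_status, buckets, error_rate)
```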
Counter
Counter log-based metrics are a type of log-derived metric that count the number of occurrences of specific events or log entries over time. They are monotonically increasing—meaning they only go up—unless reset manually or by system restart.
Distribution
Distribution log-based metrics are metrics derived from log data that capture the statistical distribution of a numerical value—rather than just a single value or count. These metrics allow you to analyze how values are spread over a range (e.g., response times, payload sizes, job durations), supporting advanced insights like percentiles, averages, min/max, and standard deviation.
Boolean
Boolean metrics are derived from logs that evaluate to true or false, often used in binary status checks or condition detection. They help teams determine whether an event occurred or whether a feature is enabled.
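One way to picture these three metric types is as small value containers that log events feed into. The classes below are illustrative sketches; the names and methods are assumptions, not any specific library's API.

```python
import statistics

class CounterMetric:
    """Monotonically increasing count of matching log events."""
    def __init__(self):
        self.value = 0
    def inc(self, amount: int = 1):
        self.value += amount  # only goes up unless reset

class DistributionMetric:
    """Captures the spread of a numeric value extracted from logs."""
    def __init__(self):
        self.samples: list[float] = []
    def record(self, value: float):
        self.samples.append(value)
    def summary(self) -> dict:
        qs = statistics.quantiles(self.samples, n=100)
        return {"min": min(self.samples), "max": max(self.samples),
                "mean": statistics.mean(self.samples), "p95": qs[94]}

class BooleanMetric:
    """True/false signal, e.g. 'did a config change event occur?'"""
    def __init__(self):
        self.value = False
    def set(self, value: bool):
        self.value = value

errors = CounterMetric()
errors.inc()
latency = DistributionMetric()
for ms in (35, 120, 910, 80, 45):
    latency.record(ms)
print(errors.value, latency.summary()["p95"])
```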
Organizing your Log Metrics with Labels
Organizing your log-based metrics with labels (also called tags, dimensions, or attributes depending on the tool) is essential for effective observability, filtering, alerting, and dashboarding. Labels give context to your metrics by categorizing and grouping them based on structured log fields.
To organize log metrics with labels:
1. Ensure the logs are structured. Use structured logging formats like JSON so log fields can be easily parsed into labels.
2. Extract the relevant fields using a log processing tool.
3. Map the fields to labels, using consistent naming conventions such as lowercase with underscores, and avoiding spaces or special characters.
4. Avoid high-cardinality labels (those with a huge number of unique values), as they can increase cost and reduce performance.
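Putting these steps together, the sketch below turns structured JSON logs into a labeled counter metric, deriving a low-cardinality status_class label instead of labeling by an unbounded field like user ID; the field names are assumptions for illustration.

```python
import json
from collections import Counter

# Structured logs with label-friendly fields (fabricated for illustration).
logs = [
    '{"service": "checkout", "region": "us_east", "status": 502}',
    '{"service": "checkout", "region": "us_east", "status": 200}',
    '{"service": "search",   "region": "eu_west", "status": 200}',
]

requests = Counter()
for line in logs:
    e = json.loads(line)
    # Derive a low-cardinality label from status rather than labeling by
    # something unbounded like user ID or request ID.
    labels = (e["service"], e["region"], f"{e['status'] // 100}xx")
    requests[labels] += 1

for labels, count in requests.items():
    print(dict(zip(("service", "region", "status_class"), labels)), count)
```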
How Mezmo can help you extract insights at scale
Mezmo (formerly LogDNA) helps extract insights at scale by providing a high-performance platform designed to ingest, parse, analyze, and visualize log data from across your systems—fast and efficiently. It excels in delivering observability, security, and operational intelligence through scalable log management.
Mezmo offers high-speed log ingestion at scale, ingesting millions of logs per second with minimal latency, so it’s possible to process and analyze log data from hundreds or thousands of services without performance degradation.
With Mezmo, teams can use structured filters to narrow down to specific services, environments, users, or error codes, and automatically or manually extract fields from log lines for deeper analysis and log-based metric creation. Even with massive log volumes, teams can isolate and analyze only what matters to them.
Mezmo also lets teams transform logs into metrics, attach labels, and trigger alerts, making it possible to monitor trends and define KPIs at scale without custom instrumentation.
An indexless architecture reduces cost and increases flexibility, while real-time alerts support anomaly detection.
Mezmo offers role-based access control and team dashboards for top-notch collaboration, so multiple teams can extract actionable insights without stepping on each other’s data.
And all of the above happens in a SOC 2, HIPAA, and GDPR compliant environment with secure ingestion and audit controls. Teams can extract insights from sensitive data without compromising compliance or security.