Serverless Logging Performance, Part 1

4 MIN READ

When thinking about serverless applications, one thing that comes to mind immediately is efficiency. Running code that gets the job done as swiftly and efficiently as possible means you spend less money, so good coding practices suddenly have a direct impact on your bottom line. How does logging play into this, though? Every logging action your application takes falls within the scope of that same performance evaluation, and logging can be optimized just like any other process your code spins up. In this series of posts, let's dive into how you can think about logging in a serverless world.

Part 1: Performance, Episode 1

I know that one of the benefits of serverless architectures is not managing your own hardware, but you still need to consider the hardware your code requires when thinking about managing cost. Most serverless providers charge by some combination of the amount of memory and the amount of time your function or application needs. In a significantly simplified pricing model, taking two offerings from each provider just for argument's sake: AWS charges per request and per GB-second of memory (Lambda), or per vCPU-second and per GB-second of memory (Fargate), whereas GCP charges per vCPU-second and per GiB-second of memory (Cloud Run) or per invocation (Cloud Functions). There's a lot more involved in the pricing structures for these as-a-Service systems, including costs for network ingress or egress, storage, data stores, and analytics. For this deep dive on logging, however, we're only going to examine the actual compute.
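To make that concrete, here's a back-of-the-envelope sketch assuming a simplified per-request plus per-GB-second model; the rates below are illustrative placeholders, not anyone's published pricing.

    # Back-of-the-envelope cost sketch. The rates are illustrative
    # placeholders (assumptions), not current published pricing.
    GB_SECOND_RATE = 0.0000166667   # assumed $ per GB-second of memory
    REQUEST_RATE = 0.0000002        # assumed $ per request

    def invocation_cost(memory_gb: float, duration_s: float) -> float:
        """Estimate the compute cost of a single invocation."""
        return memory_gb * duration_s * GB_SECOND_RATE + REQUEST_RATE

    # A 128 MB function that finishes in 250 ms vs. one that takes 300 ms
    fast = invocation_cost(0.125, 0.250)
    slow = invocation_cost(0.125, 0.300)
    print(f"difference per invocation: ${slow - fast:.10f}")
    print(f"difference at 10 million invocations: ${(slow - fast) * 10_000_000:.2f}")

Fifty milliseconds is noise on a single invocation, but it scales linearly with traffic, which is why the time your logging takes is worth scrutinizing.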

Knowing how much compute power your code needs drives which tier you choose, and therefore performance equals money. Your choices about how to log data affect those performance metrics. Let's start by examining the first time your code spins up.

Cold Start Considerations

One of the biggest concerns with serverless performance and efficiency is the cold start: the first spin-up of an instance when there's no warm (recent) instance available. During a cold start, the service brings a container online, and then the container starts running your specific workload. The moment the container starts running your instance, the meter starts. The first thing the container does on a cold start is a bootstrapping process, installing any dependencies you expect to need. If you've ever watched a Docker image build, you know this process can take on the order of minutes. Only then is the container's environment set, and your code begins to run. On a warm start, by contrast, the environment comes preset, and your code begins running immediately.

As you might guess, the bootstrapping process in a cold start can be expensive. This need for efficient setup is where dependency management becomes a not-so-secret weapon for lowering the cost of a serverless system, and your logging library of choice is no exception. Since setting up dependencies takes compute time, you have to factor that time into your performance calculations. As a result, the ideal state for logging in a serverless architecture is to use built-in methods wherever possible. That means using Python's built-in logging library, for example, to reduce calls to PyPI. If you can't use a built-in library, such as when you're running a Node.js function and need more than console.log(), the best logging library for you is the one that balances the features you want against the fewest dependencies needed to make those features happen.
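As a rough illustration, here is a minimal sketch of a function that relies only on the standard library for logging; the handler name, event shape, and return value are assumptions for a Lambda-style runtime, not anything prescribed by a provider.

    import json
    import logging

    # Module-scope setup runs once per cold start, so the cost of configuring
    # the standard-library logger is not paid again on warm invocations.
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    def handler(event, context):
        """Hypothetical Lambda-style entry point; event and context shapes are assumptions."""
        logger.info("received event with %d top-level keys", len(event))
        # ... real work would go here ...
        return {"statusCode": 200, "body": json.dumps({"ok": True})}

The point is that the only logging dependency is the standard library, so there is nothing extra to pull down or install during bootstrapping.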

Construction

You should also examine how you construct logging messages and objects. If there's upfront processing happening when a message or object is constructed, consider whether you can reduce or defer that work to cut down usage. To go back to Python as an example, consider this simple benchmark set¹, run on Python 3.7.4.

These timeit results came from generating a human-readable message with different string construction methods and handing it to a logger to drop into the logs, adding complexity as we went. The methods in the benchmark (sketched in code after this list) are

  • the original %-formatting method generating a string and then passing that to the logger,
  • a direct method passing a format string with %-formatting and then the variables to the logger,
  • the str.format() method generating a string and then passing that to the logger,
  • the f-string formatting method generating a string and then passing that to the logger,
  • a generic string concatenation method generating a string and then passing that to the logger,
  • direct pass of a variable pointing to a string to the logger to generate the message internally,
  • direct pass of a string to the logger to generate the message internally, and
  • direct pass of a list to the logger to add to the message at the end.
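For reference, here is a minimal sketch of what a timeit comparison like this can look like, covering a few of the methods listed above; it is an illustration with assumed message contents, not the code from the repository linked in the footnote.

    import logging
    import timeit

    # Write the benchmark messages to a log file so the logger actually emits them.
    logging.basicConfig(level=logging.INFO, filename="bench.log")
    logger = logging.getLogger(__name__)

    name, count = "serverless", 42

    candidates = {
        "%-format then pass":  lambda: logger.info("name=%s count=%d" % (name, count)),
        "%-format via logger": lambda: logger.info("name=%s count=%d", name, count),
        "str.format":          lambda: logger.info("name={} count={}".format(name, count)),
        "f-string":            lambda: logger.info(f"name={name} count={count}"),
        "concatenation":       lambda: logger.info("name=" + name + " count=" + str(count)),
    }

    for label, call in candidates.items():
        print(label, timeit.timeit(call, number=10_000))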

As we add more integers and strings, the complexity of each call rises, and we start to encounter the different costs associated with each string conversion. The concatenation method, for example, requires a call to str() to convert the integer into a string before it can be concatenated. You’ll notice that, depending on need, some methods are more efficient than others. Under the hood, Python’s logging library uses %-formatting, the oldest method, to ensure backwards compatibility.
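One consequence of that %-formatting design is worth calling out: when you pass the format string and its arguments to the logger separately, interpolation is deferred until the logger knows the record will actually be emitted. A small sketch, assuming the standard logging module:

    import logging

    logging.basicConfig(level=logging.WARNING)   # DEBUG records will be discarded
    logger = logging.getLogger(__name__)

    payload = list(range(1000))

    # Eager: the f-string is fully built even though the record is then dropped.
    logger.debug(f"payload={payload}")

    # Deferred: %-interpolation only happens if the level check passes, so a
    # dropped record costs almost nothing to construct.
    logger.debug("payload=%s", payload)

That deferral is one reason the direct %-format call can come out ahead in benchmarks like this, especially when log-level filtering drops most messages.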

We’ll spend more time in a different series digging into Python’s logging library to build better benchmarks, but this example should give you a fairly good sense of why you need to consider how you construct your logging messages if you’re going text-based. These numbers may not seem like much, but in a real application you’re probably making hundreds of calls to your logger, each more complex than this one. The difference between the fastest and slowest methods with multiple inputs is on the order of 4.5 microseconds, and this message generation example is extremely simple compared to what you typically need for logging in a serverless application. When milliseconds count, this kind of consideration is a priority.

Next time

Once you've considered how to improve the performance of startup and of basic text-based messages in your serverless logging setup, you need to start thinking about crafting logging objects that travel over the network in the smallest possible package as fast as possible. In the next post, we're going to dive into logging objects versus strings, and then in subsequent posts we'll start thinking about memory allocation, concurrency and state, and security.

  1. If you want to run these benchmarks yourself on your own system, you can find the code at https://github.com/nimbinatus/benchmarking-logs. The repo is a work in progress, so the code may have changed by the time you read this.
