Logging Best Practices, Part 1: Priority Number One
Isn’t all logging pretty much the same? Logs appear by default, like magic, without any further intervention by teams other than simply starting a system… right?
While logging may seem like simple magic, there’s a lot to consider. Logs don’t just automatically appear for all levels of your architecture, and any logs that do automatically appear probably don’t have all of the details that you need to successfully understand what a system is doing. Let’s start our exploration of logging best practices by thinking about priority number one for logs: actionable logs.
Regardless of how you log, the most important thing after simply having logs is having logs you can actually use. Actionable logs provide enough information, enough details, and enough history for you to get something done with them.
Let’s examine an example system: an application with a microservices architecture running on a Kubernetes cluster that lives on top of a FlatcarOS box. By default, you get logs from FlatcarOS and Kubernetes. How do you get logs for the different microservices? You have to program those logs into your application while building the codebase. Consider the following elements when you think about how to build actionable logs.
Your logs should provide enough information to answer “when,” “who,” and “where” for any event. Take access logs as an example. For every access event, the logline should record when the access occurred (the timestamp). It should also record who accessed the system. This detail often comes in the form of an IP address or other identifier, and the “who” should cover not just human users but also other systems. Always log every access! Finally, the logline should record where the system was accessed, meaning which service or which part of the platform.
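As a rough illustration, an access logline that answers when, who, and where could be built like the sketch below. The field names, the `build_access_record` and `log_access` helpers, and the `orders-service` example are assumptions made for this illustration, not part of any particular logging standard.

```python
import json
import logging
from datetime import datetime, timezone


def build_access_record(who: str, service: str, path: str) -> dict:
    """Build one access event with explicit when / who / where fields.

    All field names here are illustrative assumptions.
    """
    return {
        "event": "access",
        # when: timezone-aware UTC timestamp in ISO 8601 form
        "when": datetime.now(timezone.utc).isoformat(),
        # who: an IP address, user ID, or the name of a calling system
        "who": who,
        # where: which service and which part of it was accessed
        "where": {"service": service, "path": path},
    }


def log_access(logger: logging.Logger, who: str, service: str, path: str) -> str:
    """Serialize the record as a single JSON logline, emit it, and return it."""
    line = json.dumps(build_access_record(who, service, path))
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # Hypothetical caller and service names
    log_access(logging.getLogger("access"), "10.0.0.7", "orders-service", "/v1/orders")
```

Writing each event as one JSON object per line is only one option, but it keeps the three answers in named fields that are easy to search later.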
Your logs should provide enough details to answer “what” exactly happened during any single event. Answering who, when, and where doesn’t tell you what happened, and a basic answer such as “this pod started” doesn’t provide enough detail to tell a pod restart triggered by a code change apart from pods restarted by the orchestrator after a system crash.
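One way to carry that extra detail is to attach a reason to the event itself. The sketch below is a minimal example under stated assumptions: the reason codes, the `pod_started_line` helper, and the field names are hypothetical and are not taken from Kubernetes or any other orchestrator.

```python
import json
from datetime import datetime, timezone

# Hypothetical reason codes, invented for this illustration
RESTART_REASONS = {"deploy", "crash_recovery", "manual"}


def pod_started_line(pod: str, reason: str, detail: str) -> str:
    """Return a JSON logline that says what happened, not only that a pod started."""
    if reason not in RESTART_REASONS:
        # Keep the event, but make the gap in knowledge visible
        reason = "unknown"
    return json.dumps({
        "event": "pod_started",
        "when": datetime.now(timezone.utc).isoformat(),
        "pod": pod,
        "reason": reason,   # separates a code change from a crash
        "detail": detail,   # free-text description of the trigger
    })


if __name__ == "__main__":
    print(pod_started_line("orders-7f9c", "deploy", "new image rolled out"))
    print(pod_started_line("orders-7f9c", "crash_recovery", "restarted after a crash"))
```

With a `reason` field in place, the two kinds of restart described above produce loglines that differ, even though both report the same basic event.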
Finally, your logs should provide enough history to answer “why” something happened. A single logline doesn’t give you enough context to understand the “why” behind an action or event; the surrounding history does. That means loglines need to be stored somewhere you can get back to them, because writing them to a terminal without keeping them leaves you nothing to compare against. How could you tell if an event was normal for a system without an easily accessible history?
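To make the idea of an accessible history concrete, here is a minimal sketch that appends events to a file and then asks how often a given event has been seen before. The file layout (one JSON object per line), the `append_event` and `count_events` helpers, and the field names are assumptions for illustration; a real system would more likely ship logs to a dedicated log store.

```python
import json
from pathlib import Path


def append_event(log_path: str, event: dict) -> None:
    """Persist one event as a JSON line so it outlives the terminal session."""
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def count_events(log_path: str, event_name: str, pod: str) -> int:
    """Count stored events with the given name for the given pod."""
    path = Path(log_path)
    if not path.exists():
        return 0  # no history yet
    count = 0
    with path.open(encoding="utf-8") as f:
        for raw in f:
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines
            record = json.loads(raw)
            if record.get("event") == event_name and record.get("pod") == pod:
                count += 1
    return count
```

A pod that has restarted once in the stored history tells a different story from one that has restarted many times, and that comparison is only possible if the earlier loglines were kept.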
All three of these components of a logging plan lead to actionable logs: logs you can actually use to understand what happened, why it happened, and how you can act to handle the situation. If you can answer who, what, where, when, and why from your logs, you probably have actionable logs and are well on your way to meeting or exceeding logging best practices.