Planning Your Log Collection
- Organize your logging strategy
- Define "scope"
- Understand how to prioritize your logging strategy
- Understand how to roll out your log plan
The whole process of setting up observability for your organization can seem like a daunting task. I've found that it’s best to start small and from a place that allows you to see the most value as easily and quickly as possible. In that light, logging is the best place to start!
When it comes to setting up logging, it helps tremendously to have a project plan to organize your approach. From there, you can break the work into stories and assign each one story points based on its complexity. A well-thought-out plan clarifies where you are in the process, where you want to be, and the specific steps you need to take to set up logging.
If you begin by documenting your current concerns and noting how you can address them with logging, you can ensure that this knowledge isn't just captured by one person, but also shared with others. Documenting will reduce the risk of having a single point of failure when it comes to planning how you should approach your log collection.
Organizing What's Important in Your Logging Strategy
A good place to start is with a problem statement. This will ensure that everyone understands why logging is so important for your systems and applications and what needs to improve about your existing strategy. A few questions that can help get you started include:
- What are the blind spots in the observability of your systems and applications?
- Could monitors be created by gathering these logs?
- Could logging help monitor an internal SLO for any of your current SLAs?
Starting with questions like these can help spark ideas and increase awareness among team members and stakeholders as to why logging is so critical for maintaining healthy systems and applications.
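To make the SLO question a little more concrete, here is a minimal Python sketch of how log records could feed an availability check. Everything in it is an assumption made for illustration: the record shape (a dict with a `status` field), the `availability` and `meets_slo` helper names, and the 99.9% target. Your own SLO definition and log schema will likely differ.

```python
def availability(records):
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not records:
        raise ValueError("no log records to evaluate")
    ok = sum(1 for record in records if record["status"] < 500)
    return ok / len(records)


def meets_slo(records, target=0.999):
    """True when the measured availability is at or above the target."""
    return availability(records) >= target


# Hypothetical sample: 1,000 requests, 2 of which returned a 503.
SAMPLE = [{"status": 200}] * 998 + [{"status": 503}] * 2

if __name__ == "__main__":
    print(f"availability: {availability(SAMPLE):.3%}")
    print(f"meets 99.9% SLO: {meets_slo(SAMPLE)}")
```

With this sample, two failures in a thousand requests is enough to miss a 99.9% target, which is exactly the kind of thing you cannot see without the logs.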
If you don’t take advantage of logging, that lack of visibility will keep you from operating as efficiently and effectively as you would like. Logging can improve the stability and reliability of your platform because it gives you data to act on, such as error rates, request latency, and the sequence of events leading up to a failure.
Without logging, your engineering teams may also find it difficult to collaborate with one another to solve problems. For example, when an issue occurs, your engineers might be going through many manual steps to view your Kubernetes logs, or grepping through the application’s stack traces to pick out the errors. Not only does this take a lot of time, it also makes it hard to link related issues together and identify the root cause. If this is a concern for your teams, include them in creating the problem statement and have them work together on a new logging strategy.
By defining the scope of the project, you’ll ensure that everyone involved is on the same page. Once your teams have decided what logs are important to aggregate and what information you need to capture from them, you can define the scope of work and set metrics for success. You should break things down into:
- Must-Haves: These are things that you need in order for the project to be successful. For example, a must-have for you may be to set up logging using Mezmo, formerly known as LogDNA, for each customer account in your organization.
- Stretch Goals: These are things that might be good to add to the project if there is time before the deadline. For example, a stretch goal for you may be to store archived logs in an Amazon S3 bucket for future use, or to set up monitors that alert on the collected log data when problems arise.
- Not in Scope: This is anything seemingly related to logging that might not have a direct impact on the success of the project. For example, something that's not in scope may be setting up APM for each account that has logging.
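If it helps to keep the scope somewhere more durable than a meeting note, the three buckets above can be written down as data. The following is only a sketch: the `Scope` and `PlanItem` names and the `group_by_scope` helper are made up for this example, and the items simply echo the examples in the list.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    MUST_HAVE = "must-have"
    STRETCH = "stretch goal"
    NOT_IN_SCOPE = "not in scope"


@dataclass(frozen=True)
class PlanItem:
    description: str
    scope: Scope


def group_by_scope(items):
    """Return a dict mapping each Scope to the descriptions filed under it."""
    grouped = {scope: [] for scope in Scope}
    for item in items:
        grouped[item.scope].append(item.description)
    return grouped


# Illustrative items that mirror the examples in the list above.
PLAN = [
    PlanItem("Set up logging for each customer account", Scope.MUST_HAVE),
    PlanItem("Archive logs to an S3 bucket", Scope.STRETCH),
    PlanItem("Alert on collected log data", Scope.STRETCH),
    PlanItem("Set up APM for each account that has logging", Scope.NOT_IN_SCOPE),
]

if __name__ == "__main__":
    for scope, descriptions in group_by_scope(PLAN).items():
        print(f"{scope.value}: {descriptions}")
```

Keeping "not in scope" as an explicit bucket, rather than just leaving those items out, makes it easier to point to the decision later when someone asks why APM was not part of the rollout.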
Prioritizing Your Logging Strategy
After your plan for logging has been scoped out, the next step is to prioritize your list of items to make sure that the most urgent items are completed first. This can be done by tagging each element with a P1, P2, or P3 to note its priority level. If you aren’t familiar with this system, the levels are:
- P1 – High Priority: Without these logs, you’ll lose many engineering hours trying to resolve common issues, such as ones related to server health and security. For example, if you log requests for the NGINX server that serves your application, and include the request time in each entry (NGINX exposes this as the $request_time variable), you’ll be able to gather information about the latency of each request. You can also use that data to gauge customer satisfaction and to track your SLAs.
- P2 – Medium Priority: These tasks are impacting engineers and possibly users, but there are workarounds in place. In other words, there is no fire to put out – yet.
- P3 – Low Priority: These would be nice to have, but they aren’t crucial. For example, it would be handy to have logging for internal systems like GitHub. (You can learn more about that at: https://docs.mezmo.com/docs/github-events.)
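As a rough illustration of the P1 NGINX example, the sketch below pulls request latency out of access-log lines. It assumes a custom log format: NGINX's "combined" layout with `rt=$request_time` appended to each line. That suffix, the regular expression, and the `parse_line` and `latency_summary` helper names are all assumptions for this example, so adjust them to match whatever format your servers actually write.

```python
import re
import statistics

# Assumed line layout: the "combined" format with rt=$request_time appended.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .* rt=(?P<rt>\d+\.\d+)$'
)


def parse_line(line):
    """Return (path, status, seconds) for one access-log line, or None."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    return match["path"], int(match["status"]), float(match["rt"])


def latency_summary(lines):
    """Count, median, and maximum request time (seconds) over parseable lines."""
    times = [parsed[2] for parsed in map(parse_line, lines) if parsed]
    if not times:
        raise ValueError("no parseable lines")
    return {"count": len(times), "median": statistics.median(times), "max": max(times)}


# Made-up sample lines in the assumed format.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /api/items HTTP/1.1" 200 512 "-" "curl/8.0" rt=0.120',
    '203.0.113.8 - - [10/Oct/2023:13:55:37 +0000] "POST /api/orders HTTP/1.1" 201 64 "-" "curl/8.0" rt=0.480',
    '203.0.113.9 - - [10/Oct/2023:13:55:38 +0000] "GET /healthz HTTP/1.1" 200 2 "-" "kube-probe/1.27" rt=0.003',
]

if __name__ == "__main__":
    print(latency_summary(SAMPLE_LINES))
```

In practice a log platform's parsing rules would do this extraction for you; the point of the sketch is only that the latency has to be in the log line in the first place before anything downstream can use it.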
Conclusion: Rolling Out Your Logging Plan to Have Logs for Days
Finally, after strategizing and creating a plan that fits the needs of your organization, you’ll need to start rolling out your logging solution. Luckily, Mezmo makes getting started easy! The docs show you how to ingest logs into the Mezmo platform and create your own parsing rules. With integrations, which are quick to set up, you can start getting logs in as little as two minutes. You can find the guide at https://docs.mezmo.com/docs.