System Logging Best Practices

Learning Objectives

• Understand best practices for logging events
• Learn useful formats for data parsing
• Identify events useful in performance analysis

Advancements in technology have expanded infrastructure beyond on-premises data centers, including the widespread adoption of cloud computing. Distributed systems are difficult to manage, and monitoring them for anomalies and potential issues is even more challenging. Following a few best practices will help you better leverage log files for system maintenance, root cause analysis, bug fixes, and cybersecurity alerts, and improve your analytics and forensics output.

Log Meaningful Information

You can customize the information stored in logs to help you identify exactly where and when an error occurred, but events should also be meaningful for human review. For example, the following information doesn’t tell you much about an error:

Unhandled Exception:
System.IndexOutOfRangeException: Index was outside the bounds of the array.

You can guess what happened: a loop likely attempted to retrieve data from an index that does not exist in an array. But when did this happen? Where did it happen? Who received the error? A more useful log entry looks like the following:

Unhandled Exception:
System.IndexOutOfRangeException: Index was outside the bounds of the array. Transaction ID 47492389 failed on 2021-01-28T20:04:13Z at /checkout/pay.

In addition to the exception, the above entry tells you the transaction, the date and time, and the endpoint where the error occurred. This more verbose error information supports root cause analysis during bug fixes and remediation. For system administrators with hundreds of machines to manage, it provides a quicker overview of the problem along with details that locate both the affected machine and the issue causing the errors.
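
As a minimal sketch of how this looks in code, the following Python example attaches a transaction ID, timestamp, and endpoint to the error event; the function and values are hypothetical stand-ins for your own request context:

import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("commerce")

def charge(items, index, transaction_id, endpoint):
    try:
        return items[index]
    except IndexError:
        # Attach the transaction ID, UTC timestamp, and endpoint so the
        # failure can be located without guesswork.
        logger.error(
            "IndexError: index %d is outside the bounds of the list. "
            "Transaction ID %s failed on %s at %s",
            index, transaction_id,
            datetime.now(timezone.utc).isoformat(), endpoint,
        )
        raise

# charge([], 3, "47492389", "/checkout/pay") logs the context, then re-raises.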


Create Logs Using Structured Formatting

Formatting logs in a structured way is beneficial for two reasons: it makes logs easier for humans to read and easier for machines to parse. At some point, you may need to import large volumes of logs. In a large enterprise environment, infrastructure and applications can create thousands of events a day, making it difficult to use them for analysis without third-party tools. Eventually, you may want to import logs into an analytical solution that parses the data and produces visual output representing the health of the system.

For example:

APP:commerce Transaction:47492389 TIME:2021-01-28T20:04:13Z ENDPOINT:/checkout/pay

The above log event uses a structure that makes it easier to import data into an analysis tool. It also structures the event so that humans can better identify important information when reviewing logs for specific data.
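
JSON is another widely used structured format that most analysis tools ingest natively. A minimal sketch using Python's standard logging library; the logger name and fields mirror the example above:

import json
import logging

class JsonFormatter(logging.Formatter):
    # Render each record as one JSON object per line.
    def format(self, record):
        event = {
            "app": record.name,
            "level": record.levelname,
            "time": self.formatTime(record),
            "message": record.getMessage(),
        }
        # Merge structured fields passed through the `extra` argument.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("commerce")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payment failed",
            extra={"fields": {"transaction": "47492389",
                              "endpoint": "/checkout/pay"}})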

Include Logs in Backup Routines

Logs can be parsed and used in data recovery efforts, but administrators often forget to include them in the backup process. Logs should be backed up just like any other critical file. Should an event such as a ransomware attack permanently destroy or damage logs, the backups can be restored and used during forensics and data recovery.

Relatedly, logs should be stored redundantly, just like other data and storage. Should one logging solution fail, the system can use the alternative until system administrators restore the original. This strategy requires more storage space, but the cloud gives organizations the ability to scale storage to accommodate increased requirements.
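
A minimal Python sketch of the idea: compress each day's logs and copy the archives to a second location. The paths are placeholders, and in practice this would run as a scheduled job targeting redundant storage:

import gzip
import shutil
from datetime import date
from pathlib import Path

LOG_DIR = Path("/var/log/myapp")       # hypothetical source directory
BACKUP_DIR = Path("/mnt/backup/logs")  # hypothetical backup destination

def backup_logs():
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    for log_file in LOG_DIR.glob("*.log"):
        archive = BACKUP_DIR / f"{log_file.name}.{date.today()}.gz"
        # Compress each log into the backup location; the original stays
        # in place for the active logging pipeline.
        with log_file.open("rb") as src, gzip.open(archive, "wb") as dst:
            shutil.copyfileobj(src, dst)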

Don’t Log Sensitive Information

While logs should contain enough information for audit trails and root cause analysis, they should not expose sensitive data. Only specific accounts should have access to logs, and writing sensitive information into events adds risk should an attacker compromise the security surrounding them. Not only could the logs be used to mount additional attacks, but storing such data could also violate compliance rules.

In addition to always determining if data improperly discloses personally identifiable information (PII), here is a short list of items that should never be logged (a redaction sketch follows the list):

  • Passwords
  • Social security numbers
  • API keys or secrets
  • Private encryption keys
  • Credit card numbers
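
To reduce the chance of such values reaching storage at all, you can mask them before records are emitted. A minimal Python sketch using a logging filter; the regular expressions are illustrative, not an exhaustive or production-grade redaction scheme:

import logging
import re

# Illustrative patterns only; production redaction needs vetted,
# data-specific rules.
PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # SSN-like values
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),      # card-number-like values
    re.compile(r"(password|api[_-]?key)=\S+", re.IGNORECASE),
]

class RedactingFilter(logging.Filter):
    # Mask sensitive substrings before a record is emitted.
    def filter(self, record):
        message = record.getMessage()
        for pattern in PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, None
        return True

logger = logging.getLogger("commerce")
logger.addFilter(RedactingFilter())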

Centralize Log Aggregation

It’s easy to get overwhelmed with numerous log storage locations, system monitoring solutions, and application alerts. Centralized logging reduces much of this overhead and eliminates many of the issues with fragmented log storage across several systems. It also facilitates better analytics, especially if these logs are imported into third-party tools.

Centralized logging solutions also make backups, cybersecurity, and monitoring easier to manage. Centralization mitigates the risk of losing logs and enables collaboration between the individual resources and monitoring solutions that use events to detect anomalies and alert administrators of suspicious activity.
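
One common pattern is forwarding events from each host to a central collector alongside local output. A minimal sketch using Python's standard syslog handler, with a placeholder collector address:

import logging
import logging.handlers

# Ship events to a hypothetical central collector in addition to local
# output, so a single failed or compromised host doesn't lose history.
central = logging.handlers.SysLogHandler(address=("logs.example.internal", 514))
local = logging.StreamHandler()

logger = logging.getLogger("commerce")
logger.addHandler(central)
logger.addHandler(local)
logger.setLevel(logging.INFO)

logger.info("checkout service started")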

Don’t Forget Endpoint and Device Events

Any network resource that provides critical infrastructure or adds risk to the organization should be included in log strategies. In an enterprise environment, users could have their own devices connected to the network, and mobile applications might connect to internal processes through API endpoints. A good event logging strategy includes collecting data from every device through which users can connect to the network.

Logging events on all endpoints helps you understand the user experience and interpret feedback from application activity. It also helps administrators recognize bottlenecks and scale resources before they become severe productivity limitations.


Limit Log Access to High-Privileged User Accounts

If logs are disorganized, low-privileged users could accidentally gain access to sensitive information. Centralizing your logs with a solution that offers role-based access control (RBAC) helps you manage who has access to what information. For example, you may want to give high-privileged users access to read all raw log data but only allow standard users to see logs from certain sources or visualization tools that provide a high-level overview of systems and applications.
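
As a toy sketch of the idea in Python (in a real deployment, enforcement belongs in the logging platform's RBAC, not in application code; the roles and sources below are hypothetical):

# Toy sketch: filter raw events by the requester's role. Real
# enforcement belongs in the logging platform, not application code.
ROLE_SOURCES = {
    "admin": {"auth", "payments", "infra"},  # full raw access
    "standard": {"app"},                     # limited sources only
}

def visible_events(events, role):
    allowed = ROLE_SOURCES.get(role, set())
    return [event for event in events if event["source"] in allowed]

events = [
    {"source": "auth", "message": "login failed for user 1042"},
    {"source": "app", "message": "page rendered in 84 ms"},
]
print(visible_events(events, "standard"))  # only the "app" event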

Allowing unnecessary access to logs increases your attack surface. If just one user falls victim to a phishing scam, the logs they can read would disclose information usable in future attacks, including critical details about the infrastructure of network resources and applications. An advanced persistent threat (APT) with access to logs could gather numerous data points, leading to a severe compromise and systemwide breach.

Log Successful and Failed Events

Not every anomaly results in a failed event such as an application error or an unsuccessful authentication attempt. To get the full picture, administrators need several events that tell a story during investigations. Without enough events, anomalies and suspicious activity could be missed. For instance, a cybercriminal launching brute-force attacks against account passwords would generate several unsuccessful authentication attempts, but logs that record only failures would never show the suspicious activity of a successful authentication made with credentials stolen through phishing.

Administrators should develop a strategy for which events to log. Too much information makes logs undecipherable and wastes storage, but verbose logs with useful information support effective monitoring and auditing.
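
As a minimal sketch in Python, the following records both successful and failed authentication attempts; the user, IP address, and logger names are illustrative placeholders:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("auth")

def log_authentication(user, source_ip, succeeded):
    # Record both outcomes: failures reveal brute force, while a
    # successful login from an unusual address can expose stolen
    # credentials that failure-only logging would miss.
    if succeeded:
        logger.info("authentication succeeded user=%s ip=%s", user, source_ip)
    else:
        logger.warning("authentication failed user=%s ip=%s", user, source_ip)

log_authentication("jdoe", "203.0.113.7", succeeded=False)
log_authentication("jdoe", "203.0.113.7", succeeded=True)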

Conclusion

Logging is essential in the enterprise for investigation into cyber-events, monitoring, root cause analysis, forensics, and overall maintenance of your systems. Developing a strategy before implementing logging solutions is just as important as the actual logging solution. Before diving into logging solutions, ensure that you put together a plan and follow best practices. Learn how Mezmo, formerly known as LogDNA, makes logging accurate, easy, and scalable.
