What is Observability Data?

Learning Objectives

• Learn what observability is and why it matters

• Understand the differences between logs, metrics, and traces

• Understand the importance of the Three Pillars of Observability

Observability is the ability to understand the internal state of a system from its external outputs. Logs, metrics, and traces are the three external outputs commonly considered the three pillars of observability. When we use the term “observability data,” these are the types of data we’re referring to.

There is some overlap between these outputs, but each serves a distinct function:

Logs are the record of events that occur within a system. They are the core atomic unit for understanding what is happening at any given time within an application or environment, and they provide the granular detail engineers need to respond to incidents and debug issues.

However, logs can also be the most difficult observability data type to work with. While they contain valuable information, they are often verbose and unstructured, making it difficult for humans to extract meaning from them. This is why a sound logging strategy includes automatically parsing log data into a structured, human-readable form so teams can understand and act on it.
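To make this concrete, here is a minimal sketch of what such parsing might look like, assuming a common nginx/Apache-style access log (the line and field names below are illustrative):

```python
import re

# A hypothetical log line in a common nginx/Apache-style access format.
raw_line = '203.0.113.7 - - [12/Mar/2022:14:03:01 +0000] "GET /api/orders HTTP/1.1" 500 1432'

# Pattern capturing the fields an engineer typically needs when debugging.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

match = LOG_PATTERN.match(raw_line)
if match:
    event = match.groupdict()      # the unstructured line is now structured fields
    print(event["status"], event["path"])  # 500 /api/orders
```

Once the line is structured, teams can filter, alert, and aggregate on fields like `status` instead of grepping raw text.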

Metrics provide a higher-level view: numeric measurements taken over a predetermined interval. While a log records a single event happening within the system, metrics aggregate many similar events to show trends over time. Because presenting the data this way requires deciding in advance which insights matter most to the consumer, metrics are easier to consume than logs. The trade-off is that, as aggregates, they don’t provide the same granular detail.
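As a rough sketch of that aggregation, assuming a handful of hypothetical request-latency events, individual events might be rolled up into a per-minute metric like this:

```python
from collections import defaultdict

# Hypothetical raw events: (unix_timestamp, latency_ms) for individual requests.
events = [
    (1647093781, 120), (1647093785, 95), (1647093842, 310),
    (1647093850, 88), (1647093901, 150),
]

# Roll individual events up into a per-minute average-latency metric:
# many event records collapse into a few numeric data points over time.
buckets = defaultdict(list)
for ts, latency_ms in events:
    buckets[ts - ts % 60].append(latency_ms)

for minute_start, latencies in sorted(buckets.items()):
    print(minute_start, sum(latencies) / len(latencies))
```

Each printed row is one metric data point; the individual events behind it are no longer visible, which is exactly the granularity trade-off described above.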

Traces stand out from the other two observability data types because they stitch together a series of related events across various components, whereas logs and metrics focus on a single event within a system (either on its own or in aggregate). This is why traces are more accurately referred to as “distributed traces”: they bring together various sources of information to show causal relationships within an ecosystem.
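A minimal sketch of that structure, using made-up IDs and field names that follow common tracing conventions: each span carries a shared trace ID and a pointer to its parent, which is what lets a tool reassemble the causal chain.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    trace_id: str                   # shared by every span in one request's journey
    span_id: str                    # unique to this unit of work
    parent_span_id: Optional[str]   # links this span to its caller
    service: str
    operation: str
    duration_ms: int

# One request flowing through three services; the shared trace_id and the
# parent_span_id links are what expose the causal relationships.
spans = [
    Span("trace-abc", "span-1", None, "api-gateway", "POST /checkout", 240),
    Span("trace-abc", "span-2", "span-1", "orders", "create_order", 180),
    Span("trace-abc", "span-3", "span-2", "payments", "charge_card", 150),
]

# Print the trace as a tree (assumes parents appear before their children).
depth = {None: -1}
for span in spans:
    depth[span.span_id] = depth[span.parent_span_id] + 1
    print("  " * depth[span.span_id]
          + f"{span.service}:{span.operation} ({span.duration_ms}ms)")
```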

For companies to survive today, they must have access to all of these data types. This access enables them to maintain development velocity, understand and protect against security threats, and provide a positive customer experience. However, the majority of this data isn’t being leveraged. We see two main reasons for this:

  1. Observability data is spiky by nature. The volume of logs an application produces, for example, varies with factors like how many people are using the app or whether errors are occurring. This makes the data unpredictable and expensive to manage.
  2. Many companies have deep-rooted organizational silos. In these companies, it’s common for teams to have their own tools and workflows, which makes it hard to share information. For example, if a development team has access to logs but only the Site Reliability Engineering (SRE) team has access to metrics, it’s hard to correlate the two.

In a DevOps or DevSecOps culture with distributed and autonomous teams, data consumers have their own specific observability data needs. While single-pane-of-glass approaches work for centralizing all types of observability data in one tool, they can be limiting when there are multiple data consumers and use cases.

Use Cases of Observability Data

With the three pillars of observability data, any team that operates with a DevSecOps mindset should have access to the data insights it needs to perform critical functions. Here’s how different roles within an organization might leverage observability data:

Site Reliability Engineers (SREs)

Because they are responsible for the ongoing health of engineering systems, SREs often rely on observability data to perform critical functions of their work. Logs act as the single source of truth, providing clear indicators of what’s happening in an environment and enabling quick resolution when things go wrong. Metrics and traces add higher-level visibility into, and reporting on, the overall health of those systems. By blending these observability data types, SREs can be more proactive, mitigating risk and helping ensure a flawless customer experience.

Developers 

Developers may not rely on observability data as much as their SRE counterparts in their day-to-day jobs, but it still provides key insights that can accelerate their work of delivering value to customers. While metrics can uncover inefficiencies that influence development roadmaps, more detailed log and trace data can help troubleshoot issues in pre-production before a developer commits any code.

Security 

Similar to SREs, security teams increasingly rely on observability data as key indicators of the security of their environments. While logs provide the insights needed for reactive incident response to potential bad actors, metrics and traces give the higher-level overview that enables continuous threat hunting and detection. Security engineers might access this data in a log analysis tool or a Security Information and Event Management (SIEM) tool.
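As a simple illustration of the reactive, log-driven side of that work, a security engineer might scan parsed authentication logs for repeated failures (the events, field names, and threshold here are hypothetical):

```python
from collections import Counter

# Hypothetical parsed authentication logs.
auth_events = [
    {"ip": "198.51.100.9", "outcome": "failure"},
    {"ip": "198.51.100.9", "outcome": "failure"},
    {"ip": "198.51.100.9", "outcome": "failure"},
    {"ip": "203.0.113.4", "outcome": "success"},
]

# Flag source IPs that cross a failed-login threshold.
THRESHOLD = 3  # illustrative cutoff, not a recommended value
failures = Counter(e["ip"] for e in auth_events if e["outcome"] == "failure")
suspects = [ip for ip, count in failures.items() if count >= THRESHOLD]
print(suspects)  # ['198.51.100.9']
```

In practice, a SIEM runs rules like this continuously across far larger streams, which is why structured, routable log data matters so much for security teams.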


How Mezmo Enhances Observability Data

Mezmo, formerly known as LogDNA, has spent the past five years focused on unlocking the value of logs for any data consumer. We created a modern platform that makes it easy for developers to ingest, parse, search, and take action on log data to more effectively troubleshoot and debug. However, when considering the issues around access to spiky observability data, we realized that the solution that will truly empower teams is the ability to easily control the flow of that data from the start. This requires the ability to structure and enrich data and then route it to the right tool for a specific use case. This is why we are building a new Observability Data Pipeline that will allow teams to better control how they ingest, normalize, and route observability data at scale. Learn about our Telemetry Pipeline, designed for observability data, in this blog.
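As a generic illustration of the enrich-and-route idea (a conceptual sketch only, not Mezmo’s actual API), such a pipeline might look like this:

```python
# Illustrative sketch of an enrich-and-route pipeline; all names are made up.

def enrich(event: dict) -> dict:
    # Add context that downstream consumers need before routing.
    event.setdefault("env", "production")
    event.setdefault("team", "payments")
    return event

# Hypothetical destinations, keyed by data type and use case.
ROUTES = {
    "log": ["log-analysis", "siem"],           # debugging + security
    "metric": ["time-series-db", "dashboard"], # trends + reporting
    "trace": ["tracing-backend"],              # causal analysis
}

def route(event: dict) -> list:
    return ROUTES.get(event.get("type"), ["archive"])

event = enrich({"type": "log", "message": "payment failed", "status": 500})
print(route(event))  # ['log-analysis', 'siem']
```

The design point is that routing decisions happen once, upstream, so each consumer receives only the data shaped for its use case.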

It’s time to let data charge