A Fourth Pillar of Observability

Learning Objectives

• Understand the original three pillars of observability

• Learn about some of the candidates that have been proposed as a possible fourth pillar of observability

If you pay attention to IT buzzwords, you're probably familiar with the "three pillars of observability." Traditionally, definitions of observability have focused on the idea that three key data sources – logs, metrics, and traces – drive it.

Increasingly, however, engineers are talking about a possible fourth pillar of observability – although there is some diversity of opinion about what that fourth pillar is.

This article explains the three classic pillars of observability and then provides a perspective on the potential fourth pillar.

What Is Observability?

The classic definition of observability is the use of data exposed on the "surface" of a system to understand the system's internal state.

The observability concept has a decades-long history in control engineering. For most of that period, however, no one thought about observability in the context of IT. That changed starting in the late 2010s, when the proliferation of complex, cloud-native software environments pushed developers and IT engineers to rethink how they collected and analyzed data from the applications and infrastructure they had to manage.

The Three Pillars of Observability

Traditionally, when most developers and IT teams set out to "do" observability, they focus on three fundamental types of data that modern software environments expose:

  • Logs: Timestamped records of events, generated by both infrastructure and applications.
  • Metrics: Numeric measurements of the state and performance of infrastructure or applications – for example, how much available memory is currently in use or how many requests an application is processing per minute.
  • Distributed traces: Records of how requests "flow" across the various layers of a software environment. A distributed trace reveals how quickly each part of an application stack responds to a request, where bottlenecks are occurring, and so on.
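For concreteness, here's a minimal sketch (in Python, using the OpenTelemetry API together with the standard logging module) of what emitting all three signal types from a single request handler might look like. The service, span, and metric names are illustrative, and a real deployment would also configure an SDK and exporters so the data actually ships to a backend.

    import logging
    from opentelemetry import trace, metrics

    # Without SDK/exporter configuration, the OpenTelemetry calls below are
    # no-ops, which keeps the sketch self-contained; a real setup wires them
    # to a backend.
    logger = logging.getLogger("checkout")          # pillar 1: logs
    meter = metrics.get_meter("checkout.meter")     # pillar 2: metrics
    tracer = trace.get_tracer("checkout.tracer")    # pillar 3: traces

    request_counter = meter.create_counter(
        "checkout.requests", unit="1", description="Requests handled per route"
    )

    def handle_request(route: str) -> None:
        # One span per request records where time is spent across the stack.
        with tracer.start_as_current_span("handle_request") as span:
            span.set_attribute("http.route", route)
            request_counter.add(1, {"http.route": route})  # metric data point
            logger.info("handled request for %s", route)   # log event

    handle_request("/cart")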

The core idea behind observability is that, by analyzing this data in tandem, teams can gain critical insights into the overall state and health of the applications and infrastructure they monitor.

For example, by continuously collecting application error metrics, a team could detect a sudden spike in errors. It could then check its logs to see whether the spike correlates with a recorded event that might explain why the errors are happening. It could also use distributed traces to identify which specific microservice is triggering the errors. When you put all of this data together, you get observability.
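As a sketch of how that correlation can work in practice – assuming an OpenTelemetry-instrumented Python service; the logger name and log format here are hypothetical – the snippet below stamps every log record with the active trace ID, so an error spike spotted in metrics can be joined to the matching log lines and traces through a shared identifier.

    import logging
    from opentelemetry import trace

    class TraceIdFilter(logging.Filter):
        """Attach the current trace ID to each log record so logs join to traces."""
        def filter(self, record: logging.LogRecord) -> bool:
            ctx = trace.get_current_span().get_span_context()
            record.trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else "-"
            return True

    logging.basicConfig(
        format="%(asctime)s %(levelname)s trace_id=%(trace_id)s %(message)s"
    )
    logger = logging.getLogger("checkout")
    logger.addFilter(TraceIdFilter())

    # Emitted inside an active span, this log line carries the same trace_id
    # as the trace itself, so the offending microservice can be found via
    # either signal.
    logger.error("payment service returned 503")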

The Fourth Pillar of Observability

That, at least, is the theory. But in practice, you may run into situations where logs, metrics, and traces alone aren't enough to provide complete answers to observability questions.

That's why it's now becoming fashionable to talk about the need for a fourth pillar of observability. Again, though, what that fourth pillar is depends on whom you ask.

Context

Some have argued that the fourth pillar of observability is "context." The idea here is that you need to integrate logs, metrics, and tracing data systematically to provide the most meaningful observability insights.

You could argue that gaining this type of context isn't the fourth pillar of observability as much as it's a way of working with the three classic pillars. Context is not a data source; it's a way of using data sources.

Still, it's not wrong to drive home the idea that context is everything; collecting individual logs, metrics, and traces alone is not very useful if you can't interrelate them quickly and meaningfully.
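One way to build that context in from the start – a sketch based on the OpenTelemetry Python SDK, with placeholder attribute values – is to attach the same resource metadata to every signal a service emits, so a backend can interrelate that service's logs, metrics, and traces by identity rather than by guesswork.

    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.trace import TracerProvider

    # The same Resource is shared by the trace and metric pipelines, so every
    # span and data point carries identical service.name and environment
    # attributes that a backend can use to correlate the signals.
    resource = Resource.create({
        "service.name": "checkout-service",
        "deployment.environment": "production",
    })

    tracer_provider = TracerProvider(resource=resource)
    meter_provider = MeterProvider(resource=resource)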

Security

Others make the case that security is the fourth pillar of observability. The thinking here is that you should tightly integrate observability workflows into security workflows.

If you embrace DevSecOps, you probably already recognize the need to align observability data with security insights. If not, the idea of security as the fourth pillar is a reminder that the purpose of observability isn't just managing application performance. It can help keep environments secure, too.
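As a toy illustration of that overlap – the log format and threshold here are assumptions, not a standard – the same access logs collected for observability can double as a security signal, for instance by flagging possible brute-force login attempts.

    from collections import Counter
    from typing import Dict, Iterable

    def suspicious_login_ips(log_lines: Iterable[str], threshold: int = 20) -> Dict[str, int]:
        """Count failed-login events per client IP and flag IPs at or over the threshold."""
        failures = Counter(
            line.split()[0]                    # assumes the client IP is the first field
            for line in log_lines
            if "login failed" in line.lower()
        )
        return {ip: count for ip, count in failures.items() if count >= threshold}

    # The same log stream used for performance troubleshooting doubles as a
    # security check here.
    sample = ["203.0.113.7 login failed for user admin"] * 25
    print(suspicious_login_ips(sample))        # {'203.0.113.7': 25}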

Thinking Beyond Pillars

Last but not least, some contend that the very idea of observability pillars is flawed because it narrows engineers' thinking about how best to gain critical insights into system performance.

There's some merit to this point of view. You could argue that the best way to observe an environment is to collect and analyze any relevant insights about that environment, whether they come from logs, metrics, traces, or anything else. If we obsess over just three (or four, or five) pillars of observability, we risk overlooking opportunities to glean essential insights from data sources that don't feature on our list of chosen pillars.

Conclusion

Ultimately, there are no hard and fast rules about how many pillars of observability there should be or what should constitute each pillar. While the traditional consensus has been that logs, metrics, and traces lay the foundation for observability, there's an increasing tendency to leverage additional data sources.

Perhaps the best way to think about the pillars of observability is to recognize that this is a pliable concept. Rather than clinging to some dogma, use whichever observability pillars make the most sense for your software and business.
