What is an Observability Data Pipeline?
I can’t remember the last time I drove down Highway 101 between San Francisco and the South Bay and didn’t see a billboard for a product claiming to be the single tool that solves all of my data problems. I know what most of these products do because I work in the industry and have spent hours defining the word “data.” But for the hundreds of thousands of other people who see these billboards every day, I always wonder whether they know what sets these products apart and why someone would need a dozen tools to solve a single problem. The truth is that the term “data” is ambiguous. Today, I’m going to talk specifically about observability data and the concept of an observability data pipeline.
What is Observability Data?
Observability is the ability to see and understand the internal state of a system from its external outputs. Logs, metrics, and traces, known as the three pillars of observability, are those external outputs. When we use the term “observability data,” these are the types of data we’re referring to.
To survive today, companies must have access to this data. It allows them to maintain development velocity, understand and protect against security threats, and provide a positive customer experience. However, the majority of this data isn’t being leveraged. We see two main reasons for this:
- Observability data is spiky by nature. The volume of logs that an application produces, for example, varies with factors like how many people are using the app or whether an error has occurred. This makes the data unpredictable and expensive to handle.
- Many companies have deep-rooted organizational silos. In these companies it’s common for teams to have their own tools and workflows, which makes it hard to share information. For example, if a dev team has access to logs, but only the SRE team has access to metrics, it’s hard to correlate the two.
I didn’t just break the internet with this information. Vendors realized this was an issue years ago and went on to create solutions, like the infamous single pane of glass, to solve it. Although these solutions bring all three types of observability data into a single tool, they aren’t optimized for use by all of the data consumers. In organizations that operate with a DevOps culture, folks from development, operations, and security all need access to their observability data. To get to it, I’ve seen teams create manual workarounds that hurt operational efficiency. These inefficiencies simply don’t fly anymore because real-time insights can mean the difference between resolving an issue quickly and incurring millions of dollars in damages.
What is an Observability Data Pipeline?
An observability data pipeline is a tool or process that centralizes observability data from multiple sources, enriches it, and sends it to a variety of destinations. This solves multiple problems, including:
- The need to centralize data into a single location.
- The need to structure and enrich data so that it’s easier to understand and get value from.
- The need to send data to multiple destinations for multiple use cases.
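To make the three stages concrete, here is a minimal sketch in Python of a pipeline that centralizes events from several sources, enriches them, and fans them out to more than one destination. Everything in it is hypothetical and for illustration only: the function names, the app-to-team mapping, and the in-memory lists standing in for destinations are assumptions, not any vendor’s API.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, str]  # one log line represented as a flat dict (assumption)


def centralize(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge events from several sources into a single stream."""
    for source in sources:
        yield from source


def enrich(event: Event) -> Event:
    """Normalize the level and tag the owning team (hypothetical mapping)."""
    owners = {"checkout": "payments", "auth": "security"}
    enriched = dict(event)  # copy so the source event is left untouched
    enriched["level"] = event.get("level", "info").lower()
    enriched["team"] = owners.get(event.get("app", ""), "unknown")
    return enriched


def run_pipeline(
    sources: List[Iterable[Event]],
    destinations: Dict[str, Tuple[Callable[[Event], bool], List[Event]]],
) -> None:
    """Send each enriched event to every destination whose filter accepts it."""
    for event in centralize(*sources):
        enriched = enrich(event)
        for accepts, sink in destinations.values():
            if accepts(enriched):
                sink.append(enriched)


# Two sources, three destinations (plain lists stand in for real tools).
app_logs = [{"app": "checkout", "level": "ERROR", "msg": "card declined"}]
auth_logs = [{"app": "auth", "level": "INFO", "msg": "login ok"}]
log_analysis: List[Event] = []
siem: List[Event] = []
data_lake: List[Event] = []

run_pipeline(
    [app_logs, auth_logs],
    {
        "log_analysis": (lambda e: e["level"] == "error", log_analysis),
        "siem": (lambda e: e["team"] == "security", siem),
        "data_lake": (lambda e: True, data_lake),
    },
)
```

The point of the sketch is that each destination declares its own filter, so the same enriched event can land in several tools at once without the sources knowing anything about them.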
This level of flexibility ensures that everyone can use their tools of choice and avoid costly vendor lock-in. The right tool can also put controls in place to manage spikes so that everyone in an organization has access to the data they need in real time, without impacting the budget.
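One simple way to picture a spike control is a quota that caps how many events reach an expensive destination per time window and diverts the rest to cheaper storage. The sketch below is one possible approach under that assumption; the `WindowQuota` class and its parameters are hypothetical, and a real pipeline might use a different strategy such as sampling or a token bucket.

```python
from dataclasses import dataclass


@dataclass
class WindowQuota:
    """Fixed-window quota: at most `limit` events per `window_seconds`."""

    limit: int
    window_seconds: float
    window_start: float = 0.0
    used: int = 0

    def allow(self, now: float) -> bool:
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


# A burst of five events in five seconds against a quota of three per minute.
quota = WindowQuota(limit=3, window_seconds=60)
forwarded, overflow = [], []
for second in range(5):
    event = {"msg": f"error {second}"}
    if quota.allow(now=float(second)):
        forwarded.append(event)   # goes to the expensive, real-time tool
    else:
        overflow.append(event)    # diverted to cheaper storage instead
```

Timestamps are passed in explicitly rather than read from the clock so the behavior is easy to reason about. With this burst, the first three events are forwarded and the last two are diverted.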
The Mezmo Approach
For more than five years, Mezmo, formerly LogDNA, has focused on building a modern log management tool for teams that embrace DevOps. Now, we’re working on a new pipeline product that allows organizations to centralize all of their log data from multiple sources; parse, normalize, and enhance it in Mezmo; and then stream it wherever they need. For example, they can send it to Mezmo Log Analysis for troubleshooting and debugging, to a SIEM for security, or to a data lake for compliance.
By shifting the control point left to the pipeline, existing Mezmo users can leverage many of their favorite features to unlock more value from their log data at scale. This is how we do it:
- We parse log data first and then index it, which means that it can be searched immediately.
- Features like natural language search make it easy for anyone to use Mezmo and find what they need, fast.
- Our intuitive UI and robust APIs make it simple to configure workflows. Teams can automate ingestion, parsing, exclusion, and streaming so that everyone within the organization has access to the data they need, where they need it.
- Mezmo's vendor-agnostic approach makes it easy to send data to multiple tools for immediate insights.
- We give users tools to control costs, like Exclusion Rules, Usage Quotas, and Alerts on unexpected spikes.
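To illustrate the general idea of parsing before indexing and of dropping unwanted lines with exclusion rules, here is a generic Python sketch. It is not Mezmo’s implementation or API: the log line format, the `parse` and `build_index` functions, the rule shape, and the sample lines are all assumptions made up for this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed line format: "<timestamp> <LEVEL> <app> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<app>\S+) (?P<message>.*)$"
)


def parse(line: str) -> Optional[Dict[str, str]]:
    """Turn a raw line into named fields, or None if it doesn't match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


def is_excluded(event: Dict[str, str], rules: List[Dict[str, str]]) -> bool:
    """A rule excludes an event when every field in the rule matches."""
    return any(
        all(event.get(key) == value for key, value in rule.items())
        for rule in rules
    )


def build_index(
    lines: List[str], rules: List[Dict[str, str]]
) -> Dict[Tuple[str, str], List[Dict[str, str]]]:
    """Parse first, apply exclusion rules, then index by level and app."""
    index: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for line in lines:
        event = parse(line)
        # This sketch simply skips lines that fail to parse or are excluded.
        if event is None or is_excluded(event, rules):
            continue
        for field_name in ("level", "app"):
            index[(field_name, event[field_name])].append(event)
    return index


sample_lines = [
    "2023-01-01T00:00:00Z ERROR checkout card declined",
    "2023-01-01T00:00:01Z DEBUG checkout cache hit",
    "2023-01-01T00:00:02Z INFO auth login ok",
]
index = build_index(sample_lines, rules=[{"level": "DEBUG"}])
checkout_events = index[("app", "checkout")]
```

Because the fields exist before the index is built, a lookup such as `index[("app", "checkout")]` works as soon as a line is ingested, and the excluded DEBUG line never consumes index space.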
We started with log data because it’s the cornerstone of the DevOps culture. To date, we’ve helped thousands of developers and DevOps teams leverage their logs to build and maintain some of the world's most innovative products. Now, we’re helping them get even more value from their log data by streaming it to other destinations for a more unified development, security, and compliance practice.