Accelerating Dev Workflows: Terminal-Driven Debugging

    The pursuit of Digital Transformation and DevOps practices has brought several benefits, such as increased deployment rates and better collaboration across teams. However, it has also led to endless abstraction, an increase in responsibilities, and many new tools (Kubernetes, hybrid clouds and all their services, etc.). This increase in complexity has turned observability into an essential component of every ecosystem. Yet when we bring up observability (o11y), we tend to focus on the operational side of the house: uptime and outages. Keeping our systems online is crucial, but what else should we be talking about to ensure that the applications and services running on these systems behave as they should?

    Mezmo, formerly known as LogDNA, has focused extensively on what we think is the most important of the Three Pillars of Observability: logging! As a modern logging solution, our focus is on helping ops reduce troubleshooting time and developers increase their efficiency, all while making it quick to control log intake. Our platform serves as a definitive source of truth by collecting data from ALL sources while providing an easily accessible, robust UI with benefits that include but aren’t limited to:

    • Natural Language Query Syntax, which accelerates time-to-value by letting you sift through logs with plain keywords instead of learning yet another query language.
    • Custom Parsing, which extracts value from any given log through a GUI-driven process.
    • Boards and Screens, which accelerate troubleshooting and increase collaboration by visualizing any given data point.

    Still, we often encounter customers and prospects who use the terminal to get access to their logs. Does that seem old-school? Not at all! If you have a log-aggregation tool in place, does it not make more sense to use it rather than a Command Line Interface (CLI)? Maybe, but knowing which tool to use and when is an art form. The question then becomes: with all the advantages a centralized logging solution brings, why would someone choose to remain within their terminal? It ultimately comes down to context-switching, also known as the “silent killer of productivity.” Various studies and books dive into the time-management aspect, from “Making Work Visible” to “Value Stream Mapping.” Rather than force you to read them all in their entirety, here’s the TL;DR, captured in two quotes I particularly relate to:

    • Gerald Weinberg states, “Each extra task or ‘context’ you switch between eats up 20-80% of your overall productivity.”
    • Zahra Abad observes that “Practitioners perceive task-switchings are as disruptive as spontaneous and random interruptions, regardless of the source and type of the switching.”

    These findings are now even more evident in a post-COVID, remote-driven world. With all the distractions out there (e.g., kids, Slack, the overall existence of the internet), a simple switch from my IDE to a completely different platform can, and likely will, negatively impact my focus, train of thought, and overall performance.   

    Let’s run through a developer-focused exercise and see why utilizing the CLI makes sense.

    • We are in our IDE, actively interacting with our local environment, and just finished extending our API functionalities.
    • We are following modern practices. We’ve containerized everything and are ready to deploy, so now we write a k8s deployment YAML.
    • Our integrated terminal is already open, so we run a quick kubectl apply -f node-api.yaml to test out the updates.
    • We now need to check whether the pod is running or has crashed, so we run a kubectl get pods to identify the pod’s status.
    • With the list of pods displayed in front of us, we run a kubectl logs pod_name to check our application’s logs.
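
    The steps above can be sketched as a single shell function. This is a minimal sketch, not a recommended workflow: the function name deploy_and_inspect is hypothetical, and it assumes the manifest labels its pods (for example app=node-api) so that logs can be fetched by label selector instead of by copying a pod name.

```shell
# Sketch of the manual loop: apply the manifest, check pod status, read logs.
# Assumption (not from the article): pods carry a label such as app=node-api.
deploy_and_inspect() {
  if [ "$#" -ne 2 ]; then
    echo "usage: deploy_and_inspect <manifest.yaml> <label-selector>" >&2
    return 2
  fi
  local manifest="$1"   # e.g. node-api.yaml
  local selector="$2"   # e.g. app=node-api (hypothetical label)

  # Step 1: apply the deployment manifest.
  kubectl apply -f "$manifest" || return 1

  # Step 2: list matching pods to see whether they are running or crashing.
  kubectl get pods -l "$selector" || return 1

  # Step 3: print the last 50 log lines from the matching pods.
  kubectl logs -l "$selector" --tail=50
}
```

    With those assumptions, one iteration becomes deploy_and_inspect node-api.yaml app=node-api, although it still has to be rerun for every change and every debugging variation.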

    While the above seems straightforward, we don’t do these steps just once. We repeat them many times across different debugging variations (e.g., status checks, simulation tests, operational tests with other internal/external services, etc.). Suddenly, our desire to stay within our IDE and avoid context-switching backfires and begins to impact our performance.

    Manually running commands over and over is time-consuming and error-prone. It works okay for isolated pods, but we often don’t have that luxury because most microservices interact with other services across various infrastructures. In the long run, debugging becomes a burdensome project in itself, and we begin encountering the following pain points:

    • Expertise: Understanding the ins and outs of your application can be complicated enough, but now we have to extract the correct information through bash commands (e.g., grep, ssh, flags, etc.). 
    • Repeatability: While we might be accustomed to running commands, once we begin searching through different variations over and over, these searches quickly become both time-consuming and error-prone.
    • Ephemerality: Containerization is the cornerstone of most new deployment patterns. It allows platforms like Kubernetes to self-heal whenever pods fail. However, identifying application failures becomes that much more complicated when the logs containing the reasons for failure disappear along with previous pods.
    • Extensibility: kubectl logs works okay for isolated pods, but we rarely work with isolated pods. Our new deployment will likely interact with other services across various platforms and infrastructures (e.g., Lambda, Google App Engine, etc.). kubectl logs can help identify problems within our cluster, but we fall back into the repeatability issue as we track ever-changing pod names across different namespaces. Then there’s the outside ecosystem. Do we ssh into those machines to see that interaction? Do we use aws logs to get insight into the Lambda functions we interact with? At that point, we might as well use our web app.
    • Visibility: The raw output of kubectl logs shows complex JSON structures that are noisy and not human-readable. We have ensured that our logs are structured so that external platforms can take advantage of them. Still, as a side effect, we have objectively worsened our ability to debug quickly within our terminal.
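
    To make a few of these pain points concrete, here is a rough sketch of the plumbing that a terminal-only approach tends to accumulate. The helper names are hypothetical, and the last one assumes that jq is installed and that each log line is a JSON object with a "message" field; neither assumption comes from the original walkthrough.

```shell
# Ephemerality: logs from the previous (crashed) container of a pod.
# This only works while the pod object itself still exists.
previous_logs() {
  kubectl logs "$1" --previous
}

# Extensibility/repeatability: the same selector, repeated per namespace.
logs_for_namespaces() {
  local selector="$1"
  shift
  local ns
  for ns in "$@"; do
    echo "== namespace: $ns =="
    kubectl logs -n "$ns" -l "$selector" --since=20m --prefix
  done
}

# Visibility: keep only the human-readable part of structured logs.
# Assumes jq is installed and every line is JSON with a "message" field.
readable_logs() {
  kubectl logs "$1" | jq -r '.message'
}
```

    For example, logs_for_namespaces app=node-api dev staging walks two namespaces, but it still sees nothing outside the cluster and nothing from pods that are already gone.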

    While this situation can leave one feeling stuck between a rock and a hard place, another choice brings the best of both worlds: the benefits of a logging platform without the need to context-switch. The Mezmo CLI is an extension of the Mezmo platform that gives us access to any data collected across all our sources from the comfort of our preferred IDE. Let’s address how the Mezmo CLI solves our previous pain points:

    • Expertise: The Mezmo CLI, invoked as logdna, uses the same search functionality as the web app. Our natural language query syntax removes the need to learn yet another query language. Coupled with our parsing on ingestion, this means that you can search for any specific key/value pair or any keyword. Are you interested in what a particular application is doing? Execute logdna search 'app:app-name any-keywords'. You can also narrow your search with the flag --timeframe 'past 20 minutes'.
    • Repeatability: Active development and debugging benefit substantially from immediate feedback. While logdna search simplifies looking through past logs, you can also harness the power of our log ingestion engine, Buzzsaw, and utilize logdna tail 'any-query' to receive active feedback on the applications you're currently developing.
    • Ephemerality: Regardless of the technology or how ephemeral it is, Mezmo remains your source of truth and ensures that all logs remain easily accessible.
    • Extensibility: Mezmo is already collecting logs from all of your sources. Pulling the correct data is a matter of combining the right queries. Here are some examples:
      • Tail logs from entire deployments rather than specific pods: logdna tail 'app:deploy-name'
      • Tail logs from an entire namespace: logdna tail 'namespace:ns-name'
      • Tail logs from an entire cluster or a mix of clusters: logdna tail 'tag:prod-cluster'
      • Tail performance of multiple components that exist across different deployment targets and interact with your new deployment: logdna tail 'response_time:>120 tag:some-overarching-app'
    • Visibility: Best practices indicate that we should be actively working with structured logs (preferably JSON). As beneficial as JSON is, it is meant to be processed by machines, not humans. Similar to the UI, the Mezmo CLI uses the “message” field to represent the human-readable component. (We ingest JSON values and can search on all parsed fields, but visibility improves because we see only the human-readable component.)
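
    The commands quoted in the list above can be collected into two small wrappers. This is a sketch only: the subcommands, query keys, and the --timeframe flag are used exactly as they appear in this article and have not been checked against a specific CLI version, and the function names search_app and tail_scope are hypothetical.

```shell
# Search past logs for one app, narrowed to a timeframe.
# The timeframe defaults to the value quoted in the article.
search_app() {
  local app="$1"
  local keywords="$2"
  local timeframe="${3:-past 20 minutes}"
  logdna search "app:$app $keywords" --timeframe "$timeframe"
}

# Live-tail a whole deployment, namespace, or tagged cluster,
# using the query keys shown in the article's examples.
tail_scope() {
  local kind="$1"
  local name="$2"
  case "$kind" in
    deploy)    logdna tail "app:$name" ;;
    namespace) logdna tail "namespace:$name" ;;
    cluster)   logdna tail "tag:$name" ;;
    *) echo "unknown scope: $kind" >&2; return 2 ;;
  esac
}
```

    For instance, search_app app-name any-keywords runs the search from the Expertise bullet, and tail_scope cluster prod-cluster follows every pod in the tagged cluster. More specific queries, such as 'response_time:>120 tag:some-overarching-app', can be passed to logdna tail directly.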

    As explained, Mezmo and its CLI extension quickly enhance your debugging practices by providing immediate access in an easily readable format to ALL the logs that matter to YOU. 

    While a web UI is commonplace for most software products, developers are often stuck choosing between constantly using a built-in terminal, context-switching into an unfamiliar app, or even waiting for another team to email them a zip archive of logs. Through our CLI, Mezmo can streamline development workflows and bring the power of its search queries directly into developers’ IDEs. Ultimately, developers want to spend their time writing code and focusing on feature releases. That’s where Mezmo fits in.

    Dan Flores Montanez


    Dan is a Solutions Engineer at Mezmo and is passionate about all DevSecOps-related topics. Outside of work, he spends his time playing video games with his daughter, playing instruments, and traveling.