Application Performance Monitoring (APM): Capabilities and Use Cases
• Understand the core capabilities of APM solutions
• Learn about the three most common use cases for an APM solution
The functionality contained in an Application Performance Monitoring (APM) solution is at the core of any genuinely excellent observability platform. Gartner helped solidify the concept of APM about a decade ago when it defined the core capabilities that an APM suite is required to have. By that definition, a product must offer digital experience monitoring, application discovery, tracing and diagnostics, and AI/ML-driven analytics to be considered a proper APM solution.
APM Capabilities 101
Each core capability of APM provides value in its own way.
Digital Experience Monitoring
This functionality used to be called Real User Monitoring (RUM), and vendors still describe it in different ways. The general idea is that the application running on the client side carries added instrumentation that collects some device configuration information and runtime data related to performance – especially network statistics like retries and response times. Since performance monitoring rarely requires user-identifying information, digital experience monitoring is focused more on the platform and application runtime, including data such as the OS version, platform (Android, Windows, etc.), browser, network type, and monitor resolution.
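As a rough illustration of keeping client telemetry free of user-identifying data, the sketch below filters a raw client event down to an allowlist of platform and performance fields. The field names and the `to_telemetry` helper are assumptions made for this example, not the schema of any particular APM product.

```python
# Hypothetical allowlist: only platform and performance fields are kept.
ALLOWED_FIELDS = {
    "os_version",
    "platform",
    "browser",
    "network_type",
    "screen_resolution",
    "response_time_ms",
    "retries",
}


def to_telemetry(raw_event: dict) -> dict:
    """Keep only the allowlisted, non-identifying fields of a client event."""
    return {key: value for key, value in raw_event.items() if key in ALLOWED_FIELDS}


# Hypothetical event captured on a client device.
raw_event = {
    "email": "user@example.com",  # identifying, so it is dropped
    "platform": "Android",
    "os_version": "14",
    "network_type": "4g",
    "response_time_ms": 182,
    "retries": 1,
}

telemetry = to_telemetry(raw_event)
print(sorted(telemetry))
```

An allowlist is used rather than a blocklist so that a newly added identifying field is excluded by default.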
Application Discovery
Application discovery is what it sounds like: the ability to identify what is running in any environment where the APM solution operates. For example, you might need to discover all Java processes running on a server when installing the agent. The discovery functionality can then attach to and watch those processes. It also works for large business applications (such as SAP) and cloud-native environments (such as multi-node Kubernetes clusters).
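The Java example above can be sketched as a filter over a snapshot of the host's process table. This is a simplified illustration: the `ProcessInfo` type, the sample snapshot, and the rule of matching on the executable name `java` are all assumptions for the example, and real agents use considerably more robust detection.

```python
from __future__ import annotations

from typing import NamedTuple


class ProcessInfo(NamedTuple):
    pid: int
    command: str  # full command line as reported by the OS


def find_java_processes(processes: list[ProcessInfo]) -> list[ProcessInfo]:
    """Return the processes whose executable looks like a JVM launcher."""
    matches = []
    for proc in processes:
        parts = proc.command.split()
        if not parts:
            continue
        # Compare only the executable's base name, so "/usr/bin/java" matches
        # but an argument such as "--lang=java" does not.
        executable = parts[0].rsplit("/", 1)[-1]
        if executable == "java":
            matches.append(proc)
    return matches


# Hypothetical snapshot of a host's process table.
snapshot = [
    ProcessInfo(101, "/usr/bin/java -jar orders-service.jar"),
    ProcessInfo(102, "nginx: master process"),
    ProcessInfo(103, "python3 report.py --lang=java"),
    ProcessInfo(104, "java -Xmx512m -cp app.jar com.example.Main"),
]

java_processes = find_java_processes(snapshot)
print([proc.pid for proc in java_processes])  # [101, 104]
```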
Tracing and Diagnostics
Once you've discovered all applications and are tracking metrics and logs, the APM suite will start to correlate the data recorded with data from other available sources, including network traffic retrieved from the agent on the host. The APM solution will begin building a map of all interactions between the applications in the environment using network traffic, metrics, and log data. It will even note which external services you're calling.
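One way to picture the interaction map is as a set of weighted edges between callers and callees. The sketch below assumes the agent has already reduced network traffic to hypothetical `(caller, callee)` pairs; the service names and the `build_service_map` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical connections observed on the hosts, as (caller, callee) pairs.
observed_calls = [
    ("web", "orders"),
    ("web", "orders"),
    ("orders", "inventory"),
    ("orders", "payments.example.com"),
    ("web", "orders"),
]

# Services the discovery step already knows about.
known_services = {"web", "orders", "inventory"}


def build_service_map(calls, internal):
    """Count calls per edge and list callees that are not known services."""
    edges = Counter(calls)
    external = sorted({callee for _, callee in calls if callee not in internal})
    return edges, external


edges, external = build_service_map(observed_calls, known_services)
print(edges[("web", "orders")])  # 3
print(external)                  # ['payments.example.com']
```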
APM tooling can also reach into an application (as long as it is written in one of the supported languages) to trace transactions within the application from the entry point to the bottom of the call tree. Between these two tracing capabilities, teams can perform deep-dive diagnostics to rapidly track down the source of an error – from the point where a call enters the network, through multiple applications, right down to the database where the data is stored.
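A minimal sketch of that kind of drill-down, assuming a trace is available as a flat list of spans that each point to their parent. The `Span` fields and the `path_to_root_cause` helper are hypothetical and much simpler than what a real tracing backend stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the entry point of the trace
    name: str
    error: bool = False


def path_to_root_cause(spans):
    """Return span names from the entry point down to the deepest failing span."""
    by_id = {span.span_id: span for span in spans}

    def depth(span):
        level = 0
        while span.parent_id is not None:
            span = by_id[span.parent_id]
            level += 1
        return level

    failing = [span for span in spans if span.error]
    if not failing:
        return []

    # The deepest failing span is the most likely origin of the error.
    current = max(failing, key=depth)
    path = [current.name]
    while current.parent_id is not None:
        current = by_id[current.parent_id]
        path.append(current.name)
    return list(reversed(path))


# Hypothetical trace of a failed checkout request.
trace = [
    Span("a", None, "GET /checkout", error=True),
    Span("b", "a", "orders.create", error=True),
    Span("c", "b", "db.insert orders", error=True),
    Span("d", "a", "cart.load"),
]

print(path_to_root_cause(trace))
# ['GET /checkout', 'orders.create', 'db.insert orders']
```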
AI/ML-Driven Analytics
What turns the data collected by an APM solution into something valuable is the constant AI/ML-driven analytics performed on it. This analysis is how tracing data becomes application maps. It is how the system attaches the context to deep-dive diagnostics that you need to understand what happened. It is also what produces trends, enabling you to track growth and client data over time.
While every vendor has its own formula for processing data, you will be able to mine and refine your data to support your organization's mission as soon as the out-of-the-box dashboards finish populating. Organizations can then make informed decisions, like dropping support for iOS 10 because its users now represent less than 0.1% of traffic, or adding support for Swahili because users in Kenya now make up 2% of global traffic.
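The arithmetic behind a decision like the iOS 10 one is a simple share-of-traffic calculation. The sketch below uses made-up session counts and hypothetical helper names; the 0.1% cutoff is the one from the example above.

```python
from collections import Counter


def traffic_share(values):
    """Map each distinct value to its fraction of all observed sessions."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def below_threshold(shares, threshold):
    """Return the keys whose share of traffic is under the threshold."""
    return sorted(key for key, share in shares.items() if share < threshold)


# Hypothetical sample of 2,000 sessions, labeled by client OS version.
sessions = ["iOS 17"] * 1200 + ["iOS 16"] * 799 + ["iOS 10"] * 1

shares = traffic_share(sessions)
print(below_threshold(shares, 0.001))  # ['iOS 10'], since 1 / 2000 = 0.05%
```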
Common Use Cases for APM
You can use APM to improve the resiliency of an organization's application portfolio. Here are three of the easiest and most common ways to use APM to add real value to your business in short order.
CloudOps for Cloud-Native Applications
Based on the functionality described above, it is easy to see how an APM solution can add value to a CloudOps team responsible for monitoring its cloud-native applications and ensuring that they are alive and well. Regardless of how distributed the deployed applications are, agents will be available to instrument the platforms they are running on. These platforms can include everything from on-premises Kubernetes to Spring Boot apps running on virtual servers in a public cloud to application platforms like Heroku. Meanwhile, the tracing and client data will build a complete view of which calls are happening most often and which calls have potential performance problems.
Network Performance and Management
All modern application development and deployment models rely heavily on network-based connections. Cloud-native applications based on microservices are a perfect example. Whether inter-service or client-based, every interaction involves multiple network connections to complete its task. Performance isn't usually a problem within a data center, but it is easy to underestimate the impact that fast-growing applications have on the networking tier. In addition, having a view into all of the connections from clients all over the globe supports better caching at the edge, the deployment of critical applications in regions closer to their largest user base, and even the decision to move static assets to a content delivery network at the right time.
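To make the regional view concrete, the sketch below groups client requests by region and reports request volume alongside median latency, which is roughly the input a team would want before moving an application closer to its users. The region names, the latency figures, and the `summarize_by_region` helper are all invented for this example.

```python
from collections import defaultdict
from statistics import median


def summarize_by_region(requests):
    """Return (region, request_count, median_latency_ms), busiest region first."""
    latencies = defaultdict(list)
    for region, latency_ms in requests:
        latencies[region].append(latency_ms)
    summary = [
        (region, len(values), median(values))
        for region, values in latencies.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Hypothetical client requests as (region, latency in milliseconds).
requests = [
    ("eu-west", 40),
    ("ap-south", 210),
    ("ap-south", 190),
    ("ap-south", 230),
    ("eu-west", 60),
]

for region, count, median_ms in summarize_by_region(requests):
    print(region, count, median_ms)
```

In this made-up sample, the busiest region is also the slowest, which is the kind of pattern that would argue for deploying closer to those users.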
Improving DevOps within Application Development Teams
Having an APM solution available in your organization brings additional benefits beyond supporting the operational teams. In keeping with DevOps practice, giving developers as much information as possible about how their applications are running allows them to independently look for ways to improve application performance and drastically reduce the mean time to fix any given problem. An APM solution provides access to deep diagnostics that can show developers which part of which service failed. In some languages, the traces will even go down to the exact line of code throwing the error.
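Python is one language where that line-level detail is easy to see, because the standard `traceback` module exposes the frames of a raised exception. The sketch below is only an illustration of the idea, not how any specific APM agent works; `apply_discount` and `failing_location` are hypothetical names.

```python
import traceback


def failing_location(exc):
    """Return the function name and line number of the innermost frame."""
    innermost = traceback.extract_tb(exc.__traceback__)[-1]
    return innermost.name, innermost.lineno


def apply_discount(price, percent):
    # Deliberately fragile: divides by zero when percent is 100.
    return price / (100 - percent)


try:
    apply_discount(50, 100)
except ZeroDivisionError as exc:
    function_name, line_number = failing_location(exc)

print(f"error raised in {function_name}() at line {line_number}")
```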
At the core of every good APM solution is a central repository that holds the logs and the analyzed information used to find trends and diagnose problems. Mezmo is an excellent example of a solution that brings together the collection and analytics capabilities necessary to amass the store of data that drives incident and performance management.