Managing Telemetry Data Overflow in Kubernetes with Resource Quotas and Limits
One of the inherent challenges you'll face when working with Kubernetes is that a typical cluster includes many resources that produce telemetry data. Because producing and moving telemetry data consumes resources, you can end up in situations where different workloads are competing for the resources necessary to manage telemetry data. And if some workloads lack the resources necessary to process telemetry data quickly enough, the result could be blind spots in your Kubernetes monitoring and observability strategy.
Fortunately, Kubernetes offers some built-in features that can help alleviate these challenges: resource quotas and limits. Keep reading for a look at how resource quotas and limits can improve your clusters' ability to handle telemetry data, as well as how to get started leveraging these features for this purpose.
Kubernetes Telemetry Challenges
Before diving into how resource quotas and limits help to streamline telemetry in Kubernetes, let's talk a bit more about why telemetry can be a challenge in Kubernetes.
Telemetry is the generation, processing, and management of data that teams can use to monitor and observe IT environments. Typically, telemetry data is generated by an IT resource in the form of logs, traces, or metrics. Then, the resource transmits it – often with the help of a telemetry pipeline – to a location where it can be analyzed and/or stored over the long term.
Like any IT process, processing telemetry data consumes CPU and memory resources – and the more telemetry data you are working with, the higher the CPU and memory load tends to be. Telemetry processing may also consume network bandwidth when transmitting data from one server to another.
In Kubernetes, it's common to have many components that produce their own telemetry streams. Various elements of the Kubernetes control plane – such as the API server, the etcd key-value store, and control plane nodes – produce telemetry data. So do most applications hosted on Kubernetes.
Now, this is fine so long as each component that needs to generate and transmit telemetry data has sufficient CPU and memory to do so. But the total CPU and memory available to a Kubernetes cluster is bounded by the number of servers (or nodes, as they are called in Kubernetes) in the cluster, and a single cluster may contain dozens or even hundreds of individual nodes and pods that generate telemetry data. As a result, there may not be enough resources available for processing telemetry data, especially if heavier-than-usual load increases the volume of that data.
This is one example of what's commonly referred to as the "noisy neighbor" problem. When one component inside a Kubernetes cluster begins a resource-intensive task (such as responding to an increase in requests), it can become a noisy neighbor, sucking up resources that other components require to perform important tasks.
If that happens – if one component in a Kubernetes cluster hogs resources during telemetry processing to the point that other components can't do their jobs – two types of problems may occur:
- Other components won't be able to generate and/or transmit telemetry data quickly enough to support real-time monitoring and observability. The data they send may be delayed until CPU and memory resources free up, preventing teams from detecting issues as quickly as possible.
- In extreme cases, other components may stop operating properly because they lack the CPU and memory necessary to do so. In other words, an application could begin generating errors or dropping requests because the resources it needs to operate normally are being tied up by another application's telemetry operations.
Kubernetes Telemetry Constraints Example
To put this in a real-world context, imagine that you have a Kubernetes cluster that hosts five different applications. During periods of normal activity, the cluster operates with total CPU and memory consumption rates of 90 percent – meaning that 10 percent of its capacity is held in reserve. Ideally, clusters would have a larger resource buffer than this, but because increasing spare resource capacity requires adding servers to a cluster – and because adding servers costs money – it's not uncommon for the resource margins of a Kubernetes cluster to be tighter than they should be.
One day, due to a configuration oversight related to how much telemetry data it is supposed to process, one application begins generating ten times as many logs, traces, and metrics as usual, leading to an increase in its telemetry operations. As a result, the application's CPU and memory consumption also increases ten-fold, bringing the cluster's CPU and memory loads to 100 percent.
Because the cluster's resources are now maxed out, there is no spare CPU and memory available for other applications, and the buggy app will continue hogging resources until it either ends up in a CrashLoopBackOff state or is reconfigured to manage telemetry data properly. In the meantime, other applications may not be able to function properly because there are not enough resources to accommodate them.
Kubernetes Won't Automatically Solve Telemetry Problems
You might think that Kubernetes would automatically distribute resources to components within a cluster based on their needs. But you'd be wrong. Kubernetes doesn't reserve resources for components automatically (unless admins explicitly configure resource requests and limits).
After all, Kubernetes doesn't know what the apps that you deploy need to do. It can't tell the difference between a mission-critical application and a dummy app that your developers are just testing.
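Thus, when a cluster runs short of resources, Kubernetes doesn't intervene by default. It can do things like restarting crashed pods to try to keep them running, but it can't step in and say "app Y is hogging too much CPU, so I'm going to redistribute the CPU to other apps." The closest built-in mechanism is a feature called preemption, which lets the scheduler evict lower-priority pods to make room for higher-priority ones, and it only works if admins have assigned priorities to their workloads.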
How Resource Quotas and Limits Enable Efficient Telemetry
However, there is a way to tell Kubernetes how many resources a given app or set of apps should be able to consume. You can do this by setting up resource quotas and limits.
Here's what each of these things does:
- A resource quota defines how many resources a namespace can use. In Kubernetes, a namespace is a logical partition of a cluster that can host multiple apps. By setting a resource quota for a namespace, then, you effectively define how many total resources should be available collectively to the apps running in that namespace.
- Resource requests and limits define a range of resources that a specific container or Pod can consume. A request is the amount that Kubernetes reserves for a container when scheduling it, and a limit is the most that the container is allowed to use. For example, you can use a request and a limit together to tell Kubernetes what the minimum and maximum CPU resources should be for a given container.
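As a rough illustration of requests and limits, here is a minimal Pod manifest. The Pod name, namespace, image, and resource values are hypothetical placeholders chosen for this sketch, not recommendations.

```yaml
# Hypothetical Pod: the name, namespace, image, and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: checkout-api
  namespace: production
spec:
  containers:
    - name: app
      image: registry.example.com/checkout-api:1.0
      resources:
        requests:
          cpu: "250m"      # the scheduler reserves a quarter of a CPU core
          memory: "256Mi"  # the scheduler reserves 256 MiB of memory
        limits:
          cpu: "500m"      # CPU use beyond half a core is throttled
          memory: "512Mi"  # a container that exceeds 512 MiB can be OOM-killed
```

With settings like these, a spike in the container's telemetry work can slow that container down or get it restarted, but it can't take more than its limit from the node it runs on.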
The reason why resource quotas and limits help prevent the "noisy neighbor" problem described above is that they can prevent one application or set of applications from hogging resources and keeping other apps from operating normally.
For example, imagine that you have two namespaces in your cluster – one for production apps and one for apps that are in testing. Since the testing apps are not mission-critical, you could define a resource quota for their namespace that prevents those apps from consuming more than 20 percent of the total resources available to your cluster. Then, if a testing app were to experience a surge in telemetry operations, the resource quota imposed on the app's namespace would prevent the creation of additional pods that could increase resource utilization and risk destabilizing production apps running in the other namespace.
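Quotas are expressed in absolute amounts rather than percentages, so the 20 percent figure has to be converted by hand. As a sketch, assume (purely for illustration) that the cluster offers 40 CPU cores and 160 GiB of memory in total; 20 percent of that is 8 cores and 32 GiB. The namespace and quota names below are hypothetical.

```yaml
# Hypothetical quota for a "testing" namespace.
# Assumes a cluster with 40 CPU cores and 160 GiB of memory, so
# 20 percent works out to 8 cores and 32 GiB.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: testing-quota
  namespace: testing
spec:
  hard:
    requests.cpu: "8"       # sum of CPU requests across all pods in the namespace
    requests.memory: 32Gi   # sum of memory requests
    limits.cpu: "8"         # sum of CPU limits
    limits.memory: 32Gi     # sum of memory limits
```

Once a quota like this covers CPU and memory, Kubernetes rejects new pods in the namespace that would push the totals past these values, and it may also reject pods that don't declare requests and limits for those resources.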
As another example, imagine that you run multiple production apps, but one is especially mission-critical. You could give that app higher resource requests and limits than the others, reserving more CPU and memory for it and allowing it a higher maximum. That way, Kubernetes would favor that app in cases where cluster resources are maxed out. Doing so might come at the expense of other apps, but it would ensure that the most important app has the resources necessary to operate stably. That's better than leaving it to chance to decide which app or apps will be able to function normally during times of resource shortages.
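One way to sketch this is to set each container's requests equal to its limits. Kubernetes then assigns the Pod the Guaranteed quality-of-service class, which makes it among the last candidates for eviction when a node runs short of resources. The names and values below are hypothetical.

```yaml
# Hypothetical mission-critical Pod. Because requests equal limits for both
# CPU and memory, Kubernetes classifies the Pod as Guaranteed QoS.
apiVersion: v1
kind: Pod
metadata:
  name: payments-service
  namespace: production
spec:
  containers:
    - name: app
      image: registry.example.com/payments-service:1.0
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"       # same as the request
          memory: 4Gi    # same as the request
```

Less critical apps could keep smaller requests with higher limits, which lets them burst when capacity is free while leaving them more exposed when it isn't.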
How to Set Up Resource Quotas and Limits
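Configuring resource quotas and limits is simple enough. You include them in manifests when describing objects in Kubernetes: a resource quota is defined as its own ResourceQuota object within a namespace, while requests and limits are set per container in a Pod spec.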
For example, this YAML code (borrowed from the Kubernetes documentation) defines a ResourceQuota named mem-cpu-demo, which takes effect in whichever namespace you create it in:
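A sketch of that manifest, following the memory and CPU quota example in the Kubernetes documentation (treat the exact values as illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  hard:
    requests.cpu: "1"      # total CPU requests allowed across the namespace
    requests.memory: 1Gi   # total memory requests allowed
    limits.cpu: "2"        # total CPU limits allowed
    limits.memory: 2Gi     # total memory limits allowed
```

Here mem-cpu-demo is the name of the ResourceQuota object itself. The quota takes effect in the namespace you create it in, for example with kubectl apply -f quota.yaml --namespace=<your-namespace>, where quota.yaml is an assumed filename for the manifest above.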
Limitations of Resource Quotas and Limits for Telemetry
Overall, it's important to note that resource quotas and limits don't automatically eliminate the risks associated with telemetry data overload or other resource shortages in Kubernetes.
These features don't magically generate additional resources when your cluster runs short on CPU or memory. Even with resource quotas and/or limits set up, some apps may fail until cluster admins either add more infrastructure to the cluster or reduce the resource consumption rates of its applications.
However, resource quotas and limits will at least tell Kubernetes which workloads to prioritize during times of insufficient resources. They're a way to protect your most important workloads, ensuring that they can continue to manage telemetry operations and otherwise function normally.
How Telemetry Pipelines Keep Kubernetes Stable
Another way to prevent the noisy neighbor problem in Kubernetes is to offload the processing of telemetry data from the local cluster as much as possible. That way, the cluster resources are not used for processing telemetry data.
That's where telemetry pipelines come into play. By making it possible to transform, merge, deduplicate, or otherwise process data while it is in transit, telemetry pipelines reduce the data processing load placed on the workloads where telemetry data originates. In that way, they also reduce the risk that telemetry operations will cause your cluster resources to max out.
The bottom line: configuring resource quotas and limits is a best practice that will help keep your most important workloads operating during times of peak cluster load. But it's also a best practice to leverage telemetry pipelines to lighten overall cluster load, reducing the risk that Kubernetes will need to enforce resource maximums at all.