Resource management in Kubernetes: from copy-paste to automation

Guest:

  • Yasmin Rajabi

A practical discussion on managing Kubernetes resource optimization and the challenges of balancing cost efficiency with reliability.

In this interview, Yasmin Rajabi, COO and Head of Product at StormForge, discusses:

  • How the ubiquitous 60% HPA target utilization stems from examples in the Kubernetes documentation, leading to systematic waste across clusters, and why raising it is a safe first optimization step.

  • The resource management challenge in large clusters, where copied configurations and arbitrary safety buffers compound into significant waste, particularly when managing thousands of workloads.

  • Why automated resource management with machine learning helps bridge the gap between platform teams focusing on cost optimization and developers prioritizing reliability.

Transcription

Bart: So, tell me about who you are, your role, and where you work.

Yasmin: My name is Yasmin. I am the COO and Head of Product at StormForge. I've been at StormForge for a handful of years now. Prior to that, I was at Puppet for a while. I've been in the infrastructure automation space for most of my career and now I'm doing it in Kubernetes. It's been a lot of fun.

Bart: We want to take a closer look at some of the topics that come up in our podcast. One of those topics is overprovisioning. Our guest, Alexander, wrote an article about overprovisioning. What strategies have you found to be effective in controlling overprovisioning in large Kubernetes clusters?

Yasmin: One of the pieces in that article I really agreed with was that increasing your HPA (Horizontal Pod Autoscaler) target utilization as a general rule is a good first step. What I see is that a lot of people use something like 60% for their target utilization. That's literally telling Kubernetes to waste 40% of your CPU requests and limits. Do you actually know why most people set it to 60?

It's because the example in the Kubernetes docs uses that value. Everywhere I go, people have set their target utilization to 60%. I'll ask them, and rarely do people have a good answer for why they did it. They'll say, "It's in the docs," or "Honestly, it was set for other workloads, so that's what we set it to." For people who haven't embarked on any type of optimization and have low cluster utilization, raising the target utilization is low-hanging fruit. Most of the time, people are pretty generous with their CPU requests and limits, so it's an easy way to optimize by increasing that target utilization. This will raise your CPU utilization pretty easily and safely, because those generous limits still allow you to burst above your request as needed.
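As a concrete illustration, here is a minimal sketch of what raising that target looks like in an autoscaling/v2 HPA manifest; the workload name and the numbers are illustrative rather than a recommendation for any particular application.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # raised from the commonly copied 60%
```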

Beyond setting your target utilization, I think it comes down to the type of app. For static apps, it's easy enough to look at the Grafana dashboard, see what the application actually needs, and make sure whatever buffer you're adding isn't leaving too much waste. Just being diligent about that is key. For dynamic apps that fluctuate during the day or have more downtime on weekends, it's a bit trickier. It's a mix of determining the right setting for the right time and having the automation to change those settings at the right time. The key is knowing what usage patterns to expect, such as needing fewer resources on weekends, and deploying changes that lower your CPU requests when the workloads don't need them.
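As a hedged sketch of that kind of time-based change, a scheduled job can lower a workload's requests going into the weekend. The Deployment web, namespace shop, and ServiceAccount resource-tuner below are hypothetical names, and the ServiceAccount would need RBAC permission to patch deployments.

```yaml
# Hypothetical example: drop the "web" Deployment's requests on Friday night.
# A mirror-image CronJob (not shown) would restore them on Monday morning.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekend-scale-down
  namespace: shop
spec:
  schedule: "0 22 * * 5"                     # Friday at 22:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: resource-tuner # needs RBAC to patch deployments
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.29    # image tag illustrative
              command:
                - kubectl
                - set
                - resources
                - deployment/web
                - -n
                - shop
                - --requests=cpu=200m,memory=256Mi
```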

Bart: Take that a little bit further, because you're shedding some light on the fact that a lot of people are doing this without even knowing why; they're just following along because it's in the docs. You've talked to a lot of people about optimization. What's the number one mistake you see people making? Because who doesn't want to optimize? But what are people getting wrong about optimization?

Yasmin: When I ask people how they set their CPU and memory requests and limits, they often say a previous application had those values, so that's what they used for this app. I respond that it's a different app, and ask if it has the same profile and resource needs. They often reply that they just copied the template. The biggest issue I see is that people are in a rush to get things out quickly, and resource management usually comes at the end. In the effort to get the application deployed, it tends to be an afterthought.

Bart: To meet the needs of the business, people kind of skip that step and don't stop to think about what the CPU and memory requests and limits should be for this workload. Another thing that came up in the interview we did with Alexander was governance of requests and limits. He stated that being conservative on requests can be painful in large clusters. How do you approach resource allocation in environments with many uncontrolled workloads?

Yasmin: We see this a lot, and those buffers add up quickly. I mentioned the safety buffer - everyone sets safety buffers to make sure there's a little bit of give on all their workloads. That works fine when it's a small number of workloads, but on larger clusters it adds up and becomes a big problem. Part of this is because people are looking at their own app, and when it's your own app, being a little over-provisioned is fine - it's just a little, and you have that safety buffer. But when you're on the other end looking at the entire cluster, all those little safety buffers start adding up, and your utilization gets into the teens or even single digits. Even worse - and we talked about what the biggest challenge is - those safety buffers people are adding are guesses, or just copy-pasted from previous applications. You're adding guesses to something that probably already has enough of a buffer.

So the waste adds up, and you need governance. That's easier said than done. Platform teams are usually the ones put in the seat, tasked with putting governance in place, but their incentives don't necessarily match those of the developers. As a developer, why would you not add a safety margin? You're the one who's going to get the call in the middle of the night because your app ran out of resources, started OOMing, or whatever happens. So those incentives don't match. The platform teams want a more efficient cluster, and they're being pushed to keep costs down and improve utilization. But app teams want to drive uptime and maintain reliability, and these goals are often perceived as being at odds with one another - as if reducing resources means something bad is going to happen, even though you could probably reduce resources and everything would be fine. When push comes to shove, people always choose reliability over cost improvements, at least from what we've seen. But you don't have to choose between the two if you use software; it's when people are doing more DIY that they have to choose.

Before StormForge, I worked at Puppet, as I mentioned, and lived and breathed the DevOps problem. This is the same thing, just applied to a new set of technologies. You need alignment between teams, and you need to break down those silos. I think it's most effective to do this governance through software, because then you get an unbiased third party empowering both sides.
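To make the governance point concrete, one Kubernetes-native guardrail a platform team can put in place is a LimitRange that applies defaults when a team forgets to set requests and caps what any single container can ask for. This is a minimal sketch with illustrative values, not a recommendation for specific numbers.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-guardrails
  namespace: team-a        # applied per namespace; name is illustrative
spec:
  limits:
    - type: Container
      defaultRequest:      # used when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:             # used when a container sets no limits
        cpu: 500m
        memory: 512Mi
      max:                 # ceiling for any single container's limits
        cpu: "2"
        memory: 2Gi
```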

Bart: Interesting, right? Using software avoids the tendency to fall into conflicts where there might be doubts around ownership. As you also mentioned, getting buy-in is crucial; without it, there's going to be automatic resistance. Much of what we're discussing is about change management. Those are great insights to keep in mind. You mentioned resource management; our guest Alexander also expressed that having an automated mechanism is better than enforcing processes. What automation tools or approaches do you recommend for managing Kubernetes resources, such as CPU and memory requests and limits?

Yasmin: I've been trying to avoid being biased, but from my point of view, the only answer here is automation. I would recommend our product, StormForge Optimize Live, but the same rule applies to any tool in this space. Having automated enforcement, with a tool handling the toil, is always going to be better than humans, because, as you mentioned, the processes just aren't scalable, and you don't want the blame game. You want tooling to solve this.

Because I'm in product, I get to talk to a lot of users, and we talk a lot about how they've either attempted to solve this problem with automation tools, have already solved it, or are thinking about solving it. Everyone is at a different stage, and the answer is always an automation tool. In some cases, though, people go the DIY route and build their own automation internally, so it doesn't always have to be something you purchase.

The challenge with the DIY approach is scale. It's not very hard to set CPU and memory requests and limits - you're counting requests and limits for two resources, so four settings per workload - and that's fine for a few workloads. But for a thousand workloads where those settings are changing every day, that's where an automation approach is necessary. Because I work at StormForge, I'd argue that you need machine learning and automation: machine learning to come up with the right setting for the specific application and workload at the right time, and automation to actually go out and change that setting.

You also need to layer on things that make your life easier. We talked about getting buy-in from developers, and you want to have those constraints. The automation tool should have the ability to set constraints and say, "I never want any of these applications to have a CPU set less than 100 millicores," or "I always want to have my requests equal to my limits." You need that flexibility in the automation tool because it needs to be easy to interact with, easy to use, and easy to deploy.
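As an illustration of what such constraints can look like in Kubernetes-native tooling (not StormForge's own API), the open-source Vertical Pod Autoscaler lets you floor recommendations at 100 millicores through its resourcePolicy. This sketch assumes the VPA is installed in the cluster, and the workload name is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"      # apply recommendations automatically
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:         # never recommend below this floor
          cpu: 100m
          memory: 128Mi
```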

When I say that, I mean working with tools like Argo - everyone's using some type of GitOps tool. The automation tool you use must just work, and when you roll things out it should have a little bit of intelligence to say, "Okay, I don't want to roll out big changes all across my environment at once. I want to roll out smaller changes, and make sure they're at least big enough to be worth that kind of churn in my environment." When you're using an automated tool like that, it's hitting your API and making those changes, so you want to make sure it's doing that in a way that scales.

Bart: Now, a bonus question: the role of cloud providers when it comes to optimization - are they getting in the way or enabling it? What's been your experience? Any recommendations for folks out there who might be wrestling with this question?

Yasmin: It's funny, we partner with AWS because of Karpenter. We love Karpenter, and you'll see us give talks on it. We do pod optimization, and they do node optimization. You need both to actually cut down on your resources. People often ask, using AWS as an example, why AWS wants them to reduce their resources. Why are they investing in Karpenter and partnering with us? I think they understand that as people grow, they want to make sure they're putting their investments in the right thing. Maybe you're cutting down on your Kubernetes infrastructure costs, but then you can repurpose that towards something more innovative, like ML/AI workloads that you're looking to innovate on. I think AWS sees that and wants to be the platform for people to do all of their application deployments and management, and have the answers, without being the one that you have to spend the most amount of money on just your infrastructure, so you can spend on the things that matter.
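On the node-optimization side mentioned here, a Karpenter NodePool that consolidates underutilized nodes might look roughly like the following. Field names follow the v1beta1 API, and the EC2NodeClass name is illustrative, so check the Karpenter documentation for the version you run.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized   # repack and remove low-utilization nodes
  limits:
    cpu: "1000"                              # cap total provisioned CPU
  template:
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default                        # illustrative node class
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
```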

Bart: Yasmin, thank you very much for sharing your insights today. I'm looking forward to connecting with you directly at KubeCon. Thanks a lot.

Yasmin: Awesome, I'm looking forward to it.
