Kubernetes resource optimization: Tackling CPU and GPU waste through automation
This interview explores the critical challenges of resource optimization and cost management in Kubernetes environments, revealing common misconceptions that lead to significant waste.
In this interview, Andrew Hillier, CTO and Co-Founder at Densify, discusses:
Why Kubernetes overcommit works differently than it does in VMware - explaining how a fundamental misunderstanding of resource allocation leads to low utilization and high costs
Practical strategies for reducing CPU, memory, and GPU waste - detailing how misconfigured requests and limits create stranded capacity with utilization often under 10%, and how GPU optimization can cut costs in half through techniques like MIG slicing for AI workloads
The critical role of automation in resource optimization - emphasizing that manual tracking of resource issues doesn't scale, and why automated solutions that provide clear reasoning behind configuration changes are essential for teams to trust and adopt optimization practices
Transcription
Bart: Who are you, what's your role, and where do you work?
Andrew: My name is Andrew Hillier. I'm the CTO and one of the co-founders at Densify, a Kubernetes optimization company.
Bart: What are three emerging Kubernetes tools that you're keeping an eye on?
Andrew: The biggest one for me by far is in-place resizing. I don't know if you call it a tool, but it's a capability. We do a lot of work on optimizing requests and limits, and automation of that, and it's a godsend because we can do what we do much faster and more easily.
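For context, here is a minimal sketch of what in-place resizing looks like at the pod level, assuming a cluster with the InPlacePodVerticalScaling feature gate enabled (alpha in Kubernetes 1.27, beta in 1.33); the pod name and resource values are illustrative:

```yaml
# Illustrative Pod spec: resizePolicy lets CPU be resized in place,
# without restarting the container (requires the InPlacePodVerticalScaling
# feature gate).
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo                    # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # CPU changes apply live
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart the container
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
```

On recent versions, new values are applied through the pod's resize subresource (for example, `kubectl patch pod resize-demo --subresource resize ...`) rather than by deleting and recreating the pod, which is what makes automated rightsizing so much less disruptive.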
We keep an eye on things like OTEL because we're always hungry for data, better ways to collect it, and ways to avoid the collection issues we've run into before. We're focused on getting the data, analyzing it, automating the changes, and optimizing the environments.
Of course, Karpenter is a big one for us as well. It's not new, but it's a very nice complementary technology that automatically reduces costs when you optimize the containers.
Bart: Now, one of our podcast guests, David, thinks that developers didn't fully understand the fundamental differences between containers and virtual machines. How do you help teams understand containerization when they're coming from VM backgrounds?
Andrew: That's a topic near and dear to my heart because we have a background in VMware optimization. In VMware, we had overcommit, and the resources were virtual: if my hosts weren't well utilized, I could overcommit the CPU and memory, handing out the same physical resources more than once and managing any contention centrally.
I think people go into Kubernetes believing it does the same kind of overcommit, because the documentation uses the same word. However, overcommit in the Kubernetes world is different: it means going above your request but staying below your limit. You're not scheduling the same resources more than once.
As a result, you get stuck in a low-utilization world, and the only way out is to fix your requests and limits. People have a misconception that some magic will improve utilization. In reality, if you ask for 10 CPUs, Kubernetes will earmark 10 CPUs for you. Other containers can share the idle cycles, but once all the CPUs on a node are spoken for, the cluster starts a new node. This makes it very expensive and leads to low utilization.
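To make the distinction concrete, here is a minimal, illustrative pod spec; "overcommit" in Kubernetes only means this container may burst above its request up to its limit, while the request itself is still earmarked on a node in full:

```yaml
# Illustrative only: Kubernetes "overcommit" is bursting between
# request and limit, not scheduling the same CPU twice.
apiVersion: v1
kind: Pod
metadata:
  name: burstable-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "1"         # the scheduler sets aside a full CPU for this pod
        memory: "512Mi"
      limits:
        cpu: "2"         # the container can burst to 2 CPUs if the node has slack
        memory: "1Gi"
```

If that 1-CPU request is far above actual usage, the earmarked capacity is stranded even while the container sits idle.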
The VMware technology with virtual resources and overcommit is quite different and actually quite useful for driving efficiency.
Bart: One of our other guests, Zain, observed there's a lot of waste happening on CPU and GPU, and there's a significant opportunity to optimize this. What approaches do you take to reduce resource waste in your clusters?
Andrew: CPU requests are universally too big. We find it's difficult for an app team to estimate what requests and limits to give a workload. You usually want to request enough for when you're busy, but if everybody does that, you end up with a lot of stranded capacity. We see tons of it, a lot of capacity left on the floor, with utilization under 10 percent on average.
GPU is a new area. We just deployed our new GPU optimization and are analyzing NVIDIA GPUs in customer environments. There's a lot of opportunity, and a lot of capital at stake, especially since we're seeing cases where people don't even have the DCGM exporter installed, so they have no visibility into what their GPUs are doing.
You'll see an AI inference workload using a quarter or an eighth of a GPU at peak. There's an opportunity to use MIG slicing, dividing the right GPUs into right-sized partitions. With one recent customer, we could cut a $4 million spend to $2 million just by slicing up the GPUs.
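As a sketch of what that looks like from the workload side, a pod can request a MIG partition instead of a whole GPU. The resource name below assumes an A100 partitioned via the NVIDIA device plugin's mixed MIG strategy; the pod name and image are hypothetical:

```yaml
# Illustrative only: request a 1g.5gb MIG slice (1/7th of an A100)
# instead of a whole GPU via nvidia.com/gpu.
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo             # hypothetical name
spec:
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1   # one slice; six more remain for other pods
```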
It seems to be trending like CPU, where workloads ebb and flow, making it hard to determine what you need. Memory is usually flatter, so people overestimate it and create a ton of waste. With GPUs being so expensive, that adds up really quickly.
Bart: One of our other guests, Grzegorz Głąb, said that using managed Kubernetes services like AKS makes things easier because it takes a lot of work to set up and operate a cluster. However, it also comes with its own set of challenges. What challenges have you faced with managed Kubernetes services?
Andrew: From a configuration management perspective, it makes it a lot easier when everything's bundled and stands up together. However, from our perspective, it doesn't solve the problem of infrastructure overprovisioning. If people make their request too big, it'll automatically run on a lot of infrastructure and cost a lot of money.
I should mention that if you make memory limits too small, you'll get out-of-memory kills; we do a lot of analysis around that. In our view, after analyzing across our customers, there's very little difference between homegrown clusters and services like EKS, AKS, or OpenShift. They all exhibit the same challenges from a capacity perspective.
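Returning to the memory point, here is a minimal illustration of that out-of-memory failure mode, assuming a hypothetical app whose real peak usage is around 300Mi:

```yaml
# Illustrative only: a memory limit below actual peak usage gets the
# container OOM-killed (status "OOMKilled", exit code 137).
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # hypothetical image
    resources:
      requests:
        memory: "128Mi"
      limits:
        memory: "128Mi"   # too small if the app actually peaks near 300Mi
```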
If your CPU requests are too big, you'll just have large scale groups or scale sets running. Even running Karpenter doesn't solve this. If you ask for twice as much CPU as you need, Karpenter will just run twice as many nodes or node CPUs because it has to give you what you asked for.
Karpenter is extremely convenient because it does auto consolidation and manages things very nicely, but it doesn't solve the spend problem. If app teams are asking for too much, it still has to abide by those requests. None of these technologies solve that problem, but they're obviously very useful for making lives easier for the platform team.
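For reference, a minimal sketch of a Karpenter v1 NodePool with consolidation turned on (names and values are illustrative, and the referenced EC2NodeClass is assumed to exist); note that nothing here can shrink the pods' own requests:

```yaml
# Illustrative Karpenter v1 NodePool: consolidation packs and removes
# nodes, but provisioned capacity still tracks whatever pods request.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      nodeClassRef:                # assumes an EC2NodeClass named "default" exists
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```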
Bart: Every month, we do a content analysis report looking at all the content shared across our network. A couple of months ago, the top trending topic was resource optimization: cost management, performance tuning, and scale-to-zero solutions. Obviously everybody wants to save costs and optimize, but in your experience, which specific topics or difficulties get the strongest response from the community?
Andrew: In the cloud world, spend and resource optimization have always been the biggest challenges. To this day, getting people to change cloud instances is difficult because there are many configuration elements you need to get right.
In Kubernetes, it's much more streamlined because it's basically about requests and limits. Nobody's emotionally attached to these settings as long as they understand why they're set the way they are. The biggest improvement we see is automation. We've been doing a lot of work in this area, and it's being adopted quickly because people want to understand the reasoning behind settings and then simply implement them.
People don't want to need a PhD in analytics; they just want to turn on the machine and have it work. We have a visualization in our Densify product that shows whether resources are too big, too small, or just right, and you can almost watch the environment improve as automation takes effect.
Addressing risk is also key. If things are hitting memory or CPU limits, or are undersized, automating the fix is crucial. You don't want to manually track these issues—just let the machine resolve them.
At trade shows, when we say "cut costs in Kubernetes," we get a lot of reaction. Conversations quickly move to automation: people want to fix the issue and then move on to their next problem. That's great for us because once users trust the solution, they just implement it and don't worry further.
Looking ahead, AI is a standard focus. We've just released GPU capabilities targeting AI inference running in Kubernetes. We're also integrating AI within our product—developing intelligent notifications and advanced analytics to help users quickly identify their most critical issues or top cost areas.
Bart: Kubernetes turned 10 years old last year, and the CNCF is turning 10 years old this year. If you could go back 10 years in time and give yourself one piece of advice about starting with this technology, what would that be and why?
Andrew: That's a great question looking back in time. A couple of things come to mind. Everything takes longer than you think it's going to take. 10 years sounds like forever, and at any point during that time, you probably thought the world would be transformed in the next year.
We see the same problems take a long time to resolve. Even in our work, it's taken 10 years for people to truly realize that misconfiguring Kubernetes can be very expensive and involves significant risks. People want ecosystem products like ours from Densify to solve their problems automatically, and that takes time.
The key is to be patient. There's no magic solution that solves all your problems; a new platform just presents the same challenges in a different way. You still have spend and performance problems that need solving. It's important to stay focused, because it's taken everyone a while to realize this, and these insights could have been recognized sooner.
There's a lot of spend that could have been avoided over the years. However, now is still a good time to solve these issues because we're seeing Kubernetes ramping up almost everywhere. The best time would have been a few years ago, but now is the second-best time to fix it.
Bart: And with that in mind, Andrew, what's the best way to get in touch with you?
Andrew: I'm findable on LinkedIn, and our website is a great source with many good resources. You can find me by clicking in the right places. Additionally, we have a cool sandbox on our website where you can explore our product and click around. You can see the visualization of resources in a Kubernetes environment. It's a rich experience that lets you use our product against demo data. If anybody wants to see more details about what I've been describing, they can simply go in and interact with it.
Bart: Perfect. Andrew, always a pleasure to talk to you. Take care. We'll speak soon.
Andrew: Thanks, Bart.
Bart: Thanks.