Kubernetes resource optimization: balancing automation with production safety

Dec 12, 2025

Guest:

  • Andrew Hillier

Resource optimization in Kubernetes requires balancing performance guarantees with cost efficiency; however, automated scaling tools often fall short of meeting production requirements.

In this interview, Andrew Hillier, CTO and Co-founder at Densify, discusses:

  • Why aggressive upscaling protects performance while cautious downscaling prevents disruption from cyclical workloads that may spike monthly or quarterly

  • How reliance on short-term data and fixed request-to-limit ratios makes the Vertical Pod Autoscaler unsuitable for most production workloads that need occasional peaks with lower baseline usage

  • Why tools like Karpenter can only optimize based on workload requests, making accurate historical data analysis essential to prevent workloads from being consolidated onto undersized nodes during traffic spikes

Transcription

Bart: So who are you, what's your role, and where do you work?

Andrew: I'm Andrew Hillier, CTO and co-founder of Densify.

Bart: When automating resource optimization, a common philosophy is to scale up resources aggressively to ensure performance—"fast up"—but scale them down cautiously over a longer time horizon, which would be "slow down". Do you agree with this philosophy? What data and guard rails are essential before you would trust an automated system to downsize a critical production workload?

Andrew: I totally agree with that philosophy. If you see something starved for resources, you want to give it more resources rapidly, especially limits. You can do requests and limits separately. You can probably get by with a request not being high enough as long as your nodes have enough capacity, but the limits will kill you if they're not high enough. So you want to get them up quickly when you see workloads ramping up or reaching a high watermark.

Downsizing is a completely different question. We've seen cases where you bring resources down because something wasn't busy for a day or two or a week, but it gets busy every month or during quarter-end workloads. For customers with critical workloads that have long cycles, you want to keep the limits up. Even if it's only once a quarter, it needs that much capacity. Leave the limit high—it doesn't cost you anything. It's just a safety net to ensure you have enough resources.

Our view is to gather a lot of data, learn the patterns, and truly understand the cycles before downsizing. If you downsize too aggressively and the workload gets busy again, you'll run into throttling and kills, and it becomes a mess. I totally subscribe to that philosophy.
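To make the requests-versus-limits point concrete, here is a minimal sketch of a Deployment in the spirit Andrew describes: a request set near typical usage so the scheduler reserves a realistic baseline, and a deliberately higher limit left in place as a safety net for periodic peaks. The name, image, and values are purely illustrative assumptions, not recommendations.

```yaml
# Minimal sketch of a burstable workload: the request reflects steady-state
# usage, while the higher limit leaves headroom so occasional spikes are not
# throttled (CPU) or OOM-killed (memory). All names and values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api                  # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: app
          image: registry.example.com/example-api:1.0   # placeholder image
          resources:
            requests:
              cpu: "250m"            # roughly what it uses day to day
              memory: "256Mi"
            limits:
              cpu: "1"               # headroom for cyclical peaks
              memory: "1Gi"          # keep generous; hitting it means an OOM kill
```

Keeping the limit well above the request costs nothing while the workload is quiet, which is the "safety net" Andrew refers to; only the request affects how much node capacity is reserved.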

Bart: The Vertical Pod Autoscaler (VPA) is designed to automate right-sizing, yet many teams report challenges with it in production. In your experience, what are the primary limitations that prevent wider adoption of VPA, and are there specific narrow use cases where it truly excels?

Andrew: We've seen this with customers who have tried the Vertical Pod Autoscaler (VPA). The challenge is twofold. First, it uses only short-term data, looking at a week of metrics-server data, so it downsizes things too quickly, without enough history to do it safely. We've seen it downsize too aggressively, and when those workloads get busy again you get out-of-memory kills, which seem to generate quite a few issues in some environments.

The other, related problem is that it only supports a fixed ratio between the request and the limit, which is very restrictive. You can do Guaranteed pods with it, but for anything in between you lose the ability to set a high limit with a lower request. That combination is exactly what most workloads need: they might peak occasionally but stay low most of the time. With a fixed ratio between requests and limits, you're never really going to optimize effectively.

There's a related effect when it downsizes: because bringing the request down also brings the limit down, you hit that lower limit the next time usage increases. VPA is good at making things bigger when needed, if you don't mind Pods potentially being killed and restarted in the process. However, we find it's not really good for saving money, because it isn't safe enough about downsizing: it doesn't look at enough history or act cautiously enough when bringing things down.
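For teams that still want VPA's recommendations without handing it control, one cautious pattern is to run it in recommendation-only mode and bound what it may suggest. A minimal sketch, assuming the VPA CRDs are installed and reusing the hypothetical Deployment name from the earlier example:

```yaml
# Sketch of a VerticalPodAutoscaler used purely as a recommender:
# updateMode "Off" surfaces suggested requests without evicting or resizing
# Pods, and minAllowed/maxAllowed keep recommendations inside a sane band.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-api-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api                # hypothetical target workload
  updatePolicy:
    updateMode: "Off"                # recommendations only, nothing applied automatically
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"
        controlledResources: ["cpu", "memory"]
```

Note that this only addresses when changes get applied; the proportional request-to-limit scaling Andrew mentions still applies whenever VPA's recommendations are acted on.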

Bart: Node autoscalers like Karpenter are excellent at efficiently packing workloads based on their requests. However, this can create risk if the workload requests are inaccurate, as Karpenter may consolidate pods onto nodes that are too small for their actual usage during a traffic spike. How do you approach this garbage-in-garbage-out problem, where the node layer is perfectly optimizing for flawed workload definitions?

Andrew: If your requests are wrong, if they're way too high, then you're just going to run on too many nodes. Kubernetes is forced to run more nodes to meet all the requests. Karpenter is no exception—it doesn't magically run on fewer nodes. If you're asking for a lot of CPUs or a lot of memory, it'll run them all.

Now, if you start bringing those requests down, Karpenter will consolidate nodes. It's a feature of Karpenter that it will do consolidation. It's really nice working with our Densify product and Karpenter because as you bring down the requests, the nodes consolidate and they go down.

However, you don't want to go too far down. You want to look at the historical patterns to make sure you're not bringing it way down to just what it's been doing in the last 10 minutes. When it's busy again, you'll be stuck on the wrong node and will have to move to a different node.

You want to make sure you're looking at enough data to get to the right point where you're still accounting for potential growth. We've heard from customers about workloads that are not busy for some period of time, get scheduled in a tiny Karpenter node, and then when they become busy, they're stuck. The workload can throttle, get out of memory, get killed, and have to reschedule.

I think this is just another argument for getting the requests and limits right based on historical data: not being too aggressive downward or upward, but finding that right zone so everything works properly when it scales.
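On the node side, Karpenter's consolidation behavior is configurable, which is one place to add a guard rail against repacking onto undersized nodes on the basis of a brief lull. Here is a rough sketch of a Karpenter NodePool (v1 API) with consolidation enabled but deliberately slowed down; the pool name is hypothetical and the referenced node class is assumed to already exist in the cluster.

```yaml
# Sketch of a Karpenter NodePool that consolidates underutilized nodes, but
# only after a sustained quiet period rather than reacting to the last few
# minutes of low usage. Names and values are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose              # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws     # assumes Karpenter on AWS
        kind: EC2NodeClass
        name: default                # assumed to already exist
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30m            # wait before repacking nodes
```

Even with conservative consolidation settings, Karpenter still packs purely on requests, which is the garbage-in-garbage-out point: the requests set on the workloads (as in the Deployment sketch above) have to reflect real historical usage for the resulting nodes to end up the right size.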

Bart: Kubernetes turned 10 last year. What should we expect in the next 10 years?

Andrew: I think it's becoming core. We see it go through adoption cycles with different customers, but it's pretty much everywhere now, and it's become quite mature. Customers are going to want it to be easier to manage. In the early days of any technology, you can have rocket scientists setting it up and understanding it, but more and more, we want to see it properly self-managed.

We think we're part of that, and there are many vendors contributing to this goal. When you bring together the ecosystem, you can stand it up, and it just runs nicely on its own. It manages itself and becomes intuitive to manage. You don't have to have a PhD in Kubernetes to understand how to manage it. That's where it's going.

There are many vendors on the floor here contributing to this, and it'll all come together over the coming years.

Bart: What's the best way for people to get in touch with you and learn more?

Andrew: The best way to learn more is on our Densify website. There you'll find information on our different use cases, a chat agent that will get in touch with you right away, and the option to spin up a trial. The site gives you several easy ways to reach us.