Kubernetes resource optimization and the future of AI workloads

Guest:

  • Andrew Hillier

From resource optimization to AI workloads on Kubernetes, this interview explores the challenges of efficient container management.

In this interview, Andrew Hillier, Co-Founder & CTO at Densify, discusses:

  • Resource optimization challenges in Kubernetes environments, noting that "over-provisioning is quite rampant"

  • AI workloads on Kubernetes and the specific challenges of GPU optimization, highlighting that "cost is a problem, and when GPUs are involved, it becomes even more expensive"

  • Automation approaches for resource management

Relevant links
Transcription

Bart: Who are you, what do you do, and who do you work for?

Andrew: My name is Andrew Hillier. I'm the CTO and one of the co-founders of Densify, where I focus on helping organizations optimize their cloud and Kubernetes resource utilization.

Bart: What are three Kubernetes emerging tools that you're keeping an eye on?

Andrew: We're keeping an eye on all things around vertical scaling, like in-place resizing, which is of great interest to us. We're waiting for it to come out of alpha and beta. The multi-dimensional pod autoscaler is particularly interesting since we are in the business of resource optimization. Anything that improves downstream integrations is great, especially with Prometheus or OTEL, which is how we get our data to analyze. We continuously monitor these advancements to make everything work better.
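As a concrete (and heavily simplified) illustration of that data path, the sketch below pulls per-container usage from Prometheus's standard HTTP API with Python. This is not Densify's pipeline; the endpoint, namespace, and metric selections are illustrative assumptions.

```python
# Minimal sketch: pulling per-container usage samples from Prometheus,
# the kind of raw data a rightsizing analysis starts from.
# Assumes Prometheus scrapes the standard cAdvisor/kubelet metrics;
# the endpoint and namespace below are hypothetical.
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical endpoint

def query(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": promql})
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Peak memory working set per container over the last 95 days.
peak_memory = query(
    'max_over_time(container_memory_working_set_bytes'
    '{namespace="team-a", container!=""}[95d])'
)

# Average CPU usage (in cores) per container over the same window,
# using a PromQL subquery over 5-minute rates.
avg_cpu = query(
    'avg_over_time(rate(container_cpu_usage_seconds_total'
    '{namespace="team-a", container!=""}[5m])[95d:5m])'
)

for series in peak_memory:
    labels, (_, value) = series["metric"], series["value"]
    print(labels.get("pod"), labels.get("container"),
          f"{float(value) / 2**20:.0f} MiB peak")
```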

Bart: How common is over-provisioning in the Kubernetes environments you see, and why does it happen?

Andrew: We find over-provisioning is quite rampant. There are several reasons for this. As an app team, you might be disconnected from the infrastructure and not really understand that requesting 20 CPUs will actually cost something somewhere. This will manifest itself way down the line when somebody receives a bill for a giant scale group or a bunch of Karpenter nodes.

We find it's rampant, but at the same time, there's also some under-provisioning. Memory tends to be both over and under-provisioned—it's a bit of a mess. We try to surface that. Our big strategy is to be very detailed and meticulous in our analytics so we can say, "You should be smaller, and here's exactly why."

We've watched you for 95 days. Here's the biggest you ever got. Here are all the parameters. This is a safe thing to do. The first step is always to gain that trust so somebody can be comfortable downsizing something they're not fully using.
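As a simplified illustration of that reasoning (not Densify's actual analytics), a recommendation can be framed as the observed peak over the window plus a safety margin; all numbers and field names below are hypothetical.

```python
# Simplified sketch of turning observed usage into a request recommendation:
# "here's the biggest you ever got over the window, plus headroom".
# This is an illustration, not Densify's analytics.
from dataclasses import dataclass

@dataclass
class UsageHistory:
    peak_cpu_cores: float    # highest sustained CPU observed over the window
    peak_memory_bytes: int   # highest memory working set observed
    window_days: int         # length of the observation window, e.g. 95 days

def recommend_requests(history: UsageHistory,
                       cpu_headroom: float = 0.15,
                       mem_headroom: float = 0.20) -> dict:
    """Recommend requests as the observed peak plus a safety margin."""
    cpu_millicores = round(history.peak_cpu_cores * (1 + cpu_headroom) * 1000)
    memory_mib = round(history.peak_memory_bytes * (1 + mem_headroom) / 2**20)
    return {
        "cpu": f"{cpu_millicores}m",
        "memory": f"{memory_mib}Mi",
        "basis": f"peak over {history.window_days} days plus headroom",
    }

print(recommend_requests(UsageHistory(0.35, 900 * 2**20, 95)))
# {'cpu': '402m', 'memory': '1080Mi', 'basis': 'peak over 95 days plus headroom'}
```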

The second step is to automate it to ensure it happens automatically because app teams don't want to be in the business of resource management. Once they trust the process and turn it on, the problem goes away.
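One plausible way to apply such a change automatically is to patch the workload's requests and limits through the Kubernetes API. The sketch below uses the official Python client; the deployment name, namespace, and values are illustrative, and Densify's own mechanisms are described later in the interview.

```python
# Sketch: roll out new requests/limits by patching the Deployment instead of
# opening a change ticket. Uses the official Kubernetes Python client; the
# names and values are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "web",  # must match the container in the pod template
                    "resources": {
                        "requests": {"cpu": "402m", "memory": "1080Mi"},
                        "limits": {"cpu": "800m", "memory": "1536Mi"},
                    },
                }]
            }
        }
    }
}

# Strategic merge patch: only the named container's resources are changed.
apps.patch_namespaced_deployment(name="web", namespace="default", body=patch)
```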

Bart: One of our previous guests said that Kubernetes is the platform of the future for AI and ML, particularly for scaling GPU compute. Do you agree with this assessment? What challenges do you see in running AI workloads on Kubernetes?

Andrew: We see AI being a big use case. More customers are using inferencing on containers, with some training as well. Customers are using some as-a-service offerings, but many workloads are landing in Kubernetes. Cost is a problem, and when GPUs are involved, it becomes even more expensive, especially if they're not fully utilized.

We're doing extensive R&D to optimize GPU yield, which can mean different things depending on the workload type. For training, you might want to maximize utilization at 100% all day long, but for inference, that's not ideal—you want better response times. We're working to understand the use case: Is it data preparation, inference, or training? We aim to establish policies that optimize resource use by containers.
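As a rough sketch of what measuring that can look like, assuming the NVIDIA DCGM exporter's metrics are scraped by Prometheus (the endpoint, namespace, and pod labelling are assumptions):

```python
# Rough sketch: checking sustained GPU utilization from the NVIDIA DCGM
# exporter's DCGM_FI_DEV_GPU_UTIL metric via Prometheus. The endpoint,
# namespace, and the presence of a "pod" label are assumptions.
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical endpoint

resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": 'avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace="ml-inference"}[7d])'},
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod", "<unknown>")
    print(f"{pod}: {float(series['value'][1]):.1f}% average GPU utilization over 7d")

# Sustained ~100% may be exactly what you want for training, but for
# latency-sensitive inference it usually means queuing and slow responses.
```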

Various levers are available. GPUs have requests and limits, and the starting point is ensuring these are correctly set to avoid reserving excessive capacity and wasting GPU resources.
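For reference, a minimal sketch of reserving a GPU on a container with the Python client; the image, names, and sizes are hypothetical. Because nvidia.com/gpu is an extended resource, the limit is what reserves the device and it cannot be overcommitted, so an oversized reservation directly strands capacity.

```python
# Minimal sketch: a pod that reserves exactly one GPU. For extended resources
# such as nvidia.com/gpu, the limit reserves the device (a request, if set,
# must equal the limit), so asking for more GPUs than the workload can use
# wastes them outright. Names and the image are hypothetical.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-worker", "namespace": "ml-inference"},
    "spec": {
        "containers": [{
            "name": "server",
            "image": "registry.example.com/inference-server:latest",  # hypothetical
            "resources": {
                "limits": {
                    "nvidia.com/gpu": "1",  # reserves one whole GPU
                    "cpu": "2",
                    "memory": "8Gi",
                },
            },
        }]
    },
}

core.create_namespaced_pod(namespace="ml-inference", body=pod)
```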

Bart: One of our previous guests also said that having an automated mechanism is better than enforcing processes. What automation tools or approaches do you recommend for managing Kubernetes resources?

Andrew: Cloud optimization is very process-centric. You open change management tickets to do that. In containers, we see much less of that. Not everyone opens a change ticket to change a container; they want to automate it. Because it doesn't necessarily change the storage type or other configuration parameters of the workload, changing resource settings is usually pretty safe.

We have several mechanisms to do this. We have a Terraform module and a powerful API that returns native JSON or tfvars, which you can integrate with OpenShift and other platforms. One of our latest offerings, which we're showcasing at the show, is our mutating admission controller. We've made major advancements to it, and it's releasing this week.

It's one of the nicest solutions because it provides a centralized control point over request and limit settings. Rather than having to go upstream to all repositories and change numbers, you can override them on the fly for the right workloads. If it's a dev workload or you have permission, you can fix settings as it deploys. It's a clean way to stop overprovisioning by automatically bringing those numbers down. You'll continue running fine—we monitor all the behavior—and it prevents a lot of infrastructure from running unutilized.
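To make the pattern concrete, here is a heavily simplified sketch of a mutating admission webhook that overrides requests as pods are created. It is not Densify's controller, just the generic Kubernetes webhook flow (an endpoint returning a base64-encoded JSONPatch in the AdmissionReview response); TLS and the MutatingWebhookConfiguration wiring are omitted, and the recommendation table is hypothetical.

```python
# Sketch of a mutating admission webhook that rewrites container requests
# on the fly. Generic Kubernetes webhook pattern, not Densify's controller.
import base64
import json

from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical recommendations, keyed by container name.
RECOMMENDED = {"web": {"cpu": "402m", "memory": "1080Mi"}}

@app.route("/mutate", methods=["POST"])
def mutate():
    review = request.get_json()
    pod = review["request"]["object"]

    # Build a JSONPatch that overrides requests for known containers.
    patch = []
    for i, container in enumerate(pod["spec"]["containers"]):
        rec = RECOMMENDED.get(container["name"])
        if rec:
            patch.append({
                "op": "add",  # "add" on an existing member replaces its value
                "path": f"/spec/containers/{i}/resources/requests",
                "value": rec,
            })

    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    })

if __name__ == "__main__":
    app.run(port=8443)  # a real deployment serves this over TLS
```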

Bart: Kubernetes turned 10 years old last year. What should we expect in the next 10 years?

Andrew: I think it's interesting. Who knows where things are going, but the trend I see reminds me of the early days of VMware. In those early days, everybody had a VMware guru when deploying these technologies—someone who really understood it. As it grew and became ubiquitous, you couldn't rely on that skill set being everywhere.

I think as Kubernetes grows, the management ecosystem needs to grow with it, just like in the VMware environment, where a strong management platform addressed all the pieces. Different vendors do part of it, and it all needs to be there so someone can run these things without waking up all night and without having a PhD in Kubernetes.

People just need to be able to run it reliably, learning from what everybody else has done and the mistakes that have been made. We try to do our part to ensure resource problems are solved through automation. The goal is to make Kubernetes much more mainstream and not require everything to be a science project—just stamp out what works by knowing what's happened in the past.

Bart: What's next for you?

Andrew: The big thing we're focusing on is AI. We're working on automation with several major projects in our roadmap. AI is our primary focus because we believe it will become a major source of cost and performance concern. Many of our customers see this as their big frontier, running workloads both on-premises and in the cloud. In the coming months, we'll be pushing hard to optimize the cost and risk of running these AI workloads.

Bart: How can people get in touch with you? What's the best way to contact you if they have questions?

Andrew: The best way to learn about our product is to visit our website, Densify.com. You'll find my contact information there. We also have a new sandbox environment where you can request a trial or simply click around to try it out. This is a good way to get to know what our product does. You can contact us directly, but the starting point is Densify.com, where you can find everything you need.

Podcast episodes mentioned in this interview