Emerging Kubernetes tools and operational patterns for modern workloads

Dec 12, 2025

Guest:

  • Sai Vennam

In this interview, Sai Vennam, Principal Solutions Architect at AWS, discusses:

  • Three emerging Kubernetes tools to watch: Kueue for intelligent GPU job scheduling beyond first-come-first-served, Karpenter's node overlays for customizing instance selection based on organizational discount plans, and Kro for infrastructure-as-YAML composition without custom controllers

  • Moving operational tasks like cluster upgrades, kube-proxy, and CSI driver management to cloud provider control through services like Amazon EKS Auto Mode, allowing platform teams to focus on business-critical work

  • Combining Karpenter's intelligent provisioning, Kueue's job queuing, GPU time slicing, and MIG capabilities to dramatically improve GPU utilization rates and cost efficiency for training workloads that don't require low latency

Transcription

Bart: Can you introduce yourself and tell us about your professional background and role?

Sai: My name is Sai Vennam, and I'm a Principal Solutions Architect at AWS. I work with customers who are migrating and modernizing workloads, moving to Kubernetes. I collaborate with practitioners and enjoy creating digital content that helps people learn Kubernetes. I've been doing this since 2017-2018, initially on the IBM technology channel and now on "Containers from the Couch". Creating content and seeing people learn new technologies is what excites me.

Bart: And what are three emerging Kubernetes tools that you're keeping an eye on?

Sai: One of the tools I'm excited about is Kueue, which lets operators queue jobs so they take full advantage of GPU resources. This is great because GPUs are hard to come by and expensive, and these workloads don't necessarily need to be low latency. Instead of Kubernetes' typical first-come, first-served pod scheduling, which isn't always ideal, Kueue holds jobs in a queue and admits them as capacity becomes available.
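
To make that concrete, here's a minimal sketch of the Kueue resources involved; the queue names and quotas are hypothetical, following the shape of Kueue's v1beta1 API:

```yaml
# Minimal Kueue setup (hypothetical names and quotas).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-cluster-queue
spec:
  namespaceSelector: {}        # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 32
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8        # jobs queue until GPU quota frees up
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: default
spec:
  clusterQueue: gpu-cluster-queue
```

A training Job then opts in with the `kueue.x-k8s.io/queue-name: team-a-queue` label and stays suspended until the ClusterQueue has quota for it.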

Another capability is Karpenter, which has been around for a while and is already used in production. I'm particularly interested in its new features, specifically node overlays. Karpenter spins up instances in response to unschedulable pods, and with node overlays you can adjust the inputs Karpenter uses when choosing those nodes, such as the price it assumes for an instance type. For example, your organization might have discount plans on certain instances or want to favor specific instance types, and that's exactly what node overlays enable.
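
As a sketch, a node overlay looks roughly like this under Karpenter's alpha NodeOverlay API; the instance types and discount are hypothetical, and field names may change while the feature is in alpha:

```yaml
# Hypothetical overlay: tell Karpenter these instance types are effectively
# 20% cheaper for us (a negotiated discount), so it favors them when provisioning.
apiVersion: karpenter.sh/v1alpha1
kind: NodeOverlay
metadata:
  name: discounted-m5
spec:
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["m5.xlarge", "m5.2xlarge"]
  priceAdjustment: "-20%"
```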

The last tool to keep an eye on is Kro. It takes a composition, infrastructure-as-YAML approach to automating infrastructure. Kro abstracts complex Kubernetes resources, from simple deployments to custom resource definitions, without needing custom controllers. It includes built-in dependency graphs and operates in a Kubernetes-native way. I think we'll see more people using Kro to orchestrate not just Kubernetes workloads, but also to combine cloud service deployments with custom controllers.
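
For illustration, here's a minimal sketch of a Kro ResourceGraphDefinition; the WebApp schema and names are hypothetical, following the shape of Kro's v1alpha1 API:

```yaml
# Hypothetical Kro composition: expose a tiny "WebApp" API that expands
# into a Deployment, with no custom controller code required.
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp
    spec:
      name: string
      image: string
  resources:
  - id: deployment
    template:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: ${schema.spec.name}
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: ${schema.spec.name}
        template:
          metadata:
            labels:
              app: ${schema.spec.name}
          spec:
            containers:
            - name: app
              image: ${schema.spec.image}
```

Kro generates the WebApp CRD and reconciles the templated resources in dependency order, which is what gives you that controller-like experience from plain YAML.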

Bart: In Kubernetes, which parts of cluster operations truly need to stay in the hands of platform teams? And which could realistically be abstracted or automated without losing control?

Sai: At AWS, we have this concept of undifferentiated heavy lifting. This refers to work that's uninteresting, or, to use another term, plumbing: tasks everyone needs to do and maintain. In the world of Kubernetes, these are operational tasks like cluster upgrades and keeping systems up to date with the latest Kubernetes versions.

These components can be moved to cloud provider control. Amazon EKS Auto Mode is a capability that lets you focus on what really matters by having the cloud manage critical components like kube-proxy, the VPC CNI, controllers like Karpenter, and storage drivers like the EBS CSI driver. While this might sound like acronym soup, these are components every cluster needs. So why spend time maintaining them when a cloud provider can take care of it?
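
As a sketch, enabling this with eksctl looks something like the following, assuming a recent eksctl release that supports the autoModeConfig block; the cluster name and region are placeholders:

```yaml
# Hypothetical eksctl config: with Auto Mode enabled, EKS manages node
# lifecycle plus components like kube-proxy, the CNI, and the EBS CSI driver.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster
  region: us-east-1
autoModeConfig:
  enabled: true
```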

Bart: How do AI/ML workloads expose the limits of today's scheduling and capacity-planning models, especially when GPUs, ephemeral jobs, or burst demand are involved?

Sai: This goes back to the challenge with GPUs: you don't have an endless supply of them. Especially for AI/ML workloads that don't need to be low latency, particularly training jobs, Karpenter comes into the picture. There are also GPU-specific capabilities like MIG (Multi-Instance GPU), GPU time slicing, and Kueue for queuing up jobs, which together let a single GPU run at a much higher percentage. That particular metric, utilization, is critical here. All of these tools come together to help businesses use their GPUs more efficiently and bring their utilization rate up.
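
As one concrete piece of that, here's a sketch of the time-slicing configuration the NVIDIA device plugin accepts, shown as a ConfigMap as it's typically deployed with the GPU Operator; the namespace and replica count are hypothetical:

```yaml
# Hypothetical time-slicing config: advertise each physical GPU as four
# schedulable nvidia.com/gpu resources so more pods can share one card.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```

Time slicing trades isolation for density, so it suits training and batch jobs far better than latency-sensitive inference.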

Bart: When clusters scale up easily but rarely scale down or rebalance efficiently, what patterns or mechanisms could improve cost and utilization without manual intervention?

Sai: The answer here is Karpenter. Karpenter allows you to spin up the most cost-efficient instances in response to workloads that need to be scheduled. When those workloads scale down, it'll efficiently bin pack them as well.

A simple example I always give: at peak traffic, you're running three nodes at 80% utilization. As traffic dips, those same three nodes might each sit at 20% utilization. With bin packing, you can run all of those workloads on one node, getting it up to 60% utilization instead. You might also use a smaller instance, Graviton or other ARM-based processors, or Spot Instances to further save on cost. These are all things Karpenter helps with, and they're critical in the space right now, especially with GPUs.
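
A sketch of what that looks like in a Karpenter v1 NodePool; the consolidation settings and requirements are illustrative, not a recommendation:

```yaml
# Illustrative NodePool: let Karpenter consolidate underutilized nodes and
# pick among ARM/x86 and Spot/On-Demand for the cheapest viable fit.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64", "arm64"]      # arm64 allows Graviton instances
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```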

Bart: Looking towards the future, we've spoken a bit about AI. We just celebrated 10 years of Kubernetes. What can we expect in the next 10 years? AI, other technologies? What do you think, Sai?

Sai: When Kubernetes was emerging, it was really fun for me to make content and teach folks about it. Now we're seeing more maturity build, and practitioners are implementing more complex patterns. We see it even here at KubeCon, with conversations moving up to the 400 or even 500 level and the talks becoming more advanced. As practitioners adopt Kubernetes best practices and move to scale, that's what I'm excited about for the next 10 years: working with people and helping them reach that level of scalability.

Bart: What's next for you, Sai?

Sai: I've been building content and helping folks learn the basics of Kubernetes. I think it's time to level up and go even deeper. One of the workshops I maintain, eksworkshop.com, is where we're centering a lot of these new patterns and capabilities you can build with Kubernetes, so check it out. That's part of what's in store for me going forward: helping practitioners build more advanced capabilities on top of Kubernetes, leveraging what they need to really grow in scale and maturity.

Bart: How can people get in touch with you?

Sai: Get in touch with me on LinkedIn; I'm very responsive there. Just search for Sai Vennam. And if you want to see content, workshops, and videos, you can always catch Containers from the Couch on YouTube.