Running AI/ML on Kubernetes: production challenges and solutions

Dec 12, 2025

Guest:

  • Shyam Jeedigunta

In this interview, Shyam Jeedigunta, Principal Engineer at AWS, discusses:

  • The multi-layered challenge of ML in production - Moving from a working Jupyter notebook to production requires navigating infrastructure management, MLOps tooling, and multi-cluster orchestration

  • Essential tools and architecture patterns for production ML platforms - Recommending declarative configuration and GitOps practices, along with specific tools like Karpenter for node lifecycle management, Kubeflow/MLflow for ML pipelines, and specialized AWS services like EKS Auto for GPU workloads

  • The next decade of Kubernetes for AI/ML - As Kubernetes matures beyond foundational technologies, the focus shifts to hyper-focused solutions for heterogeneous environments, such as optimizing GPU node migrations and reducing training cluster downtime through advanced volume snapshot techniques

Relevant links
Transcription

Bart: Shyam Jeedigunta works for Amazon Web Services (AWS), a leading cloud computing platform, as a Principal Engineer on Amazon EKS. Shyam, can you introduce yourself?

Shyam: I'm Shyam Jeedigunta. I've been a long-time contributor to Kubernetes and related projects for close to a decade. I work as a principal engineer at Amazon EKS and help build some of the largest, most reliable, efficient, and durable clusters on the planet.

Bart: And what are three Kubernetes emerging tools that you're keeping an eye on?

Shyam: It's been super exciting to watch new and emerging technologies at KubeCon. The ones I've been most closely following in my domains of interest include vLLM, a super popular inference engine with a lot of innovation deep in that stack. CAST AI is another project working on a product for optimizing costs, particularly for GPU and accelerated infrastructure. Job schedulers like Kueue and Slurm are projects that are really impactful for many customers, especially enterprises that run Spark jobs and training at large scale.

Bart: What have been some of the biggest challenges you've seen when teams try to run machine learning workloads on Kubernetes?

Shyam: The person I like to think about is the ML researcher who has something working in a Jupyter notebook on their laptop and now wants to actually run it in production. There are multiple layers that need to be navigated, with a steep learning curve for each.

You need to figure out how to manage your infrastructure: how to bring up the Kubernetes cluster with all the necessary bits for compute, storage, and network. Then you have to work at the next layer, bringing in ML capabilities and MLOps—being able to run pipelines, schedule jobs, and autoscale your infrastructure using tools like Kubeflow and MLflow.
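To make that second layer a bit more concrete, here is a minimal sketch of a declaratively specified pipeline using the Kubeflow Pipelines (kfp v2) SDK in Python. The component body, names, and default values are purely illustrative, and it assumes a Kubeflow Pipelines backend is available to run the compiled spec.

```python
# Minimal sketch of a declarative ML pipeline with the Kubeflow Pipelines (kfp v2) SDK.
# Assumes `pip install kfp`; the component logic and names here are illustrative only.
from kfp import dsl, compiler


@dsl.component(base_image="python:3.11")
def train(epochs: int) -> str:
    # Placeholder training step; real code would load data and fit a model.
    return f"model trained for {epochs} epochs"


@dsl.pipeline(name="toy-training-pipeline")
def training_pipeline(epochs: int = 3):
    train(epochs=epochs)


if __name__ == "__main__":
    # Compile to a pipeline spec that a Kubeflow Pipelines backend can execute.
    compiler.Compiler().compile(training_pipeline, "toy_training_pipeline.yaml")
```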

If you're running multiple clusters across different regions, potentially across different accounts, and you have to orchestrate workloads across all of these, there's another added layer of complexity.

Fundamentally, the challenge is the need to have a vertically integrated stack that operates through many different layers to actually get to business value. This is one of the places where we need to invest more and improve standards for customers, reducing the undifferentiated heavy lifting.

Bart: Are there any tools, frameworks, or open source projects you've seen recently that you think are helping bridge the gap between machine learning and Kubernetes?

Shyam: There are awesome solutions for individual problems in this ecosystem. For instance, if you want to manage node lifecycle, auto-scaling, and upgrades, Karpenter is a great solution. It's something the community and our customers at EKS really love, and it lets you manage infrastructure, including GPU and accelerated compute, seamlessly.
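As a rough sketch of what that looks like in practice, the snippet below registers a GPU-oriented Karpenter NodePool through the Kubernetes Python client. It assumes Karpenter is installed with its karpenter.sh/v1 API and that an EC2NodeClass named "gpu" already exists; exact field names and instance families vary by Karpenter version and environment.

```python
# Sketch: create a GPU-oriented Karpenter NodePool via the Kubernetes Python client.
# Assumes Karpenter exposes the karpenter.sh/v1 API and an EC2NodeClass named "gpu"
# already exists; fields and instance families are illustrative and version-dependent.
from kubernetes import client, config

config.load_kube_config()

node_pool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "gpu-pool"},
    "spec": {
        "template": {
            "spec": {
                "nodeClassRef": {
                    "group": "karpenter.k8s.aws",
                    "kind": "EC2NodeClass",
                    "name": "gpu",
                },
                "requirements": [
                    {
                        "key": "karpenter.k8s.aws/instance-family",
                        "operator": "In",
                        "values": ["g5", "p4d"],
                    },
                    {
                        "key": "karpenter.sh/capacity-type",
                        "operator": "In",
                        "values": ["on-demand"],
                    },
                ],
            }
        },
        "limits": {"nvidia.com/gpu": "8"},  # cap total GPUs this pool may provision
    },
}

# NodePool is cluster-scoped, so use the cluster-scoped custom objects call.
client.CustomObjectsApi().create_cluster_custom_object(
    group="karpenter.sh", version="v1", plural="nodepools", body=node_pool
)
```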

For scheduling more sophisticated workloads where you need to provision memory, GPUs, other devices, and communication channels between computers, you can use DRA in EKS. However, there's a fair bit of configuration required to get it working. You need to install device plugins, drivers, and other components, which you don't have to manage if you run EKS Auto, where you get pre-built, pre-configured capacity that's directly consumable along with all dependencies.
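A concrete object helps illustrate the DRA configuration involved. The sketch below creates a ResourceClaim with the Kubernetes Python client; it assumes the cluster serves the resource.k8s.io/v1beta1 API and that a vendor driver has published a DeviceClass named "gpu.nvidia.com". Both are assumptions that depend on your Kubernetes and driver versions, since DRA is still evolving.

```python
# Sketch: request a GPU through Dynamic Resource Allocation (DRA).
# Assumes the cluster exposes resource.k8s.io/v1beta1 and that a vendor driver
# has published a DeviceClass named "gpu.nvidia.com"; both are version-dependent.
from kubernetes import client, config

config.load_kube_config()

claim = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaim",
    "metadata": {"name": "single-gpu", "namespace": "default"},
    "spec": {
        "devices": {
            "requests": [
                {"name": "gpu", "deviceClassName": "gpu.nvidia.com"}
            ]
        }
    },
}

# A Pod then references this claim via spec.resourceClaims and
# containers[].resources.claims to have the device allocated to it.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.k8s.io",
    version="v1beta1",
    namespace="default",
    plural="resourceclaims",
    body=claim,
)
```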

For job scheduling, orchestration, and managing ML pipelines and experiments, there are numerous tools in the Kubernetes community. Kubeflow and MLflow are widely used, making it easy to declaratively specify ML pipelines and experiments. Similarly, multi-cluster schedulers like Kueue simplify job scheduling across multiple clusters without compromising efficiency.
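As a small example of that declarative style with Kueue, the snippet below submits a suspended batch Job labeled for a Kueue queue via the Kubernetes Python client. It assumes Kueue is installed and that a LocalQueue named "team-a-queue" exists in the "default" namespace; the image and command are placeholders.

```python
# Sketch: submit a training Job to a Kueue queue via the Kubernetes Python client.
# Assumes Kueue is installed and a LocalQueue named "team-a-queue" exists in the
# "default" namespace; Kueue admits the Job by unsuspending it when quota is free.
from kubernetes import client, config

config.load_kube_config()

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "toy-training-job",
        "namespace": "default",
        "labels": {"kueue.x-k8s.io/queue-name": "team-a-queue"},
    },
    "spec": {
        "suspend": True,  # Kueue flips this to false once the job is admitted
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [
                    {
                        "name": "train",
                        "image": "python:3.11",
                        "command": ["python", "-c", "print('training step')"],
                        "resources": {"limits": {"cpu": "1", "memory": "1Gi"}},
                    }
                ],
            }
        },
    },
}

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```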

Bart: If you were designing a production-ready AI/ML platform on Kubernetes today, including training, inference, and auto-scaling, what specific tools or architecture patterns would you choose, and why?

Shyam: The first thing is that you should, as much as possible, try to do declarative configuration and GitOps-style practices. Make sure that what you deploy is reproducible, especially if your compute platform extends beyond the cloud to on-premises and edge locations. You need consistency and opinionated ways of launching clusters.
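One hedged sketch of that GitOps pattern, using Argo CD as an example tool: an Application object points the cluster at a Git directory and keeps it continuously reconciled. This assumes Argo CD is installed in the "argocd" namespace; the repository URL and path below are placeholders for wherever your declarative platform configuration lives.

```python
# Sketch: a GitOps-style Argo CD Application created via the Kubernetes Python client.
# Assumes Argo CD is installed in the "argocd" namespace; the repo URL and path are
# placeholders for wherever your declarative platform configuration lives.
from kubernetes import client, config

config.load_kube_config()

app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "ml-platform", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://github.com/example-org/ml-platform-config",  # placeholder
            "path": "clusters/prod",
            "targetRevision": "main",
        },
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": "ml-platform",
        },
        # Keep the cluster continuously reconciled against what is in Git.
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io",
    version="v1alpha1",
    namespace="argocd",
    plural="applications",
    body=app,
)
```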

For instance, on EKS, you can use EKS Auto, which supports GPUs and other accelerator types like Trainium and Inferentia on AWS. You get out-of-the-box access to on-demand capacity, and it is elastic. You can choose to have static capacity if you have predictable jobs or training-style workloads.

You can implement container networking and access deep integrations like fast image pulls with Containerd using Seekable OCI. You also get access to technologies like multi-ENI pods, where AI/ML workload pods can use multiple network cards on underlying physical instances to unlock high-bandwidth and high packet processing performance.

Try to use well-paved solutions. There's still work to be done in the industry and in EKS to make this easier for customers and canonicalize these patterns with vertical integration.

Bart: Kubernetes turned 10 years old last year. You've been in the community for quite some time. What do you expect in the next 10 years?

Shyam: I think about this a lot, in terms of the technology, the projects, the community, and my own career trajectory. Where should it all be heading in the next few years?

I believe we are going to pivot away from building foundational technologies and common-denominator fundamentals that have served many workloads. Now we see a lot of specialization. There's significant heterogeneity in hardware, capacity, and the environments where these workloads run, and we need to make all of this easier for customers to consume across those heterogeneous environments.

It is important to invest in hyper-focused, piecemeal solutions that solve what may seem like esoteric problems. For instance, how do you migrate a GPU node to another GPU node as fast as possible? How do you take volume snapshots and potentially reduce the downtime of a GPU cluster used for training? There's definitely a lot of scope for such specialized solutions.
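To make the snapshot idea concrete, here is a minimal sketch using the CSI snapshot API, not a description of any specific EKS feature. It assumes the external-snapshotter CRDs (snapshot.storage.k8s.io/v1) are installed and that a VolumeSnapshotClass and a checkpoint PVC with the names used below exist; it simply captures a point-in-time snapshot that a replacement node or cluster could restore from.

```python
# Sketch: snapshot a training checkpoint volume with the CSI snapshot API.
# Assumes the external-snapshotter CRDs (snapshot.storage.k8s.io/v1) are installed,
# plus a VolumeSnapshotClass "csi-snapclass" and a PVC "checkpoints-pvc"; names are placeholders.
from kubernetes import client, config

config.load_kube_config()

snapshot = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "checkpoints-before-migration", "namespace": "training"},
    "spec": {
        "volumeSnapshotClassName": "csi-snapclass",
        "source": {"persistentVolumeClaimName": "checkpoints-pvc"},
    },
}

# A new PVC can later set its dataSource to this snapshot so a replacement
# node or cluster resumes training from the captured checkpoints.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="snapshot.storage.k8s.io",
    version="v1",
    namespace="training",
    plural="volumesnapshots",
    body=snapshot,
)
```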

Bart: And what's next for you?

Shyam: I'm going to keep chipping away at these problems. I'm going to try to help as much as I can, both in the open source community and at EKS, to make the lives of our customers easier.

Bart: And how can people get in touch with Shyam?

Shyam: You can reach out to me on LinkedIn, Kubernetes Slack, or wherever you can find me. My handle in most of these places is shyamjvs; on LinkedIn, it's shyamjeedigunta. If you find me at the AWS booth, in the open source project pavilion, or in one of the hallways, feel free to come say hi.