Kubernetes for AI workloads: emerging tools, GPU challenges, and community
Kubernetes continues to evolve rapidly to meet the demands of modern AI and machine learning workloads, presenting opportunities and challenges for the community.
In this interview, Abdel Sghiouar, Senior Cloud Developer Advocate at Google, discusses:
Three emerging Kubernetes tools worth watching: Kueue for fair GPU and CPU resource sharing between teams, Multi-cluster Orchestrator (MCO) for cross-region workload management, and the Gateway API Inference Extension designed specifically for AI inference workloads
Key challenges in running AI workloads on Kubernetes, including GPU availability and cost optimization (whether to pre-provision expensive GPUs or risk unavailability with auto-scaling), maximizing resource utilization, and improving monitoring and observability for specialized hardware like GPUs and TPUs
The need for greater industry collaboration in Kubernetes development, emphasizing that more cloud providers should contribute resources to standardization efforts rather than building divergent specifications
Relevant links
Transcription
Bart: Who are you, what's your role, and where do you work?
Abdel: Hi, I'm Abdel Sghiouar, I'm a cloud developer advocate at Google, focusing on Kubernetes.
Bart: What are three emerging Kubernetes tools you're keeping an eye on? For context:
Kueue: A Kubernetes native job queueing system that helps manage batch and machine learning workloads.
Gateway API: A Kubernetes project that provides a more expressive, extensible, and role-oriented way to configure network connectivity in Kubernetes clusters.
Multi-cluster Orchestrator (MCO): A tool from Google Cloud for managing workloads across multiple Kubernetes clusters.
Abdel: Kueue is a Kubernetes native job queueing system, which allows you to share Kubernetes resources, especially GPUs and CPUs, fairly between teams. The second one is something we announced at KubeCon called MCO, the Multi-cluster Orchestrator, which is a tool to orchestrate capacity and resources across multiple clusters. If you run out of capacity in one region in a cloud environment, you can spill over and place your resources in a different region. The third one is something I gave a talk about at the KubeCon AI co-located events, called the Gateway API Inference Extension, which is an extension of the Gateway API specifically designed for inference workloads.
Bart: Do you agree with John McBride's assessment that Kubernetes is a platform of the future for AI and ML, particularly for scaling GPU compute? What challenges do you see in running AI workloads on Kubernetes?
Abdel: Do I agree with that assessment? Yes. I'm going to borrow a statement from Clayton Coleman: LLMs are the new web app. The biggest challenge now is availability—specifically, finding and obtaining GPUs when you need them.
One of the promises of Kubernetes is scalability on demand, being able to access resources only when needed. This worked great when resources were only CPU and memory, but with GPUs, which are hard to get, you face a challenge: Do you pre-provision them and pay even when not using them, or auto-scale and risk unavailability?
Availability is one of the biggest challenges people will face, especially ensuring it makes sense from a cost perspective. The second challenge is utilization—making sure you maximize resource usage to get the most out of them. The third challenge is monitoring and observability of GPUs and TPUs.
There are emerging technologies like device plugins from the device management workgroup. As a community, we're working through the challenge of being able to observe and ensure workloads perform and behave predictably.
Bart: John McBride criticized Kubernetes for its slow pace in reacting to emerging technologies, particularly GPU workloads. What areas do you think Kubernetes needs to improve to better support modern workloads?
Key areas for potential improvement could include:
Device plugin frameworks
Device management capabilities
Support for specialized hardware like TPUs
Long-term support (LTS) for emerging technologies
Abdel: I wouldn't agree that it was slow. I think the LLM thing happened too fast. As with any technology, people need time to adapt, adjust, learn, and figure out what these things are. I would say the community is doing a great job, both on the open source and commercial sides, where people are building things and contributing them back to open source. I don't think the community needs to do anything in particular to make things better; it is already reacting. Compared to other open source projects, Kubernetes, in its short 10-year lifetime, has been able to adapt very fast.
Bart: John McBride challenged upstream companies to dedicate more resources to making Kubernetes the platform of the future for AI/ML workflows. What specific improvements would you like to see in Kubernetes to better support AI workflows?
Abdel: I think we need more collaboration from big cloud providers. Currently, it's uneven in terms of contributions, and as someone from Google—one of the biggest contributors—I believe we need more contributions from other companies.
The reason is that in this day and age, we need more people to converge towards standardization instead of building various specs. We need more companies to put resources into improving specs, advancing APIs, and maintaining them better.
I've talked to many Kubernetes maintainers about challenges around infrastructure management, backward compatibility, and long-term support (LTS). We need more companies to invest resources to make things better for everyone. I would hate to see Kubernetes diverge because people don't want to build together. No one wants to lose this community, which would be unfortunate. We need more companies to engage with the community.
Bart: What's next for you?
Abdel: For KubeCon, I'm almost done. I have a community hub talk tomorrow. We're going to be doing some live streams on the Kubernetes podcast today and tomorrow. Basically, more content is coming out, specifically more Kubernetes podcast content, more videos on YouTube, more articles hopefully on my blog, and more KubeCons.
Bart: What's the best way for people to get in touch with you?
Abdel: Just reach out to me on LinkedIn, Twitter, Bluesky, Kubernetes Slack, or the podcast.