Emerging Kubernetes tools for AI and optimizing GPU workloads

Guest:

  • John Platt

Discover the latest Kubernetes tools and how Kubernetes is evolving to support AI/ML workloads.

In this interview, John Platt, CTO at StormForge (now part of CloudBolt), discusses:

  • Emerging Kubernetes tools worth watching, including in-place pod resizing for less disruptive workload rightsizing, EKS Auto Mode to eliminate node management complexity, and NOS for automating GPU virtualization

  • How Kubernetes is becoming the platform of choice for AI/ML workloads, offering significant cost savings (up to 90%) compared to managed services like OpenAI, while providing scalability and portability

  • Challenges in running AI on Kubernetes, including GPU driver management, CUDA version compatibility, and handling large models that can exceed tens of gigabytes

Transcription

Bart: So, who are you, what's your role, and who do you work for?

John: My name is John Platt. I work for StormForge, now part of CloudBolt, and I'm the Chief Technology Officer.

Bart: What are three emerging Kubernetes tools you're keeping an eye on?

John: So, one thing which has been coming for a while is in-place pod resizing. At StormForge, we're focused on workload rightsizing: eliminating waste from over-provisioned workloads and adding resources to under-provisioned workloads. With in-place pod resizing, we can do that in a less disruptive fashion.
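For readers who want to try this: a minimal sketch of a pod set up for in-place resizing, assuming the InPlacePodVerticalScaling feature gate is enabled (alpha since Kubernetes 1.27). The image name is a placeholder.

```yaml
# resizePolicy tells the kubelet which resources may change without
# restarting the container (requires InPlacePodVerticalScaling).
apiVersion: v1
kind: Pod
metadata:
  name: resizable-app
spec:
  containers:
  - name: app
    image: nginx  # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # CPU can be resized in place
    - resourceName: memory
      restartPolicy: RestartContainer # memory changes restart the container
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 512Mi
```

On recent Kubernetes versions the resize goes through a dedicated subresource (e.g. `kubectl patch pod resizable-app --subresource resize ...`); on earlier alpha releases you patch the pod's resources directly. Check the docs for your cluster version.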

Another development is EKS Auto Mode. People have been running Karpenter for a while, and one issue you run into is a chicken-and-egg problem: you need a node to run the tool that adds nodes to your cluster, which can get complicated. With Auto Mode, AWS eliminates all of that complexity, along with the overhead of running the CNI, CSI drivers, and other components yourself.

The idea is that teams can spend more time focusing on shipping features and building out developer tools, and less time on upgrades, because EKS handles that for you.
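As a rough illustration, here is what enabling Auto Mode looks like in an eksctl ClusterConfig. This is a sketch based on eksctl's documented `autoModeConfig` stanza; field names may differ across versions, so check the current eksctl docs.

```yaml
# Hypothetical eksctl ClusterConfig enabling EKS Auto Mode.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: auto-mode-demo
  region: us-east-1
autoModeConfig:
  enabled: true  # EKS manages nodes, networking, and storage add-ons
```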

The final tool we'll discuss today is NOS (N-O-S), which helps virtualize GPUs in a dynamic, automatic fashion. It eliminates the overhead of manually provisioning GPU nodes and splitting them into chunks. Currently, working with GPUs can be manual and painful, and NOS will make this process smoother and more automatic.
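To make that concrete, here is a sketch of the pattern NOS (nebuly's nos project) documents for automatic MIG partitioning. The label key and resource names below come from the project's docs and may change between versions, so treat them as assumptions.

```yaml
# 1. Opt a GPU node into automatic MIG partitioning:
#    kubectl label node <gpu-node> nos.nebuly.com/gpu-partitioning=mig
#
# 2. Pods then request a MIG slice instead of a whole GPU, and nos
#    reconfigures the node's partitions to match pending requests:
apiVersion: v1
kind: Pod
metadata:
  name: small-inference-job
spec:
  containers:
  - name: model
    image: my-inference:latest  # placeholder image
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 1  # one 1g.10gb slice of an A100
```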

Bart: One of our podcast guests, John, talked about a tool called Kraken. How do you see Kraken changing the way we manage specialized hardware like GPUs in Kubernetes?

John: I hadn't heard of Kraken until John mentioned it, but I think it's fascinating. It's a peer-to-peer image registry that lets you pull images and ship data much faster. That's especially important for AI/ML workloads, because people don't realize how large the models can get: tens of gigabytes, if not larger. At that point, the model is larger than the code. Typically, people put the models into the images, which makes them extremely slow to download. That means it's slow to spin up new resources during a traffic spike, and generally painful to work with. Being able to pull images down faster is a game changer for machine learning workloads.
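A complementary mitigation to faster registries is keeping the weights out of the image entirely and fetching them at pod startup. A minimal sketch using an init container and an emptyDir volume; the model URL and image names are placeholders.

```yaml
# Keep multi-gigabyte weights out of the container image; pull them
# into a shared volume before the server starts.
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  volumes:
  - name: model-store
    emptyDir: {}
  initContainers:
  - name: fetch-model
    image: curlimages/curl
    # Hypothetical model location; in practice often S3/GCS behind a CDN.
    args: ["-L", "-o", "/models/model.safetensors",
           "https://models.example.com/llama.safetensors"]
    volumeMounts:
    - name: model-store
      mountPath: /models
  containers:
  - name: server
    image: my-inference:latest  # placeholder
    volumeMounts:
    - name: model-store
      mountPath: /models
```

This keeps images small and cacheable, at the cost of a download step that a P2P layer like Kraken (or a node-local cache) can also accelerate.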

Bart: John expressed that Kubernetes is the platform of the future for AI and ML, particularly for scaling GPU compute. Do you agree with this assessment, and what challenges do you see in running AI workloads on Kubernetes?

John: Kubernetes is already being used for training and inference of foundation models and Large Language Models (LLMs), and for good reason.

The platform lets you run GPU workloads across clusters and cloud environments. At StormForge, we compared the cost of running our own models on Kubernetes versus using OpenAI, and we found savings of around 90%. That combination of flexibility and cost-effectiveness makes the approach increasingly attractive for organizations.
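Part of the portability John describes is that a self-hosted model server is just a standard Deployment. A minimal sketch with placeholder names; in practice the container would run something like a vLLM or TGI server, and the NVIDIA device plugin must be installed for the GPU resource to exist.

```yaml
# Self-hosted inference as a plain Deployment; scales like any other workload.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
      - name: server
        image: my-inference:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1  # requires the NVIDIA device plugin
```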

Bart: John challenged upstream companies to dedicate more resources to making Kubernetes the platform of the future for AI and ML workloads. What specific improvements would you like to see in Kubernetes to better support AI workloads?

John: When running AI workloads and working with GPUs, there's a lot more involved than just the code. You need to get the right drivers, the right CUDA version on all nodes. You might need tools to virtualize GPUs, and considerations around storage and networking come into play. We've discussed how big the models can get, and you need ways of handling large models. Currently, large cloud providers are offering additional tooling to help solve these problems. While that's helpful, it feels like for community-driven and portable solutions, there needs to be more investment from the open source community in solving these problems in ways that avoid vendor lock-in.
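One common way to manage the driver/CUDA matching John mentions is to schedule onto nodes whose advertised driver version matches the image's CUDA build. This sketch assumes NVIDIA GPU Feature Discovery (part of the GPU Operator stack) is installed and publishing node labels; the exact label keys and values below are taken from its docs and may vary by version.

```yaml
# Pin a CUDA workload to nodes with a compatible driver and GPU model.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-pinned-job
spec:
  nodeSelector:
    nvidia.com/cuda.driver.major: "550"          # driver branch the image targets
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
  containers:
  - name: train
    image: my-training:cuda12  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```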

Bart: What's next for you?

John: StormForge was acquired by CloudBolt, a leading FinOps provider, this week. We both share a vision of continuous optimization powered by machine learning. We're excited to take what we've built for Kubernetes and extend it to a much broader range of cloud resources.

Bart: What's the best way for people to get in touch with you?

John: Just reach out on LinkedIn. I'm normally pretty responsive. We have a QR code behind us. If you try the product, just let us know. You can also give us feedback.
