Kubernetes for AI Workloads

Apr 1, 2026

Guest:

  • Olawale Olaleye

Many teams can build an agentic AI proof of concept quickly, but in production, scaling, governance, monitoring, and cost control become real problems.

In this interview, Olawale Olaleye explains how managed Kubernetes, Amazon EKS Auto Mode, and MCP servers can significantly reduce operational burden, allowing teams to focus on their workloads.

In this interview:

  • What changes when agentic AI moves from POC to production

  • Why governance, security, and monitoring need to be automatic

  • How managed Kubernetes and EKS Auto Mode simplify day-two operations

  • Where the Kubernetes ecosystem is headed for AI workloads

The future of AI on Kubernetes depends on making infrastructure invisible enough for teams to ship safely and quickly.

Subscribe to KubeFM Weekly

Get the latest Kubernetes videos delivered to your inbox every week.


Transcription

Bart Farrell: First things first, who are you, what's your role, and where do you work?

Olawale Olaleye: Hi, my name is Olawale Olaleye. Some of my friends call me Ola. I'm a Senior GenAI Machine Learning Specialist Solutions Architect at AWS, and I'm also a Container Subject Matter Expert. I work with customers that are trying to build AI and machine learning workloads and deploy on AWS.

Bart Farrell: What are three emerging Kubernetes tools that you're keeping an eye on?

Olawale Olaleye: I think the first one would be KRO, the Kubernetes Resource Orchestrator. I'm keeping an eye on it because of its capability to help teams, not just one team but multiple teams. You can use KRO to create reusable APIs in a Kubernetes cluster, and the fact that a platform engineering team can use KRO to create APIs that development teams then consume really makes sense. The second one would be KServe. You can use KServe to distribute and serve your AI model inference inside the Kubernetes cluster. And the third one would be Karpenter. It's not emerging, but I'd like to call it out because of the active support it gets and how you can use it to provision the compute you need for your AI workloads in a Kubernetes cluster. It's incredible to see the vast array of configurations it supports: if you want to use any type of GPU or any type of Neuron accelerator, you can do that with Karpenter.
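As a concrete illustration of the kind of configuration Karpenter supports, here is a minimal sketch of a GPU-targeting NodePool using the Karpenter v1 schema. The pool name, taint, limits, and the referenced EC2NodeClass name are illustrative assumptions, not from the interview:

```yaml
# Sketch of a Karpenter NodePool that only provisions GPU instance families.
# Names ("gpu-pool", "default") and limits are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-pool
spec:
  template:
    spec:
      requirements:
        # Restrict provisioning to GPU-bearing EC2 instance categories (g, p).
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g", "p"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      # Taint the nodes so only GPU workloads with a matching toleration land here.
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  # Cap total GPUs this pool may provision.
  limits:
    nvidia.com/gpu: 8
```

A similar pool could target AWS Neuron accelerators by keying on the relevant Neuron instance families instead of the `g`/`p` categories.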

Bart Farrell: Agentic AI is becoming the next big thing, and running these agentic workloads on Kubernetes introduces some unique infrastructure challenges. Can you describe some that you've seen in your experience?

Olawale Olaleye: It's a wonderful space to be in. One of the common challenges I hear from customers is that it's easy to build a POC, but it becomes a huge challenge when you want to productionize. Take scaling, for example: you can easily build a single agent, but when you need to scale to hundreds of agents, or you want to build a multi-agent system, then it becomes a challenge in terms of how you scale those workloads. We don't see much in that space yet, but the community is actively working on it, which I'm also curious about. Another thing is how you continuously optimize for cost. You need your agents to run, and when it comes to scaling, you also need to think about the compute. You need to find compute that is well optimized: neither under-provisioned nor over-provisioned. Those are the challenges I see with customers trying to go live with productionized agentic workflows on a Kubernetes cluster.

Bart Farrell: When you are deploying agentic AI systems in production, what needs to happen automatically to keep everything running smoothly?

Olawale Olaleye: To keep everything running smoothly when you're deploying agentic AI workloads, the first thing would be governance and security. It has to be automatic. You have to have an environment where you're able to audit what the agents are doing. You also need monitoring in place: you need to be able to see what these agents can do and what assets they have access to in the cluster. Because you're also running them as containers, you need to think about security and what the container is capable of running, so you need to keep in mind how you implement container runtime monitoring. Another thing is auto-scaling, which I talked about earlier. It has to be automatic in the cluster. You don't want a situation where your agent workload needs to scale and you have to start thinking about the infrastructure you need for that scalability. Those are the things, off the top of my head, that need to be automatic in the cluster.
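One way the auto-scaling piece can be made automatic at the workload level is a standard HorizontalPodAutoscaler. The sketch below assumes a hypothetical agent Deployment named `agent-worker`; the thresholds and replica bounds are illustrative, not from the interview:

```yaml
# Sketch: scale a hypothetical "agent-worker" Deployment on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agent-worker      # placeholder Deployment name
  minReplicas: 1
  maxReplicas: 100          # "hundreds of agents" would raise this bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

In practice, agentic workloads are often better scaled on custom metrics (queue depth, in-flight requests) than CPU, but the HPA mechanism is the same.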

Bart Farrell: For teams evaluating how to run their agentic AI workloads on Kubernetes, what's the easiest path to get these optimizations without becoming infrastructure experts?

Olawale Olaleye: Great question. Without becoming infrastructure experts? Back in the day, you would have said that's completely impossible, but now you can use managed Kubernetes. Don't try to implement a Kubernetes cluster yourself; there are managed Kubernetes services you can leverage, for example Amazon EKS. And not just a managed service that handles the control plane: you also want one that takes care of the data plane itself, and this is where Amazon EKS Auto Mode comes in. With Amazon EKS Auto Mode, you have a managed cluster with the essential capabilities you need to run a production workload; with just a single click, you can have a production-ready cluster. It comes with essential managed capabilities such as pod networking and service discovery, and you also have the ability to extend the functionality of those clusters. Add-ons and node patching become a thing of the past; you get that automatically in Amazon EKS Auto Mode. Now, in terms of how you maintain day-two operations, you can also use MCP servers. You can start to think about how you transform the way you did infrastructure management in the past by introducing an agentic AI approach: you can have agents deploy workloads for you, troubleshoot workloads for you, scale workloads for you, and continuously monitor and give you relevant information when you need to improve those workloads.
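For readers who prefer declarative tooling over the console's single click, an EKS Auto Mode cluster can also be described with eksctl. This is a minimal sketch assuming eksctl's `autoModeConfig` field (available in recent eksctl releases); the cluster name and region are placeholders:

```yaml
# Sketch of an eksctl ClusterConfig enabling EKS Auto Mode.
# "agentic-ai" and "us-west-2" are placeholder values.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: agentic-ai
  region: us-west-2
autoModeConfig:
  enabled: true   # EKS manages the data plane: compute, storage, networking
```

Applied with something like `eksctl create cluster -f cluster.yaml`, this would stand up a cluster where node provisioning and patching are handled by EKS rather than by the team.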

Bart Farrell: Kubernetes turned 10 in 2024. What do we expect in the next 10 years?

Olawale Olaleye: In the next 10 years, I expect continuous growth in the community. I expect the community to keep supporting AI workflows and agentic workloads; that's where the future is headed. I trust the community, and I'm very enthusiastic about it, so you'll see a lot of support. There will be new tools and new frameworks. We need frameworks, particularly for running agents in a safe way, so I expect to see more of that in the community.

Bart Farrell: What's next for you?

Olawale Olaleye: What's next for me? I think I'll be spending most of my time trying to create a managed experience for customers that are running AI workloads and agentic workloads. If you remember, earlier I talked about using managed services: customers really don't need to spend so much of their time setting up the underlying infrastructure to run those AI workloads. You can instead spend your time building the workload itself rather than scaling or configuring infrastructure. That's what I'll be doing in the next phase of my life.

Bart Farrell: How can people get in touch with you?

Olawale Olaleye: You can hit me up on LinkedIn. My name is Olawale Olaleye. That's how you can find my LinkedIn.
