ToolHive and MCP on Kubernetes

Jul 3, 2026

Guest: Juan Antonio Osorio

Kubernetes settings might seem off, but changing them in production can feel risky.

Juan Antonio Osorio, Principal Engineer at Stacklok, shares why teams should have staging, CI, and observability in place before issues reach users.

In this interview:

Ways to handle risky Kubernetes configuration changes
How to spot readiness and autoscaling issues early
Why engineers should start thinking like product owners in the AI era
How AI workloads are changing Kubernetes infrastructure patterns

Transcription

Bart Farrell: So first things first, who are you, what's your role, and where do you work?

Juan Antonio Osorio: My name is Juan Antonio Osorio. People call me Ozz. I'm a principal engineer at a company called Stacklok. We do AI infrastructure in the Kubernetes world.

Bart Farrell: A Kubernetes setting can look wrong but still feel risky to change once it's already in production. Requests, limits, autoscaling, or probes. What would you tell a team that sees the problem but is nervous the fix could cause an outage?

Juan Antonio Osorio: Not fixing it is going to cause an outage as well. People underestimate having a staging environment. Definitely try to put some effort into that and be less afraid because we've done this before. It's something that you can tackle. Don't be afraid. Just try to set up guardrails so you can catch it fast.

Bart Farrell: Missing readiness checks usually show up through something concrete: traffic reaches a pod too early, autoscaling behaves strangely, or users report errors. If a team wanted to catch this before users do, where would you have them look first?

Juan Antonio Osorio: It sounds harsh to say, but I would say CI. A lot of these things you would normally catch either in a staging environment or in CI. Definitely shift left as much as you can, invest in early testing. But once you do, OTel is your friend as well. Hopefully you have that set up. If not, the community is really friendly. Reach out to them too.
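
One way to act on the "catch it in CI" advice is a small lint step that fails the build when a workload ships a container without a readiness probe. The sketch below is an illustration only and is not something described in the interview: the function name `containers_missing_readiness` and the sample manifest are hypothetical, and it assumes the Deployment manifest has already been parsed into a Python dict.

```python
from typing import Any


def containers_missing_readiness(manifest: dict[str, Any]) -> list[str]:
    """Return the names of containers in a Deployment-like manifest
    that do not define a readinessProbe."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    return [
        container.get("name", "<unnamed>")
        for container in pod_spec.get("containers", [])
        if "readinessProbe" not in container
    ]


# Hypothetical manifest, already parsed into a dict (for example from JSON).
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "image": "example/app:1.0",
                        "readinessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080}
                        },
                    },
                    # No readinessProbe here, so this one should be flagged.
                    {"name": "sidecar", "image": "example/sidecar:1.0"},
                ]
            }
        }
    },
}

missing = containers_missing_readiness(deployment)
```

In a CI job, a non-empty `missing` list would be the signal to print the offending container names and exit with a non-zero status, so the problem surfaces before the manifest reaches staging or production.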

Bart Farrell: Yesterday, we had a pretty in-depth conversation about engineers needing to be more focused on product, given all the AI that's coming in. Could you speak about that?

Juan Antonio Osorio: I was talking yesterday about the shift that is happening with AI making coding easier and the shift in roles that engineers are going to have in the future. My strong belief is that you're no longer just a software engineer. You're now a product engineer. You should talk a lot more with the product folks. You should talk a lot more with your users. You should really care about the end-to-end thing. There are no more handoffs. There are no more bottlenecks. You're owning this thing end to end. Care for your software is what I'm trying to say.

Bart Farrell: You said you're working on AI infrastructure related to Kubernetes. For a lot of people that might sound like science fiction. Even though Kubernetes is a mature technology, it seems like there's still some work to be done. Tell me more about that.

Juan Antonio Osorio: AI workloads and all of the adjacent technologies are different, even though you can run them in Kubernetes in theory. The access patterns and traffic that you get are slightly different too. There are some pieces that require a little bit of extra work. For example, you're going to have a central gateway to intercept all of the traffic going to the LLM. That is not REST. That is a full-blown stream that you need to care for. Autoscaling and dealing with bursts are going to look a lot different. That's some of the work that we're doing. On the other hand, giving your AI agents access to tools is also going to look different, but familiar. We have the concept of an MCP server, which serves as a standard protocol for AIs to use tools. However, the protocol itself is not ready to autoscale either. It's also a stateful protocol today. Making that work requires custom software, which is what we do in ToolHive, for example. It's an open source project, Apache licensed, and it runs in Kubernetes, so check it out. Those are some small examples of how, even though AI feels like it should be the same, it really isn't quite. But at the same time, some of it is the same old. Take packaging these things, or packaging agent skills: we can do it through OCI. We've already solved that problem in the Kubernetes space and the cloud native space. My job here is to be that boring infra person who reminds people, hey, we've solved this. Let's use this good old boring technology.
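
To make the stateful-protocol point concrete: when a session's state lives in one server replica, every request in that session has to land on that same replica, which plain round-robin load balancing does not guarantee. The sketch below shows one generic technique for this, rendezvous (highest-random-weight) hashing on a session ID. It is an illustration only and is not taken from ToolHive or the MCP specification; `pick_backend`, the session ID, and the replica names are all hypothetical.

```python
import hashlib


def pick_backend(session_id: str, backends: list[str]) -> str:
    """Pick a replica for a session using rendezvous hashing.

    Each (backend, session) pair gets a deterministic score, and the
    backend with the highest score wins. The result does not depend on
    the order of the backend list.
    """
    if not backends:
        raise ValueError("no backends available")

    def score(backend: str) -> int:
        digest = hashlib.sha256(f"{backend}|{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)


# Hypothetical replica names and session ID.
backends = ["mcp-0", "mcp-1", "mcp-2"]
first = pick_backend("session-abc", backends)
again = pick_backend("session-abc", list(reversed(backends)))

# Simulate scaling out by one replica.
scaled = backends + ["mcp-3"]
after_scale = pick_backend("session-abc", scaled)
```

With this scheme, `first` and `again` name the same replica, and after scaling out a session either stays where it was or moves to the newly added replica. Sessions that do move would still need their state migrated or re-established, which hints at why autoscaling a stateful protocol takes more than a standard load balancer.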
