Kubernetes Change Management Without Outages

Jun 19, 2026

Guest:

Duncan Doyle

Making changes to Kubernetes settings in production can be stressful, especially when you already have requests, limits, autoscaling, and probes set up.

Duncan Doyle shares why it’s important to manage changes in a controlled, declarative way, and explains how AI can support platform teams without making risky decisions by itself.

In this interview:

Why it’s important to have staging, rollback options, and monitoring in place before making changes in production
How using GitOps, CI/CD, and configuration as code can help lower operational risks
Where AI can be helpful in Kubernetes right now, especially for checking and analyzing your setup

Transcription

Bart Farrell: First things first, who are you, what's your role and where do you work?

Duncan Doyle: Hi, my name is Duncan. I am based out of the Netherlands. I am the Director of Product at Solo.io, which is a company that builds gateways, service meshes, basically anything around Kubernetes native application networking and agentic networking.

Bart Farrell: A Kubernetes setting can look wrong but still feel risky to change once it's already in production: requests, limits, autoscaling settings, and probes. What would you tell a team that sees the problem but is nervous the fix could cause an outage?

Duncan Doyle: Well, first and foremost, the most obvious one is: don't start changing things directly in production unless it's really necessary. So staging environments are really important for us. What we do with our product, and this has been the premise of how we operate from day one, is that we do everything declaratively. So having things in a version control system and doing automatic, controlled deployments that are repeatable and reliable is one thing. If you make a change, please make sure that you can revert that change and that you have proper monitoring in place. So you can look at the setting that is wrong, commit that change, deploy it to your QE environment, and promote it to production, but then keep monitoring your platform while it's running and have the capability to actually revert. That is what we've mostly seen in production environments. Do things in a controlled manner rather than doing a kubectl apply. I've seen people YOLO their way. You can apply directly in production, but that's not the way to go forward. Having a proper GitOps CI/CD pipeline with configuration as code is for me the way to go and something that allows you to stay in control.
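
The flow Duncan describes (deploy to QE, promote to production, keep monitoring, revert if needed) can be sketched as a small Python model. This is a minimal illustration under stated assumptions, not Solo.io's tooling or any real GitOps API: the `Environment` class, the `promote` function, and the `healthy` check are all hypothetical names, and a real pipeline would let a GitOps controller and a monitoring system do this work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, str]


@dataclass
class Environment:
    """Hypothetical stand-in for a cluster that tracks deployed config revisions."""
    name: str
    history: List[Config] = field(default_factory=list)

    def deploy(self, config: Config) -> None:
        # Store a copy so later edits to the caller's dict do not alter history.
        self.history.append(dict(config))

    def current(self) -> Config:
        return self.history[-1]

    def rollback(self) -> None:
        # Drop the latest revision, but never the initial known-good one.
        if len(self.history) > 1:
            self.history.pop()


def promote(config: Config, staging: Environment, production: Environment,
            healthy: Callable[[Environment], bool]) -> str:
    """Deploy to staging first, then production, reverting whenever the check fails."""
    staging.deploy(config)
    if not healthy(staging):
        staging.rollback()
        return "rejected in staging"
    production.deploy(config)
    if not healthy(production):
        production.rollback()
        return "rolled back in production"
    return "promoted"
```

The point of the sketch is the ordering: production is only touched after staging passes the same check, and every deploy leaves a previous revision to fall back to.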

Bart Farrell: Duncan, you mentioned people YOLOing their way through things. Nowadays, we're hearing at this event too, at KCD Helsinki, about where AI can be used on Kubernetes and perhaps where it can't. Some people are just trying to vibe code their way through everything. Where do you think it's appropriate to be using AI in a Kubernetes context, and where is it still probably not a great idea?

Duncan Doyle: Having the AI make changes without being controlled, so without a human in the loop, is where I think it becomes really dangerous. Where I've seen it prove to be useful is in analysis of environments. It's very good at inspecting code, and it's very good at inspecting configuration and correlating events to that specific code. So doing analysis is where it clearly shines. Making changes in an uncontrolled way, so without a human in the loop, is where it gets dangerous. Especially with AI models having a tendency to please the consumer, coming up with things and hallucinating things that are actually wrong, I think having agents do things autonomously is still very dangerous. That's not only because they can make a change that causes catastrophic outages, but also because of the question of who is actually responsible and accountable when an agent performs an operation that leads to a multi-thousand dollar or euro loss in production. So I think those are challenges that need to be answered. Ultimately, I think a goal for current-day developers and platform operators is to learn how to work with AI. That's what I've seen in the last couple of months, using it for a lot of coding and analysis myself. How do I instruct the AI in such a way that it will do its job the way I want it to? One of the things that I found interesting is having AIs check on AIs: use Claude to create a design for something and have Codex verify it, back and forth. And I found that I got really good results with those kinds of approaches.
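
The two ideas in this answer, a second model checking the first and a human staying in the loop, can be combined in a simple approval gate. The following Python sketch is only an illustration under stated assumptions: `ProposedChange`, `review_and_apply`, and the reviewer callbacks are hypothetical names, and no real AI service or Kubernetes client is called.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedChange:
    """Hypothetical record of a change suggested by an AI assistant."""
    description: str
    diff: str


def review_and_apply(change: ProposedChange,
                     reviewers: List[Callable[[ProposedChange], bool]],
                     human_approves: Callable[[ProposedChange], bool],
                     apply: Callable[[ProposedChange], None]) -> bool:
    """Apply a change only if every automated reviewer and a human agree."""
    # A second model (or a linter) checks the first model's proposal.
    if not all(reviewer(change) for reviewer in reviewers):
        return False
    # The human decision is mandatory and comes last, so a person stays accountable.
    if not human_approves(change):
        return False
    apply(change)
    return True
```

In this sketch the automated reviewers can only veto a change. Nothing is applied without the explicit human approval, which keeps the AI in the analysis role described above.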
