How Platform Teams Govern AI-Generated Kubernetes Config

May 26, 2026

Guest:

  • Artem Lajko

If your Kubernetes standards fall apart the moment a pull request touches app code, config, and CI at once, the issue usually is not YAML itself. The real gap is ownership, context, and traceability across the delivery chain.

Artem Lajko explains why config changes often receive weaker review than application code, and where AI-generated infrastructure helps but still fails fast without enough context.

In this interview:

  • Why Helm, overlays, and shared PR ownership make config quality harder to maintain

  • What good AI guardrails look like for manifests, policies, and production-bound changes

  • Why platform teams need better traceability across OCI registries, SBOMs, and compliance requirements like the Cyber Resilience Act

Transcription

Bart Farrell: Who are you? What's your role and where do you work?

Artem Lajko: Hello, my name is Artem Lajko. I work as Head of Platform Engineering at IITs Consulting, a startup from Germany.

Bart Farrell: Where does code quality break down first when changes span application code, Kubernetes configs, and CI pipelines in the same PR?

Artem Lajko: In the same PR, from my experience, it breaks in the config part. The application code comes from the developers, so they are the owners of that code, while the config parts and the pipelines come from the platform team. If you put everything into one PR, you don't have end-to-end ownership. In most cases, if something goes wrong, it breaks in the config, when the CI part tries to change something in the same pull request. This is what I see, and it comes from the missing end-to-end ownership from application code to the infra part.
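
The ownership split described above can be made visible before review starts. The following is a minimal sketch, assuming a hypothetical ownership map: the path patterns and team names are illustrative and do not come from the interview. It reports when a single PR touches files owned by more than one team.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership map: patterns and team names are illustrative only.
OWNERS = [
    ("src/*", "app-team"),
    ("charts/*", "platform-team"),
    ("overlays/*", "platform-team"),
    (".ci/*", "platform-team"),
]


def owners_for(changed_paths):
    """Return the set of owning teams touched by a list of changed file paths."""
    teams = set()
    for path in changed_paths:
        matched = [team for pattern, team in OWNERS if fnmatchcase(path, pattern)]
        # Files nobody claims are reported explicitly instead of being ignored.
        teams.update(matched or ["unowned"])
    return teams


def spans_ownership_boundary(changed_paths):
    """True when a single PR touches files owned by more than one team."""
    return len(owners_for(changed_paths)) > 1
```

A check like this does not create end-to-end ownership by itself. It only flags the PRs where nobody owns the whole change, so that both teams can be pulled into the review.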

Bart Farrell: Why do YAML and Helm changes consistently escape the same scrutiny as application code? What does that cost teams in production?

Artem Lajko: From my point of view, we have treated code as code for decades. So we have a lot of tests and we know how it works. But if we take a look at config, especially YAML files, even if we say everything is code, we don't treat it the same way. In most cases, plain YAML is really readable. The problem starts when you're not just using YAML manifests, but doing something like Helm umbrella charts. You lose a lot of control and you don't test very well. We also don't validate config the way we validate code, where we have unit tests and a lot of frameworks. This is what I've seen most often. It becomes very difficult to understand why some config breaks, because you don't test it enough. If you have overlays, or if something is happening in the CI, you lose traceability compared to code, where you don't have a lot of overlays. And this is where it breaks compared to application code.
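
One way to get some of that traceability back is to record which layer set each value while the layers are merged. The following is a minimal sketch, assuming a simple "later layer wins" deep merge over plain dictionaries. It is not the actual merge logic of Helm or Kustomize, and the function name `merge_with_provenance` is hypothetical.

```python
def merge_with_provenance(layers):
    """Merge (name, values) layers in order; later layers win.

    Returns (merged, origin), where origin maps a dotted key path to the
    name of the layer that last set that value.
    """
    merged, origin = {}, {}

    def apply(target, values, layer, prefix):
        for key, value in values.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                if not isinstance(target.get(key), dict):
                    # A mapping replaces a scalar: forget the scalar's origin.
                    origin.pop(path, None)
                    target[key] = {}
                apply(target[key], value, layer, path)
            else:
                # A scalar replaces whatever was there, including nested keys.
                for stale in [p for p in origin if p.startswith(path + ".")]:
                    del origin[stale]
                target[key] = value
                origin[path] = layer

    for name, values in layers:
        apply(merged, values, name, "")
    return merged, origin


base = {"replicas": 1, "image": {"repository": "web", "tag": "1.0"}}
prod = {"replicas": 3, "image": {"tag": "1.1"}}
merged, origin = merge_with_provenance([("base", base), ("overlay/prod", prod)])
```

With these two illustrative layers, `origin` shows that `replicas` and `image.tag` come from `overlay/prod` while `image.repository` still comes from `base`, which is the kind of answer that is hard to get once overlays are rendered into a single manifest.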

Bart Farrell: As AI writes more of the code going into Kubernetes deployments, what does good governance of that code actually look like?

Artem Lajko: We need guardrails again. It's a little bit funny because in the past we wanted to provide golden paths to the developer, and now we have AI as a citizen of our platform. Depending on the model you are using, you can just say "generate me a deployment manifest with best practices." If you have an old model, you're missing some of the security context, such as setting privileged to false, allowing no privileges, or read-only settings. If you have a new model, you get better quality. So from my point of view, as we experience it, you need guardrails so you can say everything AI-generated is validated. You create a baseline and say if it fits the baseline, it will work for you. This is how we experience it, because it's not deterministic. It depends on the model you are using. Here I would say we need guardrails and governance to understand what is AI-generated and how it fits into the infrastructure before you even go to production.
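
A baseline of this kind can be expressed as a small check that runs on every generated manifest, whichever model produced it. The following is a minimal sketch, assuming the manifest has already been parsed into a dictionary. The `BASELINE` values are illustrative and not a standard. The three field names are container-level `securityContext` fields in Kubernetes, and this sketch assumes the baseline wants each of them set explicitly.

```python
# Hypothetical baseline: the required values are illustrative, not a standard.
BASELINE = {
    "privileged": False,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
}


def baseline_violations(deployment):
    """List baseline violations for each container in a Deployment dict."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        for field, required in BASELINE.items():
            # Unset fields count as violations: this baseline wants them explicit.
            if ctx.get(field) is not required:
                name = container.get("name", "?")
                problems.append(f"{name}: {field} must be {required}")
    return problems


generated = {
    "spec": {"template": {"spec": {"containers": [{"name": "web"}]}}}
}
print(baseline_violations(generated))
```

Because the check is deterministic, it gives the same verdict for the same manifest even when the model that generated it does not. In practice a team would more likely enforce this with an admission policy engine. The function only illustrates the idea of "fits the baseline or not".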

Bart Farrell: Where are engineering teams getting AI code review right today? And where are they still flying blind?

Artem Lajko: I can only speak from the point of view of a platform engineer. If I have plain data, I look at my manifest and I can see everything. So it's very easy to understand, not just for me but also for the AI, what's working well and what's not. If I'm using it for some simple stuff like generating policies or templates and so on, which I call low context, it works really well. But if I have something really complex or a lot of cascading, like with Helm charts, and I need to understand the domain, which I call rich context, in most cases the AI fails really fast. Even for simple stuff like creating a security context, it generates overlays that don't match your Helm charts, your umbrella Helm charts, or your infrastructure, because it doesn't get the context. Then it's easier for a human to write it themselves, because the human has this rich context and the AI has only the low context. This is where I see the difference, especially when working with configuration as data.

Bart Farrell: What would it take for you to trust an AI recommendation on a production-bound infrastructure change? What's the bar?

Artem Lajko: The bar is really high. To answer the question directly: if you're responsible for a platform, it's very difficult to trust AI, especially to deploy something to production for critical infrastructure. If you're doing documentation stuff, or code comments, it's very easy. It works if we know it works. So in our team, we established rules for the AI-generated and the human-generated parts. We can say if it's critical code or something can go wrong, we need to review it. If it's just some comments in the code, it's okay. We also updated our pull request templates with a note such as: "This is fully generated with AI. I understand the logic, but I don't get all the libraries that are called." You can mark that in the pull request. So we understand when it's touching critical code and the coder or the platform engineer doesn't get it. In that case we don't trust it, and we need to review it. It's different if a coder says, "I understand the code, I understand the library, I get it, AI just helped me." Our goal is to make it traceable what's coming from the AI part and what's not. Based on this context, we decide what we trust and what we don't trust. It's not easy to explain, but to blindly trust the AI to push to production? I'd say not yet, maybe in the future. So this is how we handle it. But this is also a challenge to solve.
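
The rules described above can be written down as a small decision function, so that the declaration in the pull request template drives the review that follows. The following is a minimal sketch, assuming three yes/no declarations per PR. The class, field names, and review labels are hypothetical and are not the team's actual template.

```python
from dataclasses import dataclass


@dataclass
class PullRequestNote:
    ai_generated: bool        # author marked the change as AI-generated
    author_understands: bool  # author states they understand code and libraries
    touches_critical: bool    # change affects critical code or infrastructure


def review_level(note):
    """Map a PR declaration to a review level (labels are illustrative)."""
    unexplained_ai = note.ai_generated and not note.author_understands
    if note.touches_critical and unexplained_ai:
        # Critical change that nobody fully understands: do not trust it.
        return "blocked-until-reviewed"
    if note.touches_critical or unexplained_ai:
        return "full-review"
    # Comments, docs, or changes the author fully understands.
    return "light-review"
```

For example, `review_level(PullRequestNote(True, False, True))` returns `"blocked-until-reviewed"`, while an AI-assisted comment change that the author understands falls through to `"light-review"`. The value of the sketch is traceability: the decision depends only on what was declared in the PR.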

Bart Farrell: Kubernetes is over a decade old and still accelerating. What does the next era look like for teams trying to maintain code quality at that scale?

Artem Lajko: To maintain code quality at scale means, for me, to get back control over the sprawl of config. When I speak of scale, I'm speaking of managing 1,000 or 2,000 clusters and a lot of applications. For example, you have a lot of overlays, like I already mentioned, with Helm umbrella charts or Kustomize, and it doesn't depend on the tool. In most cases, you have a base and some overlays, because you don't want to repeat yourself. This is what we have now. I think a lot of platform teams that want quality at scale need to get control back over this config sprawl, which is the project.

Bart Farrell: What are you working on next? What are you going to be building next?

Artem Lajko: It's not a secret anymore. Last Monday, we open sourced a framework called Kubara. It allows you to create your own distros. We also use this framework to build a general distro that came from internal work with some customers, and we made that open source too, so not just the framework but also the general distro. Now we are trying to solve this at scale and get visibility for the future, because the Cyber Resilience Act is coming in 2026, 2027. In short, you need to understand your chain. We are trying to extend this framework so you don't just hydrate or generate your manifests, but can also sign them, push them to an OCI registry, for example, and understand the whole chain, like images, SBOMs, and so on. So in short, we are trying to prepare platforms at scale to understand the whole graph and understand where the resources are coming from, for the next reports you need to provide in 2027. If you need to understand your chain, this is what we are trying to solve now.
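
To make the idea of "understanding the chain" concrete, the following is a minimal sketch of a provenance record that ties a rendered manifest to the inputs it came from. It is not the Kubara format or API: every function and field name here is hypothetical, and it only computes a content digest. Real signing and pushing to an OCI registry would need dedicated tooling that is out of scope for this sketch.

```python
import hashlib


def manifest_digest(manifest_text):
    """Return a sha256 digest string for rendered manifest text."""
    return "sha256:" + hashlib.sha256(manifest_text.encode("utf-8")).hexdigest()


def provenance_record(manifest_text, source_commit, images, sboms):
    """Build a record linking a rendered manifest to its inputs.

    All field names are hypothetical and chosen for illustration only.
    """
    return {
        "manifest": manifest_digest(manifest_text),
        "source_commit": source_commit,
        "images": sorted(images),
        "sboms": sorted(sboms),
    }


def matches_record(manifest_text, record):
    """True when the manifest still matches the digest stored in the record."""
    return manifest_digest(manifest_text) == record["manifest"]
```

A record like this answers two questions a report might ask: which images and SBOMs belong to a deployed manifest, and whether the manifest was changed after it was rendered.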

Bart Farrell: How can people get in touch with you?

Artem Lajko: The easiest way is to follow me on LinkedIn and drop me a direct message. I'm also publishing a lot of blogs on Medium. When I publish them, I create a LinkedIn post. So the easiest way is just to follow me on LinkedIn.
