Why AI Code Review Fails in Kubernetes Platform Teams

May 22, 2026

Guest:

  • Pronomita Dey

Teams often treat application code as the only real code, while Helm, YAML, and platform configuration get less scrutiny.

Pronomita Dey explains why that blind spot gets worse as AI starts generating more infrastructure changes, more recommendations, and more code than teams can realistically review.

In this interview:

  • Why Helm, YAML, and template-driven infrastructure often escape the same review as application code

  • How AI-generated changes can increase trust too quickly, from patched dependencies to oversized implementations

  • What stronger guardrails, policy as code, and runtime awareness look like in modern platform teams

  • Why the next era of platform engineering needs more traceability, blocking checks, and internal platform maturity

Transcription

Bart Farrell: So first things first, who are you, what's your role, and where do you work?

Pronomita Dey: Hello everyone, and thanks a lot for having me. I am Pronomita. I work as a senior software engineer at Intuit, and I am part of the larger platform team. Our main job is to make sure the developer experience is as seamless as possible.

Bart Farrell: Pronomita, we want to dive into the subject of code quality and where it breaks down, so I just want to ask you the following questions. Where does code quality break down first when changes span application code, Kubernetes configs, and CI pipelines in the same PR?

Pronomita Dey: Just picking a few words from what you just mentioned: there is Kubernetes configuration, there is application code, there is configuration in general, and there is CI/CD. As we platformize things, everything is being done to make sure your developers' productivity is high and that you are ready to ship faster. We have templatized everything. We are going with the notion of everything as code. But we are essentially still in the mindset that application code is real code, and the rest of it is code, but more of configuration. So what we essentially deal with is application logic. And there is some configuration, and even that configuration is not directly what we are seeing; it is rather templates that get rendered at runtime. So, you know, the template is more or less set. There is a certain set of values that will be patched together, and you do not anticipate much of it going wrong. And you focus more on the application code side of things. So when there is a change going in that touches the values.yaml file, maybe your GitHub Actions file, and your application code base, the focus tends to go more towards the application code. Additionally, when we are looking at anything as code, the layers and the number of steps with which we scrutinize it are always focused heavily on the application side. So I would say that gets filtered better, vetted better. So the breaks always happen whenever we are hitting the YAML section, the Helm section of our change. In my experience, that is where we tend to miss things.

Bart Farrell: You mentioned Helm. Why do YAML and Helm changes consistently escape the same scrutiny as application code? And what does that cost teams in production?

Pronomita Dey: So I will pick up from where I left off. When we say Helm, or when we say any YAML, we are again looking at a template. We are not essentially looking at configuration code, to be specific. It is rendered at runtime. Now, if we just look at both items separately, your application code is going through linting checks. It is going through unit tests. It is going through static scans. We are not doing any part of that with our Helm configurations. We might be doing logical checks. We might be doing semantic checks. There is some policy as code enforced, which will go and check your resource limits, check that your endpoints are correctly set, or do some permission checks on the values you're injecting. But that is the extent of it. Your policy will not be able to anticipate the higher memory that might be required for your service to spawn, for example, extra heap memory that will be used. That does not get incorporated because, again, that is what happens in that particular environment. You might be facing that in production, but your lower environments hardly run into those issues. So even if you keep those checks in mind, and even if you have another person, or maybe multiple people, doing the code review, what they are essentially looking at is a template which will be rendered later. So the checks happen mostly on the logical side and not on the runtime implications they have. I think that would be the biggest reason.
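
As an illustration of the kind of policy-as-code check described above, here is a minimal sketch in Python. It assumes the Helm chart has already been rendered and parsed into a plain dictionary; the function name `find_missing_limits` and the sample `rendered` manifest are hypothetical, not taken from the interview or from any particular policy tool. Note that a check like this only sees what is written in the manifest, so it cannot tell whether a declared memory limit is actually enough for the service at runtime.

```python
from typing import Any


def find_missing_limits(manifest: dict[str, Any]) -> list[str]:
    """Return one message per container that lacks a cpu or memory limit."""
    problems = []
    kind = manifest.get("kind", "")
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    # Deployment-style workloads keep the pod spec under spec.template.spec.
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for key in ("cpu", "memory"):
            if key not in limits:
                problems.append(
                    f"{kind}/{name}: container {container.get('name')} has no {key} limit"
                )
    return problems


# Hypothetical rendered manifest: the cpu limit is set, the memory limit is not.
rendered = {
    "kind": "Deployment",
    "metadata": {"name": "checkout"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "resources": {"limits": {"cpu": "500m"}}},
    ]}}},
}

for problem in find_missing_limits(rendered):
    print(problem)
```

Running the check against the rendered output rather than the template is a deliberate choice in this sketch: it is the rendered values that reach the cluster.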

Bart Farrell: As AI writes more of the code going into Kubernetes deployments, what does good governance of that code actually look like?

Pronomita Dey: AI is writing code, and AI is writing a lot of code. In my experience, the biggest challenge has been this: even if I am writing a very small automation, if I had personally gone through the code base, the change would have been as simple as injecting an if clause. When AI is doing that, it is by default, I would say, over-helpful in nature. Not only is it making that change, it is also bloating my whole script from 100 lines to 150 lines, just by adding a bunch of comments and extra exceptions. So that is one thing: it is making it bulky by default. And the second thing is, now that there is so much coming in, I, as a developer, or even as a platform engineer for that matter, see these as good suggestions and incorporate them. So we are moving with a higher-trust mindset. That is exactly where I feel we need to scrutinize AI better and build more and heavier guardrails. If not blocking guardrails, we should at least have guardrails that always run. They do not need to be blocking checks, but they should at least flag: hey, this is a possibility, you are pushing something extra. You need to be very careful with all the suggestions that you're accepting from AI. So in this case, that means adding things like pre-merge checks and policy as code, enforced in a very shift-left kind of setup. I personally would feel more comfortable pushing that to my cluster if it has gone through a much heavier check. Going forward, AI is going to be your peer, and you just need to have better eyes and place more traceability on what is being injected. And yes, as we expand, we need to be more transparent and be able to see what is happening to know what will happen.
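
A non-blocking guardrail of the kind described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `review_flags`, the `max_growth` threshold of 1.25, and the sample inputs are all hypothetical, and the check only warns rather than failing a merge. It uses Python's standard `difflib` module to find added lines.

```python
import difflib


def review_flags(before: str, after: str, max_growth: float = 1.25) -> list[str]:
    """Return warnings about how much a file grew; never raises or blocks."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    flags = []
    # Flag files that grew by more than the (assumed) allowed ratio.
    if old_lines and len(new_lines) / len(old_lines) > max_growth:
        flags.append(f"file grew from {len(old_lines)} to {len(new_lines)} lines")
    # ndiff prefixes added lines with "+ "; strip the prefix to inspect them.
    added = [
        line[2:]
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith("+ ")
    ]
    comments = [line for line in added if line.lstrip().startswith("#")]
    if comments:
        flags.append(f"{len(comments)} of {len(added)} added lines are comments")
    return flags


# Hypothetical change: a 100-line script gains 50 comment lines.
before = "\n".join(f"step_{i}()" for i in range(100))
after = before + "\n" + "\n".join(f"# note about step {i}" for i in range(50))

for flag in review_flags(before, after):
    print("WARNING:", flag)
```

Returning a list of warnings instead of raising keeps the check advisory; a team that later wants a blocking guardrail could fail the pipeline whenever the list is non-empty.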

Bart Farrell: And this kind of ties into our next question. Where are engineering teams getting AI code review right today? And where are they still flying blind?

Pronomita Dey: I feel a little conflicted about this, because a lot of the time we are trying to balance faster deployments. We want to be more efficient. We want to bring in productivity as a whole. And we are shipping a lot more, and a lot faster. Now, if the question is where we are flying blind, one thing that is becoming a little inevitable in the process of deploying fast, or achieving a goal faster, is blindly taking a lot of recommendations. For example, my security team raises a ticket: you need to get a bunch of these libraries patched. You need to get this Splunk running on the instance; everything should be patched. Use a different AMI. So instead of picking a version that suits me best, we tend to ask for these recommendations, and we tend to incorporate them. And it is humanly quite difficult to go through all the versions that have been suggested as replacements in your requirements.txt. So we tend to go with AI, which tends to give you the latest ones. You tend to just patch them in and ship it. Your security bug and your security report have been fixed. You're good to deploy. That is one place where we tend to blindly trust and get going, because we also assume the latest is supposed to be the most bug-free, but it might not be the best for your system. Now, that said, another thing which I previously mentioned, and again, it feels a little heavy, is the volume of code that AI generates. For that same situation, if you were writing a simple 403 call, it would have done it for you. AI will make sure it generates 10 exceptions and catches everything. So as a reviewer, it becomes very overwhelming, I would say. The moment you see 100 lines getting converted to 150, your brain tends to lag a little, and you tend to miss a lot of things. I would say that is one place where we just end up reviewing only the parts that are most relevant. And okay, these are extra comments that have been added.
These are some extra exceptions that have been added. That sort of creates a problem. And as a result of this, I think another problem that comes into the platform is that as our platforms mature, we tend to templatize everything. We tell our developers, you know, just pick it up, push in your logic, and deploy it. So they tend to lose touch with the metal of the system. It is, again, a standard template that is generated for them. What turns into a debugging nightmare, I would say, is when you run into issues. So personally, these days, that is something I find a little scary: I have AI generating a lot of my code that is already running. Earlier, I would know where to go and what to debug, because I did a lot of the thinking at that point to add a particular thing, add an exception, or catch a particular scenario we might be running into. But in this case, it takes more time to go and find the exact point where it is failing, because the code lacks my organic thought process. So I feel that is something that increases the turnaround time a lot. And yes, that is a big black box to solve in itself.
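
One way to make suggested dependency upgrades easier to review is to surface only the pins whose major version changed, since those are the ones most likely to behave differently in your system. The sketch below is a simplified illustration: the helper names `parse_pins` and `major_bumps` are hypothetical, the version pins are illustrative sample data, and the parser only understands plain `name==version` lines (no extras, environment markers, or version ranges).

```python
def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name==version' lines from requirements.txt-style text."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def major_bumps(old_text: str, new_text: str) -> list[str]:
    """List packages whose pinned major version differs between two files."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    bumps = []
    for name, new_version in sorted(new.items()):
        old_version = old.get(name)
        # Compare only the first dotted component as the "major" version.
        if old_version and old_version.split(".")[0] != new_version.split(".")[0]:
            bumps.append(f"{name}: {old_version} -> {new_version}")
    return bumps


# Illustrative pins, not a recommendation for any real project.
current = "requests==2.31.0\nurllib3==1.26.18\n"
suggested = "requests==2.32.3\nurllib3==2.2.1  # suggested upgrade\n"

for bump in major_bumps(current, suggested):
    print("REVIEW:", bump)
```

A reviewer can then spend their attention on the flagged packages instead of scanning every changed line in the file.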

Bart Farrell: Speaking about black boxes, things that sound scary, and elements of trust. What would it take for you to trust an AI recommendation on a production-bound infrastructure change? What's the bar? What's the standard for you?

Pronomita Dey: It's just one word that needs to be extrapolated: the amount of guardrails we have provided for that. How well have we tested it? How well have we incorporated our previous failures into building this particular change? The more guardrails we have, the more policies we have. It's, again, a learning system, and it takes a certain level of maturity to reach there. So I think with time, it will become more trustworthy.

Bart Farrell: Now, Kubernetes is over a decade old and still accelerating, although there's some debate about that. But I think we can say things still move at a very fast pace. What does the next era look like for teams that are trying to maintain code quality at that scale?

Pronomita Dey: Well, I think the shift that has happened over the years, with Kubernetes coming in, and what has run in parallel to that, is how we have platformized everything. IDP is the standard now. Nobody expects a different team to operate and support your application infrastructure. So with that coming into place, and with the whole mindset shift of having templatized everything, we will see more and more automation required and people pushing for more of it. We will see a lot more self-remediating and self-debugging systems coming into place. So as the supporting platform, what that means is you will need to be more and more vigilant. We need to have stronger, better, and wider eyes. As that volume keeps increasing, we will need to trace better. We need to put better guardrails in place. And with time, as we move into more AI-generated, infrastructure-managed systems, we will need to shift to mandatory blocking guardrails for everything rather than ship something we're in doubt of.

Bart Farrell: And Pronomita, what are you focused on building or solving next?

Pronomita Dey: Well, on my work side of things, I am extremely focused on making sure our platform is finding more and more assistable places I can work on. A little away from that, right now I'm focusing more on understanding the whole sustainability side of how CNCF projects are working out. I've been exploring the whole GreenOps aspect of it, and I'm trying to read a lot about it and pen down some of it. Apart from these, I am an active part of the Women in Cloud Native community, and we run webinars and meetups. So I'm meeting a lot of people and learning a lot of new things.

Bart Farrell: And if people want to follow your work or get in touch with you, what's the best way to do that?

Pronomita Dey: Well, I think LinkedIn is my go-to. LinkedIn is kind of linked everywhere, so you can message me there.
