Why AI Struggles With YAML, Helm, And Deployments

Jun 11, 2026

Guest:

  • Shivay Lamba

Are your Kubernetes changes crossing application code, manifests, Helm charts, and CI/CD at the same time? That is where code review gets harder, AI-generated changes get riskier, and teams realize too late that production safety was mostly assumed.

Shivay Lamba explains why infrastructure changes require a different review mindset, why AI code review still has clear blind spots, and why production-like validation is essential before trusting generated changes.

In this interview:

  • Why mixed pull requests create review gaps between application code, YAML, and deployment logic

  • What good governance looks like when AI can modify Kubernetes-related code and configuration

  • Where AI review already helps today, and where security, scale, and infrastructure context still break down

  • Why preview environments and stronger guardrails matter before production-bound changes go live

Transcription

Bart Farrell: First things first, who are you? What's your role and where do you work?

Shivay Lamba: I'm Shivay. I'm a CNCF ambassador and I'm also a developer advocate or a machine learning engineer at Qualcomm.

Bart Farrell: We want to dive into these questions to get your feedback to hear what's going on for you. Where does code quality break down first when changes span application code, Kubernetes configs, and CI pipelines in the same pull request?

Shivay Lamba: When you're looking at the entire end-to-end lifecycle for software, all the way from making changes to application code to, let's say, basic CI-level code, code quality really matters throughout that entire stack, whether it is the application code or the Kubernetes config that you might have. Now consider how things move. First, the application code gets built, and you are probably running some tests alongside that. Then you do a CI/CD pipeline run, where you first do the CI and then finally deploy, and Kubernetes usually comes in at that final layer. So my point is that if the application code itself has flaws, you won't even reach the Kubernetes stage if you have a good CI/CD pipeline, because the test cases might fail before you even start to deploy the application. So I feel it would start with the application code first. After that, what actually breaks depends on the type of issue and on how you have set up your CI/CD pipeline, in terms of code quality checks or the bugs you might come across during the testing phase.

Bart Farrell: Why do YAML and Helm changes consistently escape the same scrutiny as application code? And what does that cost teams in production?

Shivay Lamba: If you look at what application code is, let's say it's a Python file with specific functions, function names, and the implementation of the code. Generally, the way we design a test suite is with unit tests and integration tests, and a unit test really breaks down and scrutinizes what a particular function is supposed to do. If that function is not behaving, or the output is not matching the format it is supposed to have, it will likely just stop the CI pipeline when you upload your code: you push the code to GitHub and it automatically runs the CI pipeline. So it's a lot easier to detect application code flaws if we have a good test suite. Whereas when you look at YAML or Helm, they don't get the same level of scrutiny, or at least, from what I've seen as general practice, you don't have the incredibly thorough test suite that you might have with application code. So when you are running your code through a CI/CD pipeline, it's a lot easier for a wrong or bad YAML file or Helm chart to slip in, because normally our thought process is that the Helm or YAML might only be run the first time we set up our infra, and not each and every time. Ideally, if any changes are detected in the YAML config or in the Helm charts in our application, the pipeline should rebuild the entire config automatically. But a lot of the time, we are probably not doing that when running an end-to-end CI/CD pipeline for every single code change.
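
To make the idea of detecting YAML and Helm changes concrete, here is a minimal Python sketch of a CI step that filters a pull request's changed files down to the infrastructure-related ones, so validation can be triggered whenever they change. The directory names, file names, and function names are hypothetical conventions chosen for illustration, not something described in the interview.

```python
from pathlib import PurePosixPath

# Hypothetical conventions: adjust to how your repository is laid out.
INFRA_DIRS = {"charts", "k8s", "manifests", "deploy"}
HELM_FILES = {"Chart.yaml", "values.yaml"}


def is_infra_file(path: str) -> bool:
    """Return True if a changed path looks like Kubernetes or Helm config."""
    p = PurePosixPath(path)
    if p.name in HELM_FILES:
        return True
    # Any YAML or Helm template file under a known infra directory counts.
    in_infra_dir = any(part in INFRA_DIRS for part in p.parts[:-1])
    return in_infra_dir and p.suffix in {".yaml", ".yml", ".tpl"}


def infra_changes(changed_files: list[str]) -> list[str]:
    """Filter a pull request's changed files down to the infra-related ones."""
    return [f for f in changed_files if is_infra_file(f)]


changed = ["app/main.py", "charts/web/templates/deployment.yaml", "README.md"]
to_validate = infra_changes(changed)
if to_validate:
    print("Run manifest validation for:", to_validate)
```

In a real pipeline, the list of changed files would come from the version control system, and a non-empty result would trigger linting or rendering of the affected charts.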

Bart Farrell: As AI writes more of the code going into Kubernetes deployments, what does good governance of that code actually look like?

Shivay Lamba: I think of this in two different ways. One is governance around permissions: what are we actually giving to the AI, and do we have a good check on what changes were made by the AI? So the first thing you have to ask yourself is whether you want to give the AI the right to make changes to some of the most critical parts of your application delivery stack, which includes the YAML or Helm charts that are typically used for your infrastructure and deployment. If you have given it that, the next level is whether you have a good code quality check-in process, maybe with a self-healing loop, because as humans you might sometimes skip things. Can you use the telemetry data that comes in to self-evaluate? There are a lot of evals that you can run to ensure that the quality of the AI's output closely meets expectations. So do you have eval criteria that you can run if you're allowing AI to make changes to code related to K8s deployments? Then there's also the question of permissions: are you giving it access to the production environment, or just the staging environment, or just the dev environment? Those are the kinds of questions I feel are necessary when it comes to governance. So good governance would essentially mean that you have all of these permissions set, you have a good eval framework for the code changes that are happening, and at the same time you have a good alerting mechanism in case things go wrong.
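
The three pieces named above (environment permissions, an eval threshold, and alerting) can be sketched as a small policy check. This is an illustrative Python sketch only: the environment names, the 0.9 threshold, and every class and function name are assumptions, and the `alert` function is a stand-in for a real alerting integration.

```python
from dataclasses import dataclass

# Hypothetical policy: which environments an AI-authored change may target,
# and the minimum eval score an infra change must reach first.
ALLOWED_ENVIRONMENTS = {"dev", "staging"}
MIN_EVAL_SCORE = 0.9


@dataclass
class ProposedChange:
    author_is_ai: bool
    target_environment: str
    touches_infra: bool   # e.g. YAML manifests or Helm charts
    eval_score: float     # 0.0 to 1.0, from whatever eval suite you run


def review_decision(change: ProposedChange) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed change."""
    if not change.author_is_ai:
        return True, "human-authored: normal review applies"
    if change.target_environment not in ALLOWED_ENVIRONMENTS:
        return False, f"AI changes may not target {change.target_environment}"
    if change.touches_infra and change.eval_score < MIN_EVAL_SCORE:
        return False, "infra change scored below the eval threshold"
    return True, "within policy"


def alert(reason: str) -> None:
    # Stand-in for a real alerting integration.
    print(f"ALERT: {reason}")


change = ProposedChange(True, "production", True, 0.95)
allowed, reason = review_decision(change)
if not allowed:
    alert(reason)
```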

Bart Farrell: Where are engineering teams getting AI code review right today, and where are they still flying blind?

Shivay Lamba: Where AI code review is really working well today is that the AI is not just looking at the changed code files. It uses something very similar to how Claude Code, or OpenCode with the open source LLMs you end up using, looks at your entire code base. Think of it like a tree structure, where it starts at the root of your repository and then navigates into the different files, so it gets a really good understanding of what the code actually does. Then it also looks at how the specific file changes in the pull request that was just created relate to the actual implementation. Nowadays, a lot of the popular code review tools also let you configure the experience further: you can add business goals, best practices for code quality, and the way you actually write code at your firm. So you can add some of these documentation items. Rather than doing a very random or very high-level review, the code review is actually aware of the coding styles and the style guides for front end and back end. All of those can be given to the AI to review the code.
And I feel like that is going pretty well, where the AI code reviewer, whatever that tool might be, is gradually becoming a part of the actual team because it understands all these things. As for where they still fly blind, I have two main concerns. Number one is scale: as the overall size of the code increases, or if we are iteratively making a lot of changes such that it becomes big, how quickly can the code review tools keep up with the velocity at which coding or architectural changes are happening within the code base? That applies especially if you have a lot of interdependencies among different repositories, because you might have a simple monorepo or you might have a huge cluster of repositories that all depend on each other. Number two is that it's still not doing a very good job at security. If you want to run security scans on the code that has been changed, I still feel we are lacking there, because most code review tools are really reviewing code quality. We have separate solutions for scanning code for vulnerabilities, but there's not really a single solution that can do both at the same time. So the code reviewer might tell you, hey, this code does not look optimized, maybe this is a better way of implementing it. But what about the security concerns? Can a code review tool also run security scanning over the entire code base for vulnerabilities, for secrets, or for npm packages that might not make sense? Those are the things I feel are missing.
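
As a rough illustration of the kind of secret scanning that could run next to a quality review, the Python sketch below checks only the lines a change adds. The three patterns and the function name are illustrative assumptions; real scanners ship far larger rule sets, and nothing here reflects a specific tool.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]+['\"]"),
}


def scan_added_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, rule name) for each suspicious added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only lines added by the change; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings


diff = "\n".join([
    "+++ b/values.yaml",
    "+replicas: 3",
    '+password: "hunter2"',
    "-old: value",
    "+key: AKIAABCDEFGHIJKLMNOP",
])
for line_number, rule in scan_added_lines(diff):
    print(f"diff line {line_number}: possible {rule}")
```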

Bart Farrell: And what would it take you to trust an AI recommendation on a production-bound infrastructure change? What's the bar?

Shivay Lamba: What I just described was still a low-level code change, right? But if the code reviewer is actually making changes to production-bound infrastructure, say changes in the YAML or your Helm config, that drastically changes the way the deployment would work. What I would really like it to do is help me create a pseudo build. What I mean by that is that in some of the systems where you have code quality being assessed by AI, there might be a provision to deploy a dummy system, a separate environment altogether that gets created. It's very similar to how, if you're using Vercel for your CI/CD, it will create a preview environment. So what I'd like to see is my AI creating a preview environment, because sometimes there are very small nuances in the changes the AI makes, and you can't be sure until you have tested them yourself. There's one clear difference here. You can glance through a small tweak to the application logic and say, okay, this probably makes sense. But it's very hard to predict the effect of changes on the YAML side. You could glance through them and build some mental model of whether they might work or not, but there may be nuances that require deploying an actual environment. So that preview environment is something I would keep as a minimum bar, especially for very high-level changes or when a lot of massive changes are happening. Otherwise, I would want to manually test it before I push it to production: manually make sure, in a separate environment, that when I deploy that YAML or Helm chart, it actually runs my entire production stack.
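
One way a per-pull-request preview environment could be wired up is sketched below in Python. It only builds the Helm command lines (lint, render, then install into a throwaway namespace) and prints them; it does not run anything. The chart path, release name, namespace naming scheme, and function names are all hypothetical.

```python
import re


def preview_namespace(pr_number: int, prefix: str = "preview") -> str:
    """Build a Kubernetes-safe namespace name for a pull request."""
    name = f"{prefix}-pr-{pr_number}".lower()
    # Namespace names are DNS labels: lowercase alphanumerics and '-', max 63 chars.
    name = re.sub(r"[^a-z0-9-]", "-", name)
    return name[:63].strip("-")


def preview_commands(chart_dir: str, release: str, pr_number: int) -> list[list[str]]:
    """Commands to lint, render, and install a chart into a throwaway namespace."""
    ns = preview_namespace(pr_number)
    return [
        ["helm", "lint", chart_dir],
        ["helm", "template", release, chart_dir, "--namespace", ns],
        ["helm", "upgrade", "--install", release, chart_dir,
         "--namespace", ns, "--create-namespace", "--wait"],
    ]


for cmd in preview_commands("charts/web", "web", 128):
    print(" ".join(cmd))
```

Keeping the commands as data makes it easy to log exactly what would run before handing them to a CI runner, and the namespace can be deleted once the pull request closes.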

Bart Farrell: Kubernetes is over a decade old, but there's still plenty of stuff going on in the ecosystem. What does the next era look like for teams that are trying to maintain code quality at that scale?

Shivay Lamba: As I mentioned, the number one thing for me is having security not as an afterthought, but as something you always keep in mind while you're also trying to maintain code quality, because we are seeing day by day that a lot of slop is being generated by AI. If you're not careful enough, a lot of tech debt gets added to your team. So on one side, you are reducing the overall time it takes to ship features to production. But on the other side, if you're not doing it well, you're taking on a lot of tech debt because of the AI slop. That also comes with security risk, because there might be some unknown npm packages or code packages that you are not aware of, or that are not even approved in your company's system, and if you don't have strong rules against that, you could very well be at risk of getting attacked. The same goes if you don't have strong guardrails against pushing unoptimized YAML or Helm changes made by AI. So having stronger guardrails is extremely important, and I feel that is something teams will have to maintain.
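
A guardrail against unapproved packages can be as simple as comparing a `package.json` against an allowlist. The Python sketch below assumes a hypothetical allowlist and function name; in practice the approved set would come from a shared policy file rather than a constant.

```python
import json

# Hypothetical allowlist; in practice this would come from a shared policy file.
APPROVED_PACKAGES = {"react", "express", "lodash"}


def unapproved_dependencies(package_json_text: str) -> list[str]:
    """List dependencies in a package.json that are not on the allowlist."""
    manifest = json.loads(package_json_text)
    declared = set()
    for section in ("dependencies", "devDependencies"):
        declared.update(manifest.get(section, {}))
    return sorted(declared - APPROVED_PACKAGES)


example = '{"dependencies": {"express": "^4.19.0", "left-padz": "1.0.0"}}'
blocked = unapproved_dependencies(example)
if blocked:
    print("Blocked until approved:", blocked)
```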

Bart Farrell: Shivay, what are you focused on building or solving next?

Shivay Lamba: I'm working on the intersection of AI agents and MCP security. It ties to a lot of the points I mentioned about security: how do you ensure that only specific tools are called? Especially if you look at intent-based access to tool calls for agents, because by default, if you're giving agents access to everything, that's probably going to be a disaster. Instead, you look at who the requester is, and then at the permissions of the requester. Based on the auth of the user, you give the agent permission to use only the specific tools it is supposed to use for that particular task. And even when there are multiple tool calls happening within a single request, you look at the scope of the request at that particular step, and you're constantly re-evaluating what set of permissions you want that agent to have, while still maintaining good quality. So that's intent-based discovery of what tool calls need to be made. On the other hand, I'm also doing some research where, if an agent is trying to make a request, you run some heuristic tests on top of it to see whether the tool call the AI agent is suggesting is malicious. You'd have a data set of such tests that you could compare your tool call against. So at this time I'm working mainly on that intersection of MCP security, which I feel is also very relevant when it comes to code quality and code reviews.
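
The intent-based access idea can be sketched as a set intersection: an agent may call a tool only if the requester's role permits it and the current task's scope needs it, with the check repeated at every step. The Python below is an illustrative sketch; the roles, tasks, tool names, and function names are all hypothetical and do not come from any MCP specification.

```python
# Hypothetical tool names, roles, and task scopes, for illustration only.
ROLE_PERMISSIONS = {
    "developer": {"read_logs", "get_manifest", "open_pull_request"},
    "sre": {"read_logs", "get_manifest", "open_pull_request", "apply_manifest"},
}
TASK_SCOPES = {
    "debug_pod": {"read_logs", "get_manifest"},
    "ship_fix": {"get_manifest", "open_pull_request", "apply_manifest"},
}


def allowed_tools(role: str, task: str) -> set[str]:
    """Tools an agent may call: the requester's permissions intersected with the task's scope."""
    return ROLE_PERMISSIONS.get(role, set()) & TASK_SCOPES.get(task, set())


def authorize_calls(role: str, steps: list[tuple[str, str]]) -> list[bool]:
    """Re-evaluate permissions at every step of a multi-step request.

    Each step is (task, tool): the task gives the intent at that point.
    """
    return [tool in allowed_tools(role, task) for task, tool in steps]


steps = [("debug_pod", "read_logs"), ("ship_fix", "apply_manifest")]
print(authorize_calls("developer", steps))  # [True, False]
print(authorize_calls("sre", steps))        # [True, True]
```

Unknown roles or tasks fall back to an empty set, so the default is to deny rather than allow.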

Bart Farrell: And if people want to get in touch with you or follow your work, what's the best way to do that?

Shivay Lamba: I'm fairly active on both LinkedIn and Twitter. On LinkedIn, you can just search for Shivay Lamba; I have a fairly unique name. And on Twitter, I'm HowDeveloped.
