AI SRE and Kubernetes Observability
Feb 26, 2026
Kubernetes has so many moving pieces that without observability, it's "on the verge of pointless" — so says Itiel Shwartz, Co-Founding CTO at Komodor.
In this interview, Itiel shares how AI SRE is reshaping observability, why his team spends 80% of development time on validation rather than building features, and why he sees Kubernetes becoming the control plane for all infrastructure.
In this interview:
Why AI SRE is the biggest trend in observability — and what "the max frontier" looks like
The fight between Cluster API and Crossplane for multi-cluster management
How Komodor validates AI features with LLM-as-a-judge, A-B testing, and benchmarking
AI in Kubernetes operations is inevitable — the question is who can actually deliver on the promise.
Transcription
Bart Farrell: All right, but first things first, who are you, what's your role, and where do you work?
Itiel Shwartz: I'm Itiel Shwartz, CTO at Komodor.
Bart Farrell: Fantastic. And Itiel, what are three emerging Kubernetes tools that you're keeping an eye on?
Itiel Shwartz: I think AI SRE, GPU and ML workloads, and stateful applications on top of Kubernetes.
Bart Farrell: Now, one of our podcast guests, Tanat, says the observability gain from Karpenter has been huge for their operations. He works at Adevinta. What observability improvements have been most valuable for your Kubernetes operations?
Itiel Shwartz: I think AI SRE is the biggest trend that we've seen in observability in general. Komodor is both an AI SRE company by itself, but we're also using other observability tools for APM. And I think in general, injecting AI into observability is the max frontier.
Bart Farrell: In addition to that, we interviewed about 50 Kubernetes engineers at the last KubeCon. And when we asked them, what would be the first tool that they would install on a blank Kubernetes cluster? People came back with either something GitOps related or something observability related. Why do you think that's the case?
Itiel Shwartz: I think that Kubernetes is so cumbersome and has so many moving pieces that without installing something that allows you to understand what is happening, it's on the verge of pointless. I am surprised that GitOps was up there together with observability. I would install Komodor and an observability tool as the first thing on any cluster.
Bart Farrell: Another podcast guest of ours, Oleksii, closely monitors Cluster API to manage multiple Kubernetes clusters from a central management cluster. How do you approach multi-cluster management? And do you think tools like Cluster API will become the standard? Or will we see alternative patterns emerge?
Itiel Shwartz: Yeah, great question. So first of all, Komodor just finished investing quite a lot in Cluster API management, giving our customers observability, detection, investigation, and remediation for Cluster API use cases. So I think it is emerging. I think there's a fight between Cluster API and Crossplane. I'm not sure who is going to win that, but those are the two most dominant ways to manage clusters. All in all, I think Kubernetes can and will be a great control plane for the rest of the infrastructure. I think it's better than GitOps because GitOps has no state. Kubernetes is very stateful. So I think the future is there. I'm not sure if Cluster API, Crossplane, or maybe someone else will take the lead on that. Pulumi also has something, and HashiCorp as well, so I'm not sure who will win that.
Bart Farrell: Another guest that we had, named Mai, stated that AI for Kubernetes operations is amazing, but not ready to be deployed in production without guardrails. What guardrails do you think are necessary for AI tools?
Itiel Shwartz: A lot of evaluation and validation. So in Komodor, I just had a talk today with a couple of leaders in our space, the current space, and I told them that in Komodor, for every new AI feature or capability that we develop, we invest 10% of the time in developing the feature, 80% of the time in validation and evaluation. So it's using LLM as a judge, A-B testing, benchmarking. There's a whole suite of how do I make sure my AI works as expected. And I think today, in modern complex applications, and every AI on top of Core IT is complicated, you have to invest most of your time in validation, have that internal confidence, and then give your users the same confidence too.
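As a rough illustration of the kind of validation suite described above, the following Python sketch benchmarks two variants of an AI feature against reference answers using a pluggable judge, which is the basic shape of an A-B comparison. Everything in it is an assumption made for illustration and is not Komodor's implementation: the `Case` type, the `keyword_judge` stub (a stand-in for a real LLM-as-a-judge call), the two toy variants, and the 0.5 pass threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    incident: str        # input handed to the AI feature under test
    expected_cause: str  # reference answer the judge compares against


# A judge scores one answer against the reference, returning 0.0 to 1.0.
# In a real pipeline this would be an LLM call; here it is a simple stub.
Judge = Callable[[str, str], float]


def keyword_judge(answer: str, reference: str) -> float:
    """Stand-in judge: fraction of reference words present in the answer."""
    ref_words = set(reference.lower().split())
    ans_words = set(answer.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & ans_words) / len(ref_words)


def pass_rate(variant: Callable[[str], str], cases: List[Case],
              judge: Judge, threshold: float = 0.5) -> float:
    """Run every benchmark case through a variant and return the pass share."""
    if not cases:
        return 0.0
    passed = sum(
        1 for c in cases
        if judge(variant(c.incident), c.expected_cause) >= threshold
    )
    return passed / len(cases)


# Two hypothetical versions of the same feature, compared A-B style.
def variant_a(incident: str) -> str:
    if "OOMKilled" in incident:
        return "pod was oom killed"
    return "unknown cause"


def variant_b(incident: str) -> str:
    if "OOMKilled" in incident:
        return "pod was oom killed"
    if "ImagePullBackOff" in incident:
        return "image pull failed"
    return "unknown cause"


CASES = [
    Case("Pod restarted, last state OOMKilled", "oom killed"),
    Case("Pod stuck in ImagePullBackOff", "image pull failed"),
]

if __name__ == "__main__":
    for name, variant in [("A", variant_a), ("B", variant_b)]:
        print(name, pass_rate(variant, CASES, keyword_judge))
```

Keeping the judge behind a plain function signature means the stub could be swapped for a model-backed judge without changing the benchmark loop, and the same fixed case set can be rerun whenever the feature changes.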
Bart Farrell: And looking towards next year, or wrapping up this year, getting close to 2026, what do you expect to be happening in the coming 12 months in terms of things we'll be seeing on Kubernetes? We expect more AI, we expect less AI. What do you think we can expect next?
Itiel Shwartz: More AI. Like, everyone is just back from AWS re:Invent, and AI is becoming a thing. So I think all of the different companies are going to start playing with AI, using AI. And I think there's no way around it. I think that's going to be the most dominant trend. I think there are a lot of snake oil sellers in this space, a lot of people who claim to do a lot of things but don't really deliver. But all in all, I think the industry is going to move in that direction. I think it's true for the cloud providers, like Azure and Google, and AWS already offers some AI SRE. It's true for the legacy players, like Datadog or Dynatrace, and also for the upcoming ones, like Komodor. So I think that's the most interesting trend.
Bart Farrell: And if people want to get in touch with you, what's the best way to do that?
Itiel Shwartz: LinkedIn, I think, is the easiest way, and also Twitter, but I'm much more of a LinkedIn person than a Twitter person. So LinkedIn, Twitter, or you can shoot me an email directly.



