Komodor announces a new approach to troubleshooting Kubernetes add-ons

Guest:

Itiel Shwartz

Komodor has announced comprehensive support for Kubernetes add-ons, extending its troubleshooting platform to monitor critical components like CoreDNS, Autoscaler, and Cert Manager.

Platform engineers should note that this solution addresses a significant pain point: the complex task of monitoring and troubleshooting multiple add-ons essential for production-grade Kubernetes infrastructure.

The platform's unique approach consolidates all add-on-related data into one coherent view, transforming what used to be a time-consuming process involving 10-20 different tools into a streamlined experience.

Transcription

Bart: Who are you? What's your role? And where do you work?

Itiel: I'm Itiel Shwartz, CTO and co-founder of Komodor.

Bart: What do you want to share with us today?

Itiel: We're here at KubeCon, and the vibe is quite good. Unlike yesterday, when it was snowing in Utah, the weather is nice now. We just announced support for Kubernetes add-ons in Komodor. For platform engineers, it's clear that Kubernetes doesn't end with nodes and pods. To have a production-scale Kubernetes infrastructure, you need to install a bunch of core services or add-ons. CoreDNS, Autoscaler, Karpenter, External DNS, Cert Manager, Istio, Cilium: all of these can and will impact the rest of your system.

In Komodor, we extended our base platform, which allows you to troubleshoot and diagnose issues in Kubernetes, to support these add-ons. This means we can help you understand when one of these add-ons is misbehaving, detect issues before they happen, and use our rule-based engineering capabilities to help solve the problems once they occur. I think this is one of the biggest announcements Komodor has made in recent years, and it's just the start of the journey.

Currently, we support 10 add-ons, but as everyone knows, more tools are being developed in Kubernetes. We expect this number to increase to around 40 by the end of 2025. If you have a specific problem with one of your add-ons, I'd be happy if you try Komodor to see if we have a solution or to ask for add-ons that we don't support yet.

Bart: For these particular add-ons, what pain points, what problems are you tackling with them?

Itiel: It's a combination of two things. The first is detection. For example, if your certificate is failing to renew, that's not something a traditional APM will catch. You need something different that will monitor the CRDs, understand that they are misbehaving, and help you fix them. Detection is a huge part of it. The second part is investigation. Everything is tied together: Cert Manager is impacted by External DNS, which is impacted by the NGINX ingress. Jumping between those different sources to find the root cause is super tricky. We bring all of those data points into one coherent view.
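Editor's note: the certificate example can be made concrete with a small sketch. It is not Komodor's implementation. It assumes cert-manager `Certificate` resources have already been loaded as plain dictionaries (for instance from the `items` array of `kubectl get certificates -A -o json`) and relies only on the `Ready` condition and the `renewalTime` field in each resource's `status`. The function name `failing_certificates` is hypothetical.

```python
from datetime import datetime, timezone


def failing_certificates(certificates, now=None):
    """Return (name, reason) pairs for Certificate resources that look unhealthy.

    `certificates` is a list of dicts shaped like cert-manager Certificate
    objects. This is an illustrative check, not a complete health model.
    """
    now = now or datetime.now(timezone.utc)
    problems = []
    for cert in certificates:
        name = cert["metadata"]["name"]
        status = cert.get("status", {})

        # Find the Ready condition, if the controller has reported one.
        ready = next(
            (c for c in status.get("conditions", []) if c.get("type") == "Ready"),
            None,
        )
        if ready is None:
            problems.append((name, "no Ready condition"))
            continue
        if ready.get("status") != "True":
            problems.append((name, ready.get("message", "not ready")))
            continue

        # A Ready certificate whose scheduled renewal time has passed
        # may indicate that renewal is stuck.
        renewal = status.get("renewalTime")
        if renewal:
            renewal_at = datetime.fromisoformat(renewal.replace("Z", "+00:00"))
            if renewal_at < now:
                problems.append((name, "renewal overdue"))
    return problems
```

A check like this looks at the custom resource rather than the controller pod, which is the distinction drawn above: the pod can be healthy while the resources it manages are not.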

Bart: Can you tell me a little bit about the before and after of this announcement?

Itiel: Before the announcement, when people wanted to understand what was happening with one of their add-ons, they went to the pod and tried to figure out why the Kubernetes Autoscaler was getting out of control. This was possible, but it was hard and time-consuming. They were only looking at the specific pod, the compute layer, not the CRDs it was responsible for. Our users had to go to 10 or 20 different places to gather all the relevant information and solve the problem. Now, they can go to the page called Cert Manager or Autoscaler, where they see all the different CRDs, data, and operations in one place, and can solve problems without a lot of context switching or prior experience.

Bart: Are these add-ons open source and part of the CNCF landscape?

Itiel: I think most of them, around 95%, are open source. I'm not sure if most of them are part of the CNCF landscape, but almost all of them are open source. In the end, because of Helm, it's easy to install another add-on. The community is really active, with hundreds of different open source tools available.

Bart: Tell me more about Komodor's business model.

Itiel: What we do is help companies overcome the challenges of Kubernetes at scale. This means we help both platform engineers and infrastructure engineers, who are responsible for managing all those clusters. On the other side, we empower developers by giving them the tools they need to solve issues on their own. This is quite unique and something that is really missing in the Kubernetes ecosystem.

Bart: Who are your main competitors?

Itiel: There is not a lot of direct competition. I will say that APM vendors claim to have some of the capabilities, but everyone who uses APM knows they are not really in the Kubernetes game. Even Datadog and New Relic had announcements similar to Komodor's three years ago. It is becoming more trendy to try and help troubleshoot Kubernetes, and it's mostly companies trying to build it themselves, and a lot of the time failing.

Bart: And what makes Komodor different? I know you touched on some of those points, but in terms of your approach, what makes you different from some of the competitors that are on the market?

Itiel: We only do Kubernetes. It's essential to note that if you have a problem with ECS, serverless, or bare metal, we don't provide any value. We're focused on Kubernetes, simplifying it, and troubleshooting for developers, as well as helping operations teams. We know that operations teams are outnumbered 10 to 1, and we want to help those teams be much more efficient.

Bart: What can we expect next from Komodor?

Itiel: We have a lot of things in the pipeline. The biggest area of focus will be around fleet management. We see IoT or companies managing hundreds of different clusters as something that is gaining popularity. As always, we're trying our best to help mega enterprises manage and overcome those scaling problems in private environments.