Kubernetes auto-scaling strategies and emerging day-two operations trends
Dec 1, 2025
Kubernetes scaling and day-two operations continue to evolve as organizations seek to balance cost efficiency with reliability in production environments.
In this interview, Aviv Shukron, VP Product at Komodor, discusses:
Auto-scaling solution comparison and efficiency - Why Karpenter often outperforms traditional cluster auto-scalers and commercial solutions
Production scaling risks and observability requirements - How reliability and cost are two sides of the same coin, and why many organizations fail at auto-scaling
Kubernetes evolution toward simplified day-two operations - The shift from provisioning-focused tools to addressing operational complexity, with expectations for consolidating the overwhelming ecosystem of tools into more streamlined solutions
Transcription
Bart: First things first, who are you, what's your role, and where do you work?
Aviv: I'm Aviv Shukron, VP of Product at Komodor.
Bart: What are three emerging Kubernetes tools that you are keeping an eye on?
Aviv: I think the three emerging trends are probably AI SRE, which now seems to be rising—very exciting stuff with a lot of new companies emerging in that space. The other thing is checkpointing and restoring of state, which is ramping up. That kind of leads to cost optimization, which is probably the first use case for many companies.
Bart: One of our guests, Niels, believes that Karpenter is way faster in scaling up and provides more flexibility in deciding node resources. What has been your experience with different auto-scaling solutions?
Aviv: There are multiple different auto-scalers: the traditional open-source ones like Cluster Autoscaler and Karpenter, and vendor-specific solutions like Spot.io. At the end of the day, it all comes down to efficiency. We've seen Karpenter being as efficient as commercial solutions, with a very dynamic approach to scaling. Other tools also enhance proactive decision-making. I would say Karpenter is probably my choice.
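To make this concrete, here is a minimal sketch of how Karpenter's dynamic node selection is typically expressed. This assumes Karpenter v1 on AWS; the resource names and the `EC2NodeClass` reference are illustrative, not taken from the interview:

```yaml
# Illustrative Karpenter NodePool: lets Karpenter pick instance types
# dynamically and prefer spot capacity, consolidating underused nodes.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default          # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default    # assumes a matching EC2NodeClass exists
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```

Because the pool does not pin instance types, Karpenter can choose the cheapest node that fits pending pods, which is the flexibility Niels and Aviv describe.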
Bart: Another guest, Thibault, warned that auto-scaling needs to be finely tuned. If it starts to fail, you'll probably experience significant issues. How do you approach the risks of auto-scaling in production?
Aviv: What people don't understand is that reliability and cost are two sides of the same coin, and you have to balance the two. We've seen many instances of people implementing auto-scaling without understanding the reliability side of things. Often, scaling operations can bring down applications entirely, and they lack visibility into these situations. It's very important to have observability to ensure you can achieve the best of both worlds.
Bart: Another guest, Jorge, argued that Kubernetes-based scaling solutions like KEDA offer advantages over traditional monitoring tools like Prometheus, especially regarding responsiveness. How do you auto-scale workloads in Kubernetes? What metrics do you use? Are there any tips and tricks that people should know?
Aviv: I think when considering different metrics, it's more about creating a holistic scaling approach. How do you use Vertical Pod Autoscaler (VPA) and Horizontal Pod Autoscaler (HPA) together? How do you ensure you're handling spikes correctly without waste? It's about finding a more comprehensive strategy for your workloads. Obviously, you can choose different metrics like latency and CPU and memory, but that's not enough. It's more about choosing the right policy and strategy for your specific workloads.
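As one example of the policy choices Aviv mentions, a standard HPA manifest already encodes both the metric and the scaling behavior. This is a generic sketch, not taken from the interview; the deployment name and thresholds are placeholders:

```yaml
# Illustrative HPA: scales a Deployment on CPU utilization, with a
# scale-down stabilization window to avoid flapping on short spikes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical target workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling in
```

Note that HPA and VPA should not both manage CPU/memory on the same workload; the usual pattern is HPA on a primary metric with VPA in recommendation mode, which is the kind of holistic strategy Aviv alludes to.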
Bart: Kubernetes turned 10 years old last year. What should we expect in the next 10 years?
Aviv: I think the first few years were about provisioning: how can we make provisioning easier, how can we spin up more environments? I think now the main problem is day two operations. I would expect more simplicity around that space, more specifically because there are so many ecosystem tools. How can we consolidate that? How can we manage that as an organization and actually get the benefits of Kubernetes and not just the toil?
Bart: What's next for you?
Aviv: We are moving towards more streamlined operations, which very much align with what we discussed. It's going to be more about automated root cause analysis, automated cost optimization, and essentially streamlining operations in the process.
Bart: How can people get in touch with you?
Aviv: Check out the Komodor website and connect with me on LinkedIn. I'm always available.