Resource optimization and automation in Kubernetes

Guest:

  • Eli Birger

This interview explores how organizations can optimize Kubernetes resources while maintaining service reliability.

Eli Birger, CTO and Co-founder of PerfectScale, discusses:

  • Why 30-60% of Kubernetes resources are wasted while 15-20% of workloads experience runtime issues

  • How to achieve the right balance between cost optimization and maintaining SLAs in large clusters

  • The future of Kubernetes optimization, particularly in GPU sharing and achieving serverless-like efficiency

Transcription

Bart: Who are you, what's your role, and where do you work?

Eli: My name is Eli Birger. I'm CTO and co-founder of PerfectScale. Prior to that, I've spent many years managing platform, DevOps, and infra teams, and I've built large-scale systems. I'm very passionate about performance and resiliency, and efficiency in general. This is why we built PerfectScale.

Bart: What are three emerging Kubernetes tools that you are keeping an eye on?

Eli: The first one is Dapr. The second is Argo CD, which is amazing, and the third is Prometheus, which helps everyone.

Bart: On the subject of over-provisioning, one of our podcast guests, Alexandre, wrote an article about it. What strategies have you found effective for controlling over-provisioning in large Kubernetes clusters, considering factors such as the resource requests and limits needed to meet your SLA (Service Level Agreement), and tools like Karpenter for node provisioning?

Eli: Everything starts with observability. However, you must have dedicated tools for that; otherwise, you have an ocean of metrics coming from ephemeral containers running on ephemeral machines, and matching everything together is significant work. What we are doing at PerfectScale is helping you get the answers. You have all the data, but what you're really looking for is the answer to what you need to do now.
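As a rough illustration of the "ocean of metrics" Eli describes, a PromQL query along these lines (assuming the standard cAdvisor and kube-state-metrics metric names are available in your Prometheus) compares actual memory usage against requests per container, to surface over- and under-provisioned workloads:

```promql
# Ratio of actual working-set memory (cAdvisor) to requested memory
# (kube-state-metrics), per container. Values far below 1 suggest waste;
# values near or above 1 suggest OOM/eviction risk.
sum by (namespace, pod, container) (
  container_memory_working_set_bytes{container!=""}
)
/
sum by (namespace, pod, container) (
  kube_pod_container_resource_requests{resource="memory"}
)
```

This is only a sketch of the raw data problem: turning such ratios into safe, per-workload recommendations is exactly the "answers" layer Eli says dedicated tooling has to provide.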

Bart: On governance, requests, and limits: Alexandre stated that being conservative with resource requests and limits can be painful in large clusters. How do you approach resource allocation in environments with many uncontrolled workloads?

Eli: Our automation moves uncontrolled workloads into controlled workloads. Having the resource requests and limits set correctly is crucial. What we see across the board is that around 30 to 60% of resources are completely wasted, while at the same time, in the same clusters, 15-20% of workloads experience runtime issues like out-of-memory kills, evictions, and CPU throttling, often leading to degraded performance or a broken SLA.
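To make the trade-off concrete, here is a minimal Deployment sketch showing where requests and limits live. The workload name, image, and numbers are placeholders, not recommendations; getting these values right per workload is the problem Eli describes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api        # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
      - name: api
        image: example/api:1.0   # placeholder image
        resources:
          requests:
            cpu: "250m"          # what the scheduler reserves; too high = waste
            memory: "256Mi"      # too low = eviction risk under node pressure
          limits:
            cpu: "500m"          # exceeding this causes CPU throttling
            memory: "512Mi"      # exceeding this triggers an OOM kill
```

Oversized requests drive the 30-60% waste; undersized requests and limits drive the OOM, eviction, and throttling issues in the same clusters.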

Bart: Alexandre expressed that having an automated mechanism is better than enforcing processes. What automation tools or approaches do you recommend for managing Kubernetes resources, such as requests and limits, while maintaining a good SLA? Do tools like Argo CD, Karpenter, or Dapr come into play?

Eli: I recommend our tool, PerfectScale, because it takes a lot of different parameters into account. Cost savings is not the only consideration; ensuring your cluster runs smoothly and meets your SLA is more important. That means making sure workloads have enough resources before reducing unneeded ones. Reducing resources is also crucial because it saves money and decreases CO2 emissions. Ultimately, reducing emissions matters because we all share the same planet.
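For readers who want an open-source starting point for the kind of automated rightsizing discussed here, the Kubernetes Vertical Pod Autoscaler adjusts requests based on observed usage. This is an illustrative sketch, not PerfectScale's mechanism; the names and bounds are hypothetical:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-api-vpa     # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api       # hypothetical target workload
  updatePolicy:
    updateMode: "Auto"      # VPA evicts pods to apply updated requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:           # guardrails so automation never starves a pod
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:           # ...or lets one grow without bound
        cpu: "2"
        memory: "2Gi"
```

The min/max guardrails reflect Eli's ordering of priorities: ensure enough resources first, then trim the excess.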

Bart: Kubernetes turned 10 years old this year. What can we expect in the next 10 years?

Eli: In the next 10 years, the most important development will be the ability to use GPUs fractionally. Currently, GPUs are largely un-virtualized and allocated as whole devices, which does not make sense. The second key area is better infrastructure efficiency. Today, auto-scaling is problematic because it is very slow; when reacting quickly to sharp spikes, Kubernetes does not provide the responsiveness expected from serverless platforms like Lambda. What I would expect is Lambda-like efficiency while still maintaining a good SLA.
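Fractional GPU use already exists in early forms. For example, with NVIDIA Multi-Instance GPU (MIG) and the NVIDIA device plugin configured to expose MIG slices, a pod can request a slice instead of a whole device. The pod name and image below are placeholders, and the exact resource name depends on your device-plugin configuration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker        # hypothetical pod
spec:
  containers:
  - name: model
    image: example/inference:1.0   # placeholder image
    resources:
      limits:
        # A 1g.5gb MIG slice, as exposed by the NVIDIA device plugin,
        # instead of claiming a whole GPU via nvidia.com/gpu: 1
        nvidia.com/mig-1g.5gb: 1
```

Eli's point is that this kind of sharing is still the exception rather than the default scheduling model.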

Bart: What's next for you?

Eli: We are working on GPUs, which is a big area where we see potential for improvement across the board. More and more companies are using GPUs, and this is a huge area for waste reduction and performance improvement. We are also moving towards improving Karpenter. Karpenter is great and has shifted the entire optimization world to open source with great tools, but it is still reactive. We would like to add more logic to it, so we will definitely improve this area.

Bart: How can people get in touch with you?

Eli: Through PerfectScale, and I'm available on LinkedIn. We'll be happy to chat about optimization, efficiency, and improvements.
