Optimizing Kubernetes: from resource allocation to autoscaling
In this interview, Nicholas Walker, Director of Product at StormForge, discusses:
How emerging tools like vCluster, NOS, and Prometheus 3.0 address key challenges in development workflows, GPU optimization, and monitoring efficiency.
The evolution from static resource allocation using Helm defaults to automated rightsizing for improved cluster utilization.
Modern approaches to cluster autoscaling with Karpenter and KEDA, highlighting how node consolidation, spot instances, and custom metrics can lead to 40% cost savings.
Transcription
Bart: Who are you, what's your role, and who do you work for?
Nicholas: I'm Nick Walker. I'm the Director of Product at StormForge, where we do workload rightsizing.
Bart: What are three Kubernetes emerging tools that you are keeping an eye on?
Nicholas: I'm keeping an eye on vCluster, NCCL-Over-Substrate (NOS), and Prometheus 3.0. I use vCluster myself for development, as it allows me to set up demos with multiple clusters and different workloads. This is much easier than waiting for separate clusters to spin up, as vCluster only takes a minute or two.
I'm also interested in NCCL-Over-Substrate (NOS), a tool that enables GPU fractionalization, or partitioning. This is particularly relevant for AI and ML workloads running on GPUs, which are often the most expensive workloads in a cluster. We're evaluating NOS to optimize those GPU workloads and determine the best approach to partitioning.
Lastly, I'm following Prometheus 3.0, which offers significant performance improvements. The 3.0 release is expected to reduce resource usage by about 50% and decrease the amount of data sent over the wire by the same amount. This will be beneficial for our platform and the broader community, as Prometheus is widely used for monitoring.
Bart: Let's turn to governance, requests, and limits. One of our podcast guests, Alexandre, noted that being conservative with requests can be painful in large clusters. How do you approach resource allocation in environments with many uncontrolled workloads?
Nicholas: There are two approaches to consider: a free, manual one and a more comprehensive, automated one. Before automation, I often see people setting their requests and limits as defaults in a Helm template. As a community, we can be more aggressive about bringing those default requests and limits down in the Helm template, so that when you deploy hundreds or thousands of these workloads in your cluster, you get better overall utilization. The workloads that exceed their requests will then need those requests raised, but that's still a lot of toil. I think we need to move towards an automated approach where a tool analyzes usage behind the scenes and provides recommended requests and limits. That's essentially what we do at StormForge.
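As a minimal sketch of that manual approach (the chart and every value below are illustrative, not taken from any specific project), a chart can ship deliberately small defaults that individual releases override only when they genuinely need more:

```yaml
# values.yaml -- illustrative defaults for a hypothetical chart.
# Small requests keep hundreds of replicas from reserving capacity
# they never use; the few workloads that need more override these.
resources:
  requests:
    cpu: 50m        # deliberately small; most replicas never need more
    memory: 128Mi
  limits:
    memory: 256Mi   # memory limit as a leak guard; CPU limit omitted to avoid throttling
```

A release whose workload genuinely exceeds these defaults can override them per environment, for example with `helm install ... --set resources.requests.cpu=500m`, which is exactly the per-workload toil an automated recommender removes.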
Bart: Let's move on to autoscaling, particularly Karpenter. Our podcast guest Gazal prefers Karpenter over Cluster Autoscaler for cluster autoscaling, highlighting its benefits and reliability. Gazal and his team use Karpenter to consolidate workloads and save around 40% of their cloud bill, likely by leveraging Spot instances and node consolidation. Karpenter was donated by AWS to the Kubernetes project. How do you think this will shape the future of autoscaling in Kubernetes?
Nicholas: The 40% number we're hearing for Karpenter is interesting; we see that as well. About half of those savings usually come from Karpenter's improved node consolidation. The other half comes from the increased Spot instance usage you can get after adopting Karpenter, since it makes running Spot instances in a cluster safer and more reliable. As for how the donation of the Karpenter project will affect it, we've already seen another cloud provider pick it up. I think this will make Karpenter the de facto standard for cluster autoscaling, and the fact that it makes autoscaling more native to the cloud will be a positive for the community going forward.
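To make those two levers concrete, here is a minimal sketch of a Karpenter v1 NodePool (the pool name and node class reference are assumptions, not from the conversation) that allows Spot capacity and enables consolidation:

```yaml
# Hypothetical Karpenter NodePool combining the two savings levers
# discussed above: Spot-friendly capacity and node consolidation.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose          # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:              # assumes an EC2NodeClass named "default" exists
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow Spot, fall back to on-demand
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # repack workloads onto fewer nodes
    consolidateAfter: 1m
```

With both capacity types allowed, Karpenter can pick cheap Spot capacity where it fits and replace interrupted nodes automatically, while the consolidation policy keeps bin-packing nodes as workloads shrink.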
Bart: Taking autoscaling one step further with the KEDA project: our podcast guest Jorge argued that Kubernetes-based scaling solutions like KEDA offer advantages over traditional monitoring-driven approaches built on Prometheus, especially regarding responsiveness. How do you autoscale workloads in Kubernetes? What metrics do you use? And are there any tips and tricks people should know?
Nicholas: At StormForge, we actually use KEDA in the background: we horizontally scale our worker pools based on a queue of recommendations coming into our platform. As for tips, I think people should look at custom metrics for the Horizontal Pod Autoscaler. A lot of people use CPU utilization because it works out of the box with the HPA, but I would argue it's a lagging indicator: it reflects load your application has already taken on. Moving to a custom metric such as queue depth gives you more of a leading indicator of how much work is coming into your application, which makes your scaling more reliable.
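As an illustration of that queue-driven pattern (the interview doesn't detail StormForge's actual configuration, so the RabbitMQ trigger and every name below are assumptions), a KEDA ScaledObject might look like this:

```yaml
# Hypothetical KEDA ScaledObject: scales a worker Deployment on queue
# depth (a leading indicator) instead of CPU (a lagging one).
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: recommendation-workers
spec:
  scaleTargetRef:
    name: recommendation-worker    # assumed Deployment name
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq               # assumed queue backend
      metadata:
        queueName: recommendations # assumed queue name
        mode: QueueLength
        value: "10"                # target roughly 10 messages per replica
      authenticationRef:
        name: rabbitmq-auth        # assumed TriggerAuthentication holding the connection host
```

KEDA manages the underlying HPA for you and scales the Deployment towards about one replica per ten queued messages, reacting as work arrives rather than after CPU has already climbed.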
Bart: Kubernetes turned 10 years old this year. What should we expect in the next 10 years to come?
Nicholas: I hope Kubernetes gets a little more boring over the next 10 years. We're releasing three times a year, and people have to take all of those upgrades. I hope the upgrades get easier and that we get more robustness and security, with fewer of the major changes that make upgrades difficult for people.
Bart: What's next for you?
Nicholas: Next, leading into KubeFM, I've been looking at Java workload optimization and talking to a lot of people about the issues they've had trying to rightsize Java workloads. That's a particular interest of mine going forward: understanding the nuances of optimizing a Java workload and seeing how we can help with that.
Bart: How can people get in touch with you?
Nicholas: You can find us at StormForge. If you want to find me, I'm Nick Walker, StormForge, on LinkedIn.