Kubernetes emerging tools, resource optimization, and community building
In this interview, Rafael Brito, Principal Engineer at StormForge, discusses:
Three emerging Kubernetes technologies to watch: in-place pod resizing in Kubernetes 1.33, KEDA for event-driven autoscaling beyond traditional HPA, and Dynamic Resource Allocation (DRA) for managing specialized hardware like GPUs through new resource claim constructs
Strategies for reducing resource waste and optimizing costs: addressing the common problem of over-provisioned workloads through proper requests and limits tuning, using machine learning approaches to calculate workload needs, and bridging the gap between developers, platform engineers, and FinOps teams
Community building and knowledge democratization: co-authoring the second edition of "ACE the CKA" to help practitioners break into cloud-native technologies, serving as CNCF Technical Advisory Group Operational Resilience Co-Chair, and fostering Kubernetes adoption across Latin America through KCD events
Relevant links
Transcription
Bart: Rafa works for StormForge, a company in the Kubernetes and cloud-native ecosystem.
Rafael: Hi Bart, my name is Rafael Brito. I work for StormForge as a principal engineer. StormForge is a company using machine learning to optimize Kubernetes. My background is as an engineering manager for grid computing and Kubernetes at a big bank. When I left the bank, I had a mission to make Kubernetes easier and less expensive for everybody, from small users to enterprise users.
Bart: What are the three emerging Kubernetes tools that you're keeping an eye on?
Rafael: First is arguably the biggest feature of Kubernetes 1.33: in-place pod resizing, which allows you to patch a pod's resources without restarting it. I wrote a blog post comparing in-place pod patching to UDP and traditional workload patching to TCP: when you patch a pod in place and ask for more resources, you're not guaranteed to get them, whereas when you patch the workload the old-fashioned way, the pods restart and the resources are guaranteed.
In my blog post I argued that these methods can and must be used together, and that everyone needs a strategy. In short, I see pod patching as the reactive tool during production hours, when you need to increase CPU or memory immediately; after hours, you can patch your workloads in the traditional manner.
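As a rough sketch of that reactive path (all names here are hypothetical), a pod opts into in-place resizing through its resizePolicy, and the resize is then applied as a patch against the pod's resize subresource:

```yaml
# Illustrative pod spec: NotRequired tells the kubelet it may apply a
# resize for that resource without restarting the container.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 512Mi
```

A resize during production hours is then a patch along the lines of `kubectl patch pod demo-app --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'`; as Rafael's UDP analogy suggests, the request can sit pending if the node has no headroom.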
The second technology is KEDA. Many users rely on the Horizontal Pod Autoscaler (HPA) driven by CPU, but I prefer to use KEDA as much as possible. KEDA isn't new, but surprisingly, not many people are using it in production. At StormForge, we use it to scale our machine learning calculations based on RabbitMQ queue depth.
KEDA, which stands for Kubernetes Event-Driven Autoscaling, gives businesses the power to scale pods based on demand. Since 2020, KEDA has grown from roughly 20-25 scalers to more than 70, demonstrating its increasing ability to handle externally driven autoscaling. As an engineer or manager, you should look at KEDA before reaching for HPA on CPU.
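For context on what that looks like, here is a minimal sketch of a KEDA ScaledObject for the RabbitMQ case Rafael describes; the deployment, queue, and environment variable names are hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ml-worker-scaler
spec:
  scaleTargetRef:
    name: ml-worker              # hypothetical Deployment to scale
  minReplicaCount: 0             # KEDA can scale to zero, unlike a plain HPA
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: ml-jobs       # hypothetical queue
        mode: QueueLength        # scale on backlog depth
        value: "20"              # target of roughly 20 messages per replica
        hostFromEnv: RABBITMQ_URL  # AMQP connection string from the pod's env
```

Under the hood KEDA still drives an HPA, but the scaling signal comes from the event source rather than CPU.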
The third technology is Dynamic Resource Allocation (DRA), which I'm not claiming to be an expert on but am closely studying. In Kubernetes 1.34, set to be released in a few weeks, DRA will be beta and enabled by default. It gives third parties the ability to configure and schedule resources such as GPUs, which is a significant change in how we manage resources.
Bart: Very good. We can double click on that precisely. One of our podcast guests, John McBride, mentioned how he saw version 1.32 helping to dynamically allocate drivers and resources to nodes without provider-specific plugins. How do you see Dynamic Resource Allocation (DRA) changing the way we manage specialized hardware like GPUs in Kubernetes?
Rafael: You're going to have plugins that let the Kubelet observe and manage GPU allocation. There are new built-in objects, ResourceClaim and ResourceClaimTemplate, and you define in CEL (Common Expression Language) how resources that are not necessarily CPU and memory should be allocated to pods. This is brand new. I think NVIDIA has a GPU operator that takes advantage of these features.
We have new constructs like resource claim and resource claim template. I believe we will see more advanced ways to utilize these resources. For example, with GPUs, we can explore scaling based on GPU idle time. This is the number one concern for FinOps professionals managing large GPU fleets: how to extract maximum value when GPUs are idle.
I see these plugins working in a scheduled way, where you can configure and trigger batches when GPUs are idle, not necessarily using the Kubernetes scheduler. The background is about preventing computing resources from being idle, which ultimately leads to high costs. That's what I expect from Dynamic Resource Allocation (DRA). While it's too early to understand all the details, the NVIDIA GPU example shows they are already taking advantage of these new capabilities.
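To make those constructs concrete, here is a hedged sketch of a DRA request using the beta resource.k8s.io API; the device class, driver domain, and attribute are illustrative, loosely modeled on a GPU driver:

```yaml
# A template that generates one ResourceClaim per pod; the CEL selector
# filters on attributes the driver publishes (names here are made up).
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
        - name: gpu
          deviceClassName: gpu.example.com   # hypothetical DeviceClass
          selectors:
            - cel:
                expression: device.attributes["gpu.example.com"].model == "a100-80gb"
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: train
      image: registry.example.com/train:latest  # hypothetical image
      resources:
        claims:
          - name: gpu          # binds the container to the claim above
```

The scheduler resolves the claim against what the driver advertises, which is the piece that used to require provider-specific device plugins.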
Bart: One of our other guests, Zain, observed that there's a lot of waste happening on CPU and GPU, and there's a significant opportunity to optimize this. What approaches do you take to reduce resource waste in your clusters?
Rafael: That's another issue people lose sleep over. When I left the bank, I saw that moving to Kubernetes does not necessarily mean you're going to save real dollars. By "real dollars," I mean actual cloud cost reductions. You gain many other advantages, but real money on the table is hard to see when you first move to Kubernetes.
Why is this? Kubernetes manages resources through primitives like requests and limits, which developers or platform engineers often set arbitrarily, neglect, or use default values. I have another talk about the importance of these settings being well-tuned because they impact two dimensions: cost and performance.
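The primitives themselves are only a few lines in a container spec, which is partly why they get neglected; the numbers below are made up for illustration:

```yaml
# Requests drive scheduling and cost (what the node reserves for the pod);
# limits cap the container and drive throttling and OOM-kill behavior.
resources:
  requests:
    cpu: 250m          # ideally sized from observed usage, not a guess
    memory: 512Mi
  limits:
    memory: 512Mi      # memory limit equal to the request avoids surprises
    # many teams deliberately omit a CPU limit to avoid CFS throttling
```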
It's normal for people to over-provision. Nobody gets fired for requesting more CPU, but if you under-request CPU and cause an outage, you might at least get reprimanded. So one strategy is to always have a mechanism to calculate each workload's needs.
This is typically a machine learning problem because human beings cannot scale. I managed 500 applications with a team of 15 engineers. Often, my day involved addressing slow or costly applications. We'd dedicate an engineer to determine the workload profile and set requests and limits, but we didn't scale effectively—only the most critical applications received attention.
At our company, we provide a service with a machine learning algorithm that calculates these needs. If you don't want our solution, use something—anything—but don't leave this neglected. It will compound and eventually bite you, either through silent resource exhaustion or unexpectedly high application costs.
Use something—VPA, your own calculations, P99, P95—but use something. It's much better than nothing.
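As one concrete example of "use something", the open source Vertical Pod Autoscaler can at least produce recommendations; a minimal manifest, with a hypothetical target, looks roughly like this:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ml-worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-worker        # hypothetical workload
  updatePolicy:
    updateMode: "Off"      # "Off" = recommend only; "Auto" applies the changes
```

Even in recommend-only mode, the recommendation gives teams a data-driven baseline to compare against what was originally requested.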
Bart: One of our guests thinks that with Kubernetes it's quite easy to pay more than necessary, because you pay for allocated or provisioned infrastructure—machines you start that are often underused. What strategies do you use to optimize Kubernetes costs?
Rafael: So that's another concept I've been discussing at KCDs. I organized KCDs in Brazil and Texas, and I just came back from Lima, Peru. One of the concepts I want to make crystal clear is metrics in Kubernetes. You have the node capacity; then you have allocatable, which is essentially the capacity minus what the Kubelet needs; and then there's allocated, which is the specific amount of CPU or memory a pod reserves when it's scheduled. Actual utilization is a separate dimension again.
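The capacity-versus-allocatable gap Rafael describes is configured through kubelet reservations; in a KubeletConfiguration it looks roughly like this, with illustrative values:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# allocatable = capacity - kubeReserved - systemReserved - evictionHard
kubeReserved:
  cpu: 100m
  memory: 256Mi
systemReserved:
  cpu: 100m
  memory: 256Mi
evictionHard:
  memory.available: 100Mi
```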
These dimensions are important. When something is allocated, it's yours, and nobody can take it; it's enforced by the Linux kernel through cgroups and the CFS quota. The strategy is that you can add all the observability you want, but you must take action. You need to investigate why you're running over-provisioned. It's like a dog with multiple owners that starves to death because nobody takes responsibility. Who's going to set the proper requests and limits?
At StormForge, we sit in a triangle between the developer, platform engineering team, and FinOps. Sometimes we have to bring them together to develop a strategy to reduce costs. If you talk to developers, they depend on the platform engineering group. Platform engineers often manage a cluster with 100 users across 10 business units. They're like an HOA president—not typically in charge of each namespace or workload, as they can't risk breaking something and being held responsible.
Usually, these two entities don't pay the bill—that's FinOps' role. We position ourselves in the middle of this triangle to get everyone on the same page and roll out a cost-reduction strategy. As an engineering manager, what kept me up at night wasn't over-provisioned workloads, but under-provisioned ones that could suddenly fail due to lack of resources.
Every company transitioning to cloud-native must face these challenges as part of their day-two operations.
Bart: You've spent a lot of time focusing on this issue, traveling, and interacting with different folks in the ecosystem. When you're not doing that, you're also writing books. I understand that you recently wrote the second edition of "ACE the CKA". Can you tell me about the process that went into that and what you're trying to help people with in this book?
Rafael: I'm a firm believer that Kubernetes is still new, although we are 10 years into this journey. I have many friends and customers who use Kubernetes in a pragmatic way, not in the legacy approach.
This is the book—the first edition—and the main author is Chad Crowell, my good friend. Hello, Chad. We are just finishing the second edition because the CKA exam has changed significantly and is much more challenging, in my opinion.
My motivation to write this book with Chad is to democratize Kubernetes knowledge. I still think there is a big psychological and cultural barrier to moving to Kubernetes. Many people who are not cloud-native yet think it's a cult. It's not a cult—it's what's happening. VMs and databases will always have their purpose, but time goes by, and we need to abstract more and more of those primitives.
That's why I co-wrote this book with Chad—to reach people who are still struggling to break into cloud-native technologies. These are the people who will go on to convince their bosses that Kubernetes is the future. Does it make sense?
Bart: I know those conversations have to happen. If people don't have a clear understanding of the value proposition of Kubernetes and cloud native, it's not going to get very far in making organizational changes. As you said, there's fear, doubt, and resistance. Cultural elements play a very strong role. I congratulate you and Chad for the work you've done. I look forward to seeing the second edition.
You've recently been elected as the CNCF Technical Advisory Group Operational Resilience Co-Chair. First, congratulations. For those unfamiliar, can you explain a little more about the things you'll be working on?
Rafael: CNCF has the TOC, the Technical Oversight Committee, which defines the direction of the community and of CNCF projects. The technical advisory groups (TAGs) are an extension of the TOC. They were recently rebooted and have become quite granular. The reboot happened a couple of months ago, and we're still going through the process.
Basically, we're bringing in practitioners, and each TAG has eight members. We're going to develop initiatives, listen to the community, and see what's important. Everyone is free to create an initiative and join these meetings to share their thoughts.
I'm coming from a volunteering philosophy: please come, ask what you want, but give a little bit of your time to make that happen. The call to action for everyone listening is to bear with us during this TAG reboot process. We want to reach out to everyone. We have a KubeCon session in North America where we'll explain what's going on.
I want your help to create initiatives or attach yourself to an initiative to see how we can improve the community. The TAG I'm working on is called Operational Resilience, which has a broad umbrella including cost management (related to FinOps), performance, observability, business and operational continuity, and data recovery.
If you're coming to KubeCon, come and talk to us and join our meetings. During this reboot, we want to listen to what people need. When you come, don't just complain about your challenges—we humbly ask that you help us because you'll be important in this process.
Bart: Like I said, it's an open call for participation. If you're not aware, it's very easy to reach out and get involved. Ask questions, be informed. Is there anything else, apart from writing books and being the co-chair, that's on your agenda in terms of what's next for you that we should know?
Rafael: As a community builder, I want to give a shout-out to Latin America. We are building a lot of momentum there. I organized KCD Brazil in March, just came back from Lima, and I'm going to Colombia in two weeks. There is a lot of movement happening in Latin America, and I really want to see it keep growing. I'm based in Austin, Texas, originally from Brazil, and I run the Austin community group and KCD Texas. We just need more people who are trying to break the cloud-native bubble.
Bart: It's commendable work. I can proudly say that I've been involved in KCD in Latin America. I have the sticker from KCD Guatemala, which I attended earlier this year. I really enjoyed my experience. There's an amazing community of folks in different Latin American countries. Keep that in mind—these are people you can learn from and work with. Rafa, it was great talking to you. If people want to get in touch with you, what's the best way to do that?
Rafael: I'm very approachable. If you want to talk about anything related to CNCF, the book, or optimizing your Kubernetes clusters, just reach out to me on LinkedIn. Search for "Rafael Brito Kubernetes", and you'll find my profile. I am also active on the CNCF Slack under the username Brito-Rafa.
Bart: Perfect. Rafael, take care. Have a great day.
Rafael: Thank you, Bart.