Kubernetes complexity, cost optimization, and emerging tools
Oct 17, 2025
In this interview, William Denniss, Group Product Manager for GKE Autopilot at Google Cloud, discusses:
Emerging Kubernetes features to watch: In-place pod upgrades for dynamic resource updates, pod-level resource requests to simplify right-sizing, and the growing ecosystem of LLM and inference gateway components for AI workloads
Cost optimization through intelligent autoscaling: Using Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA) to align resource allocation with actual usage
Treating clusters as "cattle, not pets": Managing multiple clusters through fleet automation rather than individual tuning, balancing the operational overhead of many clusters against the single point of failure risk of massive multi-tenant deployments
Transcription
Bart: So, first things first: who are you, what's your role, and where do you work?
William: Hi everyone, I'm William Denniss. My role is a product manager, and I work at Google on Google Kubernetes Engine.
Bart: Now, what are three Kubernetes emerging tools that you are keeping an eye on?
William: I'm really keeping an eye on three areas. I'm most excited about in-place pod upgrades, which is the ability to update the resources of a pod after it's been scheduled. This ties into things like VPA and adds more dynamism to the process. It's in beta now, and I'm excited to see it roll out.
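As an illustration, here is a minimal sketch of that upstream feature (referred to as in-place pod resize, beta as of Kubernetes 1.33). The pod name and image are placeholders; the `resizePolicy` field declares, per resource, whether a change requires a container restart.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.27
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired        # CPU can change without restarting the container
    - resourceName: memory
      restartPolicy: RestartContainer   # memory changes restart this container
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
```

On clusters with the feature enabled, the requests can then be changed on the running pod rather than by deleting and rescheduling it, for example with `kubectl patch pod resize-demo --subresource resize ...` on recent kubectl versions.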
I'm also interested in pod-level resource requests—the ability to specify memory and CPU at a whole pod level rather than individual containers. This takes a lot of the drudgery out of right-sizing these things. I hope it moves to general availability soon.
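A rough sketch of what that looks like, assuming the PodLevelResources feature gate is available on a recent cluster (names and images here are placeholders): the requests and limits sit on the pod spec and are shared by all containers, instead of being budgeted per container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo     # illustrative name
spec:
  resources:                   # pod-wide budget shared by the containers below
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
  - name: web
    image: nginx:1.27
  - name: helper
    image: busybox:1.36
    command: ["sh", "-c", "sleep infinity"]
```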
Finally, since we're in the age of AI, I'm keenly looking at LLM and inference gateway components and seeing how they'll be adopted.
Bart: So, to dig into some of the topics that have come up in our podcast episodes, our guest Mac believes the designers at Kubernetes didn't set out to build an overcomplicated piece of software, but rather it grew organically with hard-won knowledge baked into the codebase. How do you view the complexity versus capability trade-off in Kubernetes?
William: I really like this question because it comes up often. I see complaints and memes about Kubernetes being complex. The goal here is a professional tool that helps us get things done. It's clear the market has chosen Kubernetes, and there's a reason for its popularity: it delivers on people's needs.
If you simplify Kubernetes, you risk removing critical components people require. For example, you might suggest getting rid of persistent volumes and using an object store instead. But what about databases or legacy systems that need a disk?
The complexity exists to deliver value. People are using Kubernetes because it helps them accomplish their tasks. Instead of focusing on its overall complexity, we should consider how to make it easier to learn.
To get started, you only need to learn two objects: Deployment and Service. Just run a simple web service: create two config files and you're up and running. Don't try to learn everything upfront, like topology spread constraints. Start with a simple example, get a container running on the internet, and build your knowledge from there based on your specific requirements.
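To make that concrete, here is a minimal sketch of those two config files. The names, labels, and image are placeholders (the sample image is one commonly used in GKE tutorials); any container that serves HTTP will do.

```yaml
# deployment.yaml -- run three replicas of a containerized web app
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
---
# service.yaml -- expose those pods to the internet behind a load balancer
apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  type: LoadBalancer
  selector:
    app: hello-web
  ports:
  - port: 80
    targetPort: 8080
```

Apply both with `kubectl apply -f deployment.yaml -f service.yaml` and you have a replicated service on the internet; everything else can be layered on later.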
Approach Kubernetes not by criticizing its complexity, but by learning it piece by piece.
Bart: Another podcast guest, Mark, thinks that with Kubernetes, it's quite easy to pay more than necessary because you pay for allocated or provisioned infrastructure—machines you start that are often underused. What strategies do you use to optimize Kubernetes costs?
Key potential optimizations I've identified include:
Implementing Vertical Pod Autoscaling (VPA)
Leveraging Google Kubernetes Engine (GKE) Autopilot for more efficient resource management
William: For starters, I recommend tools like Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA). The scheduler allocates resources based on the requests in the pod—the memory, CPU, and GPU that the pod is requesting. You can do bursting and fine-tune resources, but if your requests are out of alignment with actual usage, you'll end up paying more than necessary.
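For reference, a minimal HPA sketch using the autoscaling/v2 API, scaling a hypothetical Deployment named hello-web on CPU utilization (the name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-web            # workload to scale (illustrative)
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add or remove replicas to hold roughly 70% of requested CPU
```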
One cool feature of VPA that everyone should use is its advisory mode. VPA can resize the resources of containers, but before in-place pod updates, it would restart the pod, with potentially undesirable side effects. In advisory mode, it simply writes its recommendation to the Kubernetes resource, indicating what the system thinks the container needs. Particularly on GKE, it's essentially free to get this data, so why not turn it on?
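Assuming this refers to the recommendation-only behavior of the open-source Vertical Pod Autoscaler (updateMode: "Off"), a minimal sketch looks like this; the workload name is illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hello-web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-web          # workload to observe (illustrative)
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict or resize pods
```

The recommendations then appear in the object's status (for example via `kubectl describe vpa hello-web`), which you can compare against your current requests before changing anything.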
Additionally, ensure you're using a platform with auto-scaling at the platform level. You want HPA and VPA right-sizing the pods, with the platform also managing resources dynamically. My team has been intensely focused on creating a platform where you only pay for the resources your pods actually need.
Bart: Our guest believes many people initially think they need their own cluster because they're scared of multi-tenancy, but they learn that the operational overhead of maintaining multiple clusters is significant. What's your experience with teams being scared of multi-tenancy versus dealing with operational overhead?
William: Clusters should be treated like cattle, not pets. This analogy means you don't want to get to know any individual clusters too well. You want to treat them as a group. The overhead becomes significant when you spend time individually tuning clusters.
I would have the same advice whether you're managing five or a thousand clusters: ideally, you're treating them all the same. If you can set everything up automatically and methodically—using fleet management at the platform level, setting up CI/CD systems to coordinate—most of that overhead goes away. This approach also puts you in a stronger operational stance.
My product can scale to 15,000 nodes and even 65,000 nodes in some configurations. The temptation might be to create a massive cluster, but this capability is mostly built for massive AI training jobs that need to coexist. Even if it can scale so large, it's not always right to pack everything into one cluster.
From my perspective, there's also a single point of failure risk. In a multi-tenancy environment, if you have 50 tenants in one cluster and the cluster has a problem, then 50 people are affected. There's a balance—I'm not necessarily advocating one cluster per tenant if you have many tenants—but don't be afraid of multiple clusters. If you're scared of multiple clusters, it might indicate you haven't set up fleet automation effectively.
Bart: And what's next for William?
William: I wrote a book on Kubernetes called "Kubernetes for Developers". We'll be giving away copies at KubeCon. You can meet me on the show floor:
Tuesday, November 11th at 3:45 PM
Wednesday, November 12th at 3:30 PM
We'll have physical copies and PDFs available if you don't want to carry a book in your backpack.
I work on the Autopilot product for GKE. We've been expanding the product recently and building what we call a container-optimized compute platform, which is a new way to run pods. I've been focused on improving this for our users.
Bart: If people want to get in touch with you, what's the best way to do that?
William: The best place to reach me is on Twitter or X. My username is just my name, William Denniss (with two Ns and two Ss). The link might be on the podcast as well. You can always DM me on Twitter or LinkedIn. I check Twitter more frequently.
Bart: Well, William, thanks so much for your time. I look forward to seeing you in Atlanta. Take care.
William: Thanks, Bart. Appreciate it.