Making Kubernetes more accessible: from multi-cluster to platform engineering


Guest:

  • Nicholas Eberts

In this interview, Nicholas Eberts, Product Manager at Google's GKE team, discusses:

  • How platform engineering isn't a new concept but rather a continuation of established practices — highlighting that even simple abstractions like Helm values files are forms of platform engineering that make Kubernetes more accessible.

  • The vision for truly interchangeable infrastructure through tools like multi-cluster gateways and cross-cluster service discovery, moving away from treating clusters as precious resources.

  • Why scaling to zero is crucial for modern workloads and how improving HPA capabilities could match the functionality of tools like KEDA without requiring additional components.

Transcription

Bart: Who are you? What's your role and where do you work?

Nicholas: Hi, I'm Nick Eberts. I'm a PM at Google on the GKE team and I work on fleets and teams.

Bart: What are three Kubernetes emerging tools that you're keeping an eye on?

Nicholas: I'm definitely interested in the platform engineering space, specifically in making the developer UX for consuming Kubernetes easier. I'm also interested in normalizing fleets for all of our customers, and in contributing to the ClusterProfile initiative in the upstream SIG Multicluster. Additionally, I think getting the HPA to scale to zero is super important, especially in the AI/ML inference space.
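
For context, the ClusterProfile initiative Nicholas mentions is a SIG Multicluster effort (KEP-4322) to define a standard, per-cluster inventory object that multi-cluster tooling can share. A minimal sketch of what such an object looks like, based on the draft v1alpha1 API (field names may still change, and the cluster and manager names here are illustrative):

```yaml
# A ClusterProfile is one entry in a fleet inventory: a standard record
# describing a member cluster that any multi-cluster controller can read.
# Based on the draft v1alpha1 API from SIG Multicluster (KEP-4322);
# names are illustrative and fields may change as the API matures.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ClusterProfile
metadata:
  name: gke-us-east1-prod-1
  namespace: fleet-inventory
spec:
  displayName: gke-us-east1-prod-1
  clusterManager:
    name: example-fleet-manager   # the platform that registered this cluster
```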

Bart: Can I ask a bonus question? You tweeted not that long ago that what we now call platform engineering is something we've been doing for a long time. Would you care to elaborate on that?

Nicholas: Platform engineering is a fancy term for something I've been involved with since Kubernetes was born, and I'm sure the concept was established even earlier than that. It's just the idea that Kubernetes itself might not be the right abstraction for every user. If you're shipping a Helm values file to an end user, you're sort of giving them a platform. We've been doing this for a really long time.
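
To make the Helm example concrete: a platform team can expose a small values file as the entire developer-facing contract, while keeping the Deployment, Service, and ingress templates behind it. A minimal, hypothetical values.yaml of that kind (all names are illustrative, not from any particular chart):

```yaml
# values.yaml: the whole "platform surface" a developer sees.
# The Deployment, Service, HPA, and ingress templates that consume
# these values live in the chart and are owned by the platform team.
# All names and values here are illustrative.
app:
  name: checkout
  image: registry.example.com/checkout:1.4.2
  replicas: 3
  port: 8080
resources:
  cpu: 500m
  memory: 512Mi
ingress:
  enabled: true
  host: checkout.example.com
```

The developer edits a dozen lines; the platform team owns the hundreds of templated lines behind them. That asymmetry is the platform.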

Bart: Now a few questions responding to earlier podcast episodes. On the subject of multi-tenancy, our guest Artem shared that, between a single cluster with multiple environments and a dedicated cluster per environment, the latter is easier to manage for a small team. When you run Kubernetes for multiple teams, should you share a single cluster or offer a dedicated cluster per tenant? What does your choice depend on?

Nicholas: I don't want people to have to make that choice up front. At Google, we built a thing called Teams, which is tenancy as a service. The idea is that we abstract that decision so it can be deferred. If, today, a shared cluster or a cluster per tenant is the best solution for you, that's good. However, if that changes over time, it should be a simple change to the binding of a team. I'm not going to say which one is right, because that really depends on your use case. I don't think there's a blanket answer, even within one organization. Some clusters will be single-tenant, and then you'll have multi-tenant clusters on the side running smaller, less business-critical applications.
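
Under the hood, a per-team binding like the one Nicholas describes typically materializes into familiar primitives on whichever clusters the team is bound to. A sketch using only upstream objects (this is not the GKE Teams API itself; the team and group names are illustrative):

```yaml
# What a "team binding" might materialize into on one cluster:
# a namespace, a quota, and an RBAC grant. GKE's Teams / fleet scopes
# manage objects like these across clusters; this sketch uses plain
# upstream primitives with illustrative names.
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
  labels:
    tenant: payments
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-payments-admins
  namespace: team-payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: payments-devs@example.com
```

Moving a tenant between a shared cluster and a dedicated one then amounts to applying the same bundle to a different set of clusters, which is the flexibility the abstraction is meant to buy.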

Bart: Related to the topic of auto-scaling and KEDA specifically, our guest Jorge argued that Kubernetes-based scaling solutions like KEDA offer advantages over traditional monitoring tools like Prometheus, especially regarding responsiveness. How do you auto-scale workloads in Kubernetes? What metrics do you use, and are there any tips and tricks that people should know?

Nicholas: That's an interesting question. Auto-scaling definitely has latency when you use an external metrics adapter; with an in-cluster custom metrics adapter, you can hit a local Prometheus endpoint. However, I think the more important question is: how can I actually scale to zero? What KEDA gives me is the ability to scale to zero, which I currently cannot do with the HPA. While the HPA can technically reach zero, a proxy is needed to get it from zero back to one. I'm more interested in focusing on normalizing scale-to-zero in the HPA. I think that's where we need to take the upstream project.
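
For reference, this is roughly what scale-to-zero looks like with KEDA today: a ScaledObject with minReplicaCount: 0 manages the underlying HPA and handles the zero-to-one activation itself. A minimal sketch (the Deployment name, Prometheus address, and query are illustrative):

```yaml
# KEDA ScaledObject: scales the target Deployment between 0 and 10
# replicas based on a Prometheus metric. KEDA handles the 0 -> 1
# activation that the stock HPA cannot do on its own (upstream has
# only an alpha HPAScaleToZero feature gate). Names are illustrative.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-worker
spec:
  scaleTargetRef:
    name: inference-worker     # the Deployment to scale
  minReplicaCount: 0           # the part Nicholas wants normalized in HPA
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(inference_requests_total[1m]))
        threshold: "5"
```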

Bart: Cattle versus pets. Infrastructure as code. Dan Garfield, who's probably standing right over there.

Nicholas: I was just hanging out with him.

Bart: He shared that your cluster only feels real once you set up ingress and DNS. Before that, it's just a playground where it doesn't matter when stuff breaks.

Nicholas: I'm totally on team fungible clusters. The cluster shouldn't be super important; it should have a shape or an identity. Once you bring that cluster into the world, you give it a label and it knows what job it needs to do. However, a lot of required components need to work for that to actually be feasible: multi-cluster gateways, load balancers that can span multiple clusters, and service discovery across multiple clusters. These are all tools we're working on bringing upstream, but they already work on GKE. The idea is that you can use a multi-cluster gateway or multi-cluster services to abstract the cluster away, so you can add a cluster and, if it has the right label, everything comes up to the desired state. To Dan's point, access from outside the cluster is just there, because the load balancer already existed. I think it's super important to make these multi-cluster tools more accessible to everyone, to all the Kubernetes end users out there.
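
Concretely, the building blocks Nicholas lists map to the Multi-Cluster Services API and the Gateway API. A sketch of the GKE flavor, where exporting a Service makes it discoverable fleet-wide and an HTTPRoute targets the fleet-level ServiceImport rather than any one cluster's Service (namespaces and names are illustrative; the upstream MCS API uses the multicluster.x-k8s.io group for the same concepts):

```yaml
# Step 1: export a Service from every cluster that runs the workload.
# On GKE the ServiceExport kind lives in net.gke.io; upstream uses
# multicluster.x-k8s.io/v1alpha1 for the same concept.
apiVersion: net.gke.io/v1
kind: ServiceExport
metadata:
  name: store          # must match the Service name
  namespace: store
---
# Step 2: route traffic to the fleet-wide ServiceImport instead of a
# single cluster's Service. Any cluster exporting "store" becomes a
# backend automatically. Gateway and route names are illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store-route
  namespace: store
spec:
  parentRefs:
    - name: external-http    # a multi-cluster Gateway
  rules:
    - backendRefs:
        - group: net.gke.io
          kind: ServiceImport
          name: store
          port: 8080
```

Adding a cluster that runs the same exported Service enrolls it as a backend automatically, which is what makes the cluster itself fungible.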

Bart: Kubernetes turned 10 years old this year. What should we expect in the next 10 years to come?

Nicholas: More Kubernetes. I think we'll continue to see the emergence of very specific platforms that are good for a specific type of workload. The brilliance of Kubernetes is that it can run a plethora of different types of workloads, but not every end user needs the same user experience. The more our enterprise customers can just go out and buy a canned solution, the more successful they'll be. I look forward to seeing point solutions that deliver a platform-type feel for a specific workload type on top of Kubernetes.

Bart: If you had to choose, what's your least favorite Kubernetes feature?

Nicholas: Long-term support. We don't need it, and I don't think we should do it. We're probably going to get forced into it, but I believe we should instead make updating and upgrading easier so customers don't need long-term support. The analogy is a cup with a slow leak: LTS just makes the cup bigger, and after one or two years it still starts to overflow. It's a one-time pass. So I don't think LTS is the answer for those customers out there.

Bart: What's next for you?

Nicholas: I am going to KubeJam to play guitar with you, my friend.

Bart: What about professionally?

Nicholas: I have no professional ambition other than to make multi-cluster stuff work for the community.

Bart: How can people get in touch with you?

Nicholas: They can get me on Twitter. I don't actually remember what my handle is. Hopefully, you can put a little bird in the notes.
