Hybrid Kubernetes nodes: identity, upgrades, and connectivity

Dec 3, 2025

Guest:

  • Shyam Jeedigunta

In this interview, Shyam Jeedigunta, Principal Engineer at AWS, discusses:

  • Secure node onboarding for hybrid clusters - How to handle identity, certificate issuance, and trust when registering nodes from different infrastructure providers

  • Managing upgrades and configuration drift across distributed nodes - Strategies for handling patches and upgrades when some nodes come from automated cloud infrastructure, while others run on self-managed hardware

  • Network connectivity patterns for hybrid deployments - Approaches to ensure reliable control plane connectivity and service communication for worker nodes running outside the core cluster network

Transcription

Shyam: I'm Shyam. I'm a principal engineer at AWS, and I've been a longtime member of the community for close to a decade. As you can see, I'm wearing the 110 shirt from a while ago.

Bart: Now, what are three emerging Kubernetes tools that you are keeping an eye on?

Shyam: I follow a couple of notable projects. One is Kro, the Kubernetes Resource Orchestrator, an open-source project started at AWS that models resources as a group of connected things you need to deploy. You can replicate these and create application patterns. Another project I follow is Karpenter, which focuses on auto-scaling and node lifecycle management. It has been around for a while and gained substantial adoption. I'm looking forward to its next-generation auto-scaling solutions for AI/ML workloads.

Bart: When a Kubernetes cluster needs to register nodes that aren't created by the same infrastructure provider, what's the safest way to handle identity, certificate issuance, and trust during onboarding?

Shyam: When you bring up nodes, there are a few things you need to take care of. You need to give the nodes the right credentials to talk to the control plane. If you need to talk to other cloud services, for instance with AWS, you need IAM credentials and must figure out how to set those up.

For many scenarios, it helps to use established, standardized methods that follow the same model you use in the cloud. With EKS hybrid nodes, you can run nodes on the edge, on-premises, or on other cloud providers. This approach helps you bootstrap things like certificates for node credentials or use alternative authentication modes.

For example, with hybrid nodes, you can use Systems Manager activations, where you register the node with the cloud and receive an activation to obtain credentials. This greatly simplifies the bootstrap process, including credential rotation and revocation.
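The activation flow described above can be sketched as follows. This is a hedged, illustrative sketch, not a complete runbook: the role name, cluster name, region, and Kubernetes version are assumptions, and the NodeConfig fields follow the shape documented for EKS hybrid nodes.

```shell
# Sketch of SSM-based hybrid node onboarding (all names/values illustrative).
# 1) In the AWS account, create an SSM activation tied to an IAM role
#    that the hybrid node will assume for its credentials:
aws ssm create-activation \
  --iam-role EKSHybridNodeRole \
  --registration-limit 10 \
  --region us-west-2
# The response contains an ActivationId and an ActivationCode.

# 2) On the on-premises node, reference that activation in a NodeConfig
#    file and bootstrap with nodeadm:
cat > nodeConfig.yaml <<'EOF'
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster        # assumed cluster name
    region: us-west-2
  hybrid:
    ssm:
      activationCode: "<code from step 1>"
      activationId: "<id from step 1>"
EOF
sudo nodeadm install 1.31 --credential-provider ssm
sudo nodeadm init -c file://nodeConfig.yaml
```

Because the node's cloud credentials derive from the activation, rotation and revocation can be handled centrally by deactivating or replacing the activation rather than touching each machine.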

Bart: How do you manage upgrades, patches, and configuration drift when some Kubernetes nodes come from your own hardware or edge locations and others come from automated infrastructure?

Shyam: When managing hybrid nodes or nodes running outside of the cloud, you treat them as self-managed nodes, similar to running Kubernetes nodes in your own cluster. There are several considerations for managing these upgrades. Typically, you manage upgrades yourself, preferably in a rolling fashion with spare capacity. This allows you to bring up new nodes, gracefully drain old nodes, move pods, and terminate them.
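The rolling replacement Shyam describes maps onto standard kubectl operations. A minimal sketch, assuming a node named `old-node-1` and spare capacity already available to absorb its pods:

```shell
# Hedged sketch of a rolling node replacement ("old-node-1" is illustrative).
# 1) Stop new pods from being scheduled onto the old node:
kubectl cordon old-node-1
# 2) Evict its pods gracefully (DaemonSet pods stay put; emptyDir data is lost):
kubectl drain old-node-1 --ignore-daemonsets --delete-emptydir-data --timeout=300s
# 3) Once the replacement node is Ready and the pods have rescheduled,
#    remove the old node object and decommission the machine:
kubectl delete node old-node-1
```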

If you don't have spare capacity and need to patch a component or node in place, you could use nodeadm. This is a command-line tool built by EKS for bootstrapping and for lifecycle operations like node upgrades. It's the same tool used inside the cloud to bootstrap EKS managed nodes, so it gives you a nice consistent experience across both.
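An in-place upgrade with nodeadm might look like the following sketch. The node name, target version, and config path are assumptions; draining first keeps the upgrade graceful even without spare capacity elsewhere.

```shell
# Hedged sketch of an in-place hybrid node upgrade (values illustrative).
# Drain workloads, upgrade the node's components to the target Kubernetes
# version using the same NodeConfig from bootstrap, then readmit the node:
kubectl drain my-hybrid-node --ignore-daemonsets
sudo nodeadm upgrade 1.31 -c file://nodeConfig.yaml
kubectl uncordon my-hybrid-node
```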

Bart: If part of your Kubernetes worker pool runs outside the core cluster network, what patterns enable reliability, control plane connectivity, and service communication without breaking isolation or performance?

Shyam: There are multiple ways to configure connectivity between hybrid nodes and the Kubernetes control plane. If the control plane is running inside an AWS VPC managed by EKS, you can reach it directly over the public internet if you have network egress from your on-premises network. However, the recommended approach, especially for latency-sensitive scenarios, is to use private connectivity options such as AWS Direct Connect or a site-to-site VPN between your on-premises network and the VPC in the AWS parent region.

It's crucial to ensure TLS is used everywhere possible and to restrict permissions. For instance, EKS offers a feature to restrict CIDR ranges or specific IP sets that are allowed to communicate when using public internet mode. You can limit connectivity to the absolute minimum set of IPs required.
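Restricting the public endpoint to a known egress range can be sketched with a single cluster-config update. The cluster name, region, and CIDR below are assumptions; in practice you would list only the IPs your on-premises network actually egresses from.

```shell
# Hedged sketch: limit the EKS public API endpoint to a known CIDR
# (cluster name and CIDR are illustrative).
aws eks update-cluster-config \
  --name my-cluster \
  --region us-west-2 \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.0/24"
```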

Bart: Kubernetes turned 10 years old last year. What should we expect in the next 10 years?

Shyam: I've been fortunate to see most of the last 10 years. I'm getting close to a decade now in the Kubernetes community, and I've seen many trends. One consistent observation is that we have kept raising the level of abstraction, starting from infrastructure primitives that soon became constructs for applications. From there, we've moved with projects like Kro to higher-level abstractions for applications, using Kubernetes as a better platform.

There's a whole wave of workloads that have just emerged. It's going to be very interesting because we'll need to add fine-grained capabilities for potentially esoteric problems, and this needs to be done without increasing the complexity of Kubernetes, which many users already struggle to understand.

It's going to be an interesting challenge for all of us ahead. I'll help chip away at these problems one by one and try to make users' lives easier.

People can get in touch with me at the AWS booth, in the Project Pavilion hallways, or message me on Slack in the Kubernetes workgroup. My Slack ID is Shyam JVS (S-H-Y-A-M JVS), or you can message me on LinkedIn.