Kubernetes webhooks explained and Aspect Oriented Programming

Host:

Bart Farrell

Guest:

Gordon Myers

This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

This episode explores Admission Controllers and Webhooks with Gordon Myers, who shares his experience implementing webhook solutions in production. Gordon explains the lifecycle of Kubernetes API requests and how webhooks can intercept and modify resources before they are stored in etcd.

You will learn:

How the Kubernetes API processes requests through authentication, authorization, and Admission Controllers.
The difference between Validating and Mutating webhooks and how to implement them using JSON Patch.
Best practices for testing webhooks and avoiding common pitfalls that can break cluster deployments.
Real-world examples of webhook implementations, including injecting secrets from HashiCorp Vault into containers.

Transcription

Bart: In this episode of KubeFM, get ready to learn about Webhooks with our guest, Gordon Myers. Gordon explains how webhooks play a crucial role in Kubernetes API interaction. He also delves into JSON Patch and emphasizes the significance of thorough testing for webhooks, which is critical to avoid deployment failures. Gordon draws parallels between Aspect-Oriented Programming and webhooks, addressing challenges and security risks in webhook implementation. This episode is sponsored by LearnK8s. We all know that learning Kubernetes can be complicated, so LearnK8s offers online training as well as in-person courses, which are 60% practical and 40% theoretical. For more information, check out LearnK8s.io. Now, let's get into the episode. Gordon, can you tell me which three emerging Kubernetes tools you're keeping an eye on?

Gordon: Good question. The first one I would mention, although not emerging, is K9s, as it's critical to day-to-day operations. Anyone working with Kubernetes needs to have it. Another one is Kubeflow, a collection of utilities that help develop the entire MLOps lifecycle. It includes components like Kubeflow Pipelines, which is a wrapper around Argo Workflows, great for data engineering and container-based workflows. It also includes Jupyter notebooks hosted in the cluster and a host of other things. Kubeflow is fantastic. A newer, emerging project worth keeping an eye on is KTRS (Kube Tools Recommender System), a GitHub project that leverages LLMs to analyze logs and check the health of things. I was playing around with it last night, and it started hallucinating a bunch of things, so I think it's still fairly immature, but it's interesting.

Bart: Can you explain a little bit more to our audience about who you are and what you do, and where you work?

Gordon: I'm Gordon Myers, a software engineer with 15 to 16 years of industry experience. You've caught me at an interesting inflection point. Until recently, I worked for a medium-sized American insurance company. The company itself isn't that important, but the department I led, the Data Science and Analytics Lab, or DSAL, was quite significant. We were an innovation lab chartered to explore the future of insurtech and answer the question of how our business could stay relevant and competitive 10, 20, or 30 years from now. We were given a lot of latitude to experiment, innovate, and push the envelope with new technologies. As a result, we were the first in the organization to move to the cloud and to use Kubernetes, among other firsts. Unfortunately, the department was recently let go, and I am now actively interviewing with several other companies. For the purposes of today's conversation, I'll focus on my experience at that group, as I was there for over six years.

Bart: Now, how did you get into cloud native?

Gordon: I started my career in 2008 working for a web development firm doing PHP development. At the time, cloud native was not widespread. I did that for a number of years, and the deployment process looked very different back then compared to now. Later, I went to work for a startup where we began to adopt cloud native technologies, leveraging tools like Elastic Beanstalk to deploy WAR files. When I joined DSAL, the team had already started containerizing all of our applications, but we were still using Amazon ECS for deployment. However, we quickly took an interest in Kubernetes and realized it was a much superior platform with great industry adoption, traction, and community support. As a team, we up-skilled together and decided to go all-in on Kubernetes, and we've never looked back since.

Bart: And deciding to go all in is no joke. It's an ecosystem that moves very quickly. What's your experience? How do you stay updated with all the changes going on in the Kubernetes and cloud-native ecosystem? Is it through blogs, podcasts, videos? What works best for you?

Gordon: It's kind of all of the above. I like attending tech conferences or watching the videos of them later. Meetup groups are also a great resource where others share their battle experience and advice. I also read tech blogs here and there. However, I am an experiential learner, so I do best when I'm trying it out myself.

Bart: Last but not least, if you could go back and give one career tip to your younger self, what would it be?

Gordon: I think it would be to attend more meetup groups. I'm a naturally somewhat introverted, shy person, so I didn't do a lot of that until recent years. Just being around other engineers who have experimented, or are experimenting, with things is a really enriching environment. I would encourage my younger self to make more friends, essentially.

Bart: As part of our monthly content discovery, we found this article, Kubernetes Webhooks, Explained. The questions we're going to be asking are related to that. Before we dive into webhooks, let's talk about how the Kubernetes API works. Can you walk us through the lifecycle of a request in Kubernetes, from when it's received to when it's stored in etcd?

Gordon: So, the Kubernetes API can be accessed in a number of different ways. If you're using a command line tool like kubectl, that's going to be using the Kubernetes API under the hood. Similarly, if you're releasing a deployment through Helm, it works the same way. The process starts with authentication. The request you're making is authenticated. If you've looked at your kubectl config file, you've probably seen long base64-encoded strings called client-certificate-data and client-key-data. These are used to identify who you are and verify that you have the right to talk to the server in the first place. Once that's validated, the request goes through an authorization flow. Kubernetes has Role-Based Access Control (RBAC), which defines what actions a user can perform. Your request is checked to ensure not only that you are who you say you are, but also that you're allowed to do the things you're requesting to do. Finally, it passes through a series of Admission Controllers. This is what we're going to focus on today, because webhooks come into play as admission controllers. There are two different types of webhooks, which we'll get into: Validating Webhooks and Mutating Webhooks. Once that whole handshake is completed, the data is persisted in etcd, which is the persistent data store that Kubernetes uses under the hood.
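
The ordering Gordon describes can be pictured as a chain of checks that a request must clear before anything is written. What follows is a toy sketch only, not how the API server is implemented. Every name in it (`Request`, `KNOWN_USERS`, `PERMISSIONS`, the two stand-in webhooks, and the `ETCD` dictionary) is hypothetical and exists purely to show the sequence: authenticate, authorize, mutate, validate, persist.

```python
import copy
from dataclasses import dataclass

# Hypothetical stand-ins for real cluster state.
KNOWN_USERS = {"alice"}
PERMISSIONS = {("alice", "create")}
ETCD = {}  # plays the role of the persistent store


@dataclass
class Request:
    user: str
    verb: str
    resource: dict


def authenticate(req: Request) -> None:
    # Step 1: is the caller who they claim to be?
    if req.user not in KNOWN_USERS:
        raise PermissionError("unauthenticated")


def authorize(req: Request) -> None:
    # Step 2: is the caller allowed to perform this verb?
    if (req.user, req.verb) not in PERMISSIONS:
        raise PermissionError("forbidden")


def mutating_webhook(resource: dict) -> dict:
    # Step 3a: mutating admission may change the object.
    labels = resource.setdefault("metadata", {}).setdefault("labels", {})
    labels.setdefault("team", "unknown")
    return resource


def validating_webhook(resource: dict) -> bool:
    # Step 3b: validating admission only answers yes or no.
    return "name" in resource.get("metadata", {})


def handle(req: Request) -> dict:
    authenticate(req)
    authorize(req)
    resource = mutating_webhook(copy.deepcopy(req.resource))
    if not validating_webhook(resource):
        raise ValueError("rejected by admission")
    # Step 4: only now is the object persisted.
    ETCD[resource["metadata"]["name"]] = resource
    return resource
```

Mutation runs before validation in this sketch, which mirrors the real ordering: validating webhooks see the object after any mutating webhooks have changed it.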

Bart: For folks who aren't super familiar with Kubernetes webhooks, could you give a quick explanation of what they are, and provide an example of how they might be used?

Gordon: A webhook is essentially a listener function that receives a pod specification when you're trying to launch or update a pod. For users not familiar, a pod is essentially just a container, an application that you're running, which can be either a long-lived web service or a run-to-completion job. When you launch that container, it has its own definition, which you've probably defined in a Dockerfile. There are two types of webhooks: Validating Webhooks and Mutating Webhooks. Validating Webhooks take a look at the pod you're trying to launch and will say yes or no, based on whatever proprietary logic you've come up with in that webhook. Mutating Webhooks, on the other hand, take a look at the pod specification and make last-minute changes to it, again, based on whatever logic you define. It's all about user-defined interactions in a consistent way using webhook listeners.

Bart: When a resource is submitted to the API, it's eventually sent to the Validating Webhooks or Mutating Webhooks. How does this work, and what should the response from the webhook be?

Gordon: The resource definitions in Kubernetes, which I frequently refer to as "dumb objects" in my blog post, are a declarative YAML language. This is similar to HTML or CSS, where the object definitions don't inherently do anything themselves. For example, you define a deployment, specifying the container image, environment variables, secrets, and so on. The webhook receives this same definition, which is not the application itself, but rather metadata about the application that's about to be run. Typically, this is represented in YAML syntax when defining objects for Kubernetes. YAML is mostly isomorphic with JSON, meaning you can translate back and forth from one to the other, although there are some edge cases.

If your webhook configuration watches pods, Kubernetes sends a JSON representation of each pod to your webhook function in the body of a POST request. Your function can then inspect this and determine whether to admit the pod, if it's a validating webhook, or modify the pod definition, if it's a mutating webhook. In the case of a mutating webhook, it takes the JSON definition and changes it.
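
To make the yes-or-no side concrete, here is a minimal sketch of the decision logic inside a validating webhook, written as a plain function so it can sit behind any web framework. The request and response envelope is Kubernetes' `AdmissionReview` object. The policy itself (rejecting images tagged `:latest`) and the function name `review_pod` are assumptions chosen for illustration, not something from Gordon's article.

```python
def review_pod(admission_review: dict) -> dict:
    """Return an AdmissionReview response admitting or rejecting a pod."""
    request = admission_review["request"]
    pod = request["object"]

    # Hypothetical policy: refuse containers that use a floating tag.
    offenders = [
        container["name"]
        for container in pod["spec"]["containers"]
        if container["image"].endswith(":latest")
    ]

    response = {
        # The uid must be echoed back so the API server can match the reply.
        "uid": request["uid"],
        "allowed": not offenders,
    }
    if offenders:
        response["status"] = {
            "code": 403,
            "message": "containers using :latest: " + ", ".join(offenders),
        }

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

The web service would parse the POST body as JSON, pass it to a function like this, and send the returned dictionary back as the JSON response.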

Bart: With the topic of JSON coming up, let's dive a little bit deeper into JSON Patch. How does that work in the context of Kubernetes webhooks? Could you also provide an example?

Gordon: Kubernetes uses a standard format called JSON Patch, which is an industry-accepted standard for describing how to turn one JSON object into another. It does this with a small set of operations, chiefly add, remove, and replace. At each path within your JSON, you can specify the operation you want to perform. For example, if you have a JSON list nested in your object, you can add an entry to the list, or if you have a key-value pair somewhere in your JSON, you can replace the value with something else. This provides a set of concrete, discrete operations that you can perform. You list out these operations, and that list is what your webhook returns to the server. Although, as I mentioned in my blog post, the particular way in which you encode that response can be a little janky and unexpected.
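
As a rough illustration of the encoding quirk Gordon alludes to: a mutating webhook does not return the patch as nested JSON. It serializes the list of operations to a JSON string, base64-encodes it, and places it in the `patch` field of the `AdmissionReview` response with `patchType` set to `JSONPatch`. Paths inside each operation are slash-separated JSON Pointers. In the sketch below, the helper name `mutating_response` and the example label, image, and registry values are assumptions.

```python
import base64
import json


def mutating_response(uid: str, operations: list) -> dict:
    """Wrap a list of JSON Patch operations in an AdmissionReview response."""
    patch_bytes = json.dumps(operations).encode("utf-8")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            # The patch travels as a base64 string, not as a JSON array.
            "patch": base64.b64encode(patch_bytes).decode("ascii"),
        },
    }


# Example operations; all values are illustrative only.
operations = [
    # Add a key to an existing labels map.
    {"op": "add", "path": "/metadata/labels/injected-by", "value": "example-webhook"},
    # Replace the image of the first container.
    {"op": "replace", "path": "/spec/containers/0/image",
     "value": "registry.example.com/app:1.2.3"},
    # Remove the first environment variable of the first container.
    {"op": "remove", "path": "/spec/containers/0/env/0"},
]
```

Decoding the `patch` field with `base64.b64decode` and `json.loads` gives back the original list, which is a handy check in tests.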

Bart: It seems like testing is a crucial aspect when implementing Webhooks. Why is thorough testing so important in this context?

Gordon: Let me tell you a little bit about the application I wrote and detailed in my blog post, as it leads into the question perfectly. We wrote a webhook to take secret values from HashiCorp Vault, which we had deployed in our environment, and inject them dynamically in memory, without relying on Kubernetes secrets themselves. When we built this, it was prior to Mac Chaffee writing his blog post about how secrets are actually fine. However, we were concerned that we had base64 encoded secrets, and since we were already using Vault, we might as well leverage it and come up with a way to inject those directly into the running application.

We wrote an application that does this by specifying an annotation on the pod to retrieve secrets from a particular path. When the application launches, it injects a special entry point script we developed, which launches the intended entry point as a subprocess with the secrets as environment variables. This seemed like a great idea, but we found many pitfalls and places where this could go awry. For instance, if your webhook returns a 500 error, the pod you intended to launch doesn't launch. Therefore, it really needs to be battle-tested and solid.
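
The wrapper idea can be sketched in a few lines. This is not Gordon's entry point script, only an illustration of the pattern under stated assumptions: `fetch_secrets` is a hypothetical placeholder for a real Vault lookup and returns a hard-coded value, and `run_with_secrets` is an invented name. The point is that the secrets exist only in the child process's environment and are never written to a Kubernetes Secret.

```python
import os
import subprocess


def fetch_secrets(secret_path: str) -> dict:
    """Hypothetical placeholder: a real version would query Vault here."""
    return {"DB_PASSWORD": "example-only"}


def run_with_secrets(command: list, secret_path: str) -> int:
    """Launch the intended entry point as a subprocess with secrets in its env."""
    env = dict(os.environ)            # start from the container's environment
    env.update(fetch_secrets(secret_path))
    completed = subprocess.run(command, env=env)
    # Propagate the child's exit code so the pod status reflects the real app.
    return completed.returncode
```

A wrapper like this would be called with the container's original command, for example `run_with_secrets(["python", "app.py"], "secret/data/my-app")`, where both arguments are illustrative.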

When developing these webhooks, whether mutating or validating, it's essential to cover as many unit test cases as possible. While it's common to flesh out some unit tests during development, shipping products often takes priority. However, webhooks have far-reaching implications for the rest of your stack, so that's the part you want to test the most.
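
One way to make that testing practical is to keep the patch-building logic in a pure function and exercise it with awkward inputs: pods with no metadata, null annotations, or containers that have no `env` list yet. The sketch below assumes a hypothetical annotation key `example.com/vault-path` and a hypothetical `build_patch` function; neither comes from Gordon's implementation.

```python
import unittest

ANNOTATION = "example.com/vault-path"  # hypothetical annotation key


def build_patch(pod: dict) -> list:
    """Return JSON Patch operations for an annotated pod, or [] to leave it alone."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    vault_path = annotations.get(ANNOTATION)
    if not vault_path:
        return []
    operations = []
    containers = (pod.get("spec") or {}).get("containers") or []
    for index, container in enumerate(containers):
        base = f"/spec/containers/{index}/env"
        if "env" not in container:
            # An "add" to ".../env/-" fails if the list does not exist yet.
            operations.append({"op": "add", "path": base, "value": []})
        operations.append({
            "op": "add",
            "path": base + "/-",
            "value": {"name": "VAULT_PATH", "value": vault_path},
        })
    return operations


class BuildPatchTests(unittest.TestCase):
    def test_empty_pod_is_untouched(self):
        self.assertEqual(build_patch({}), [])

    def test_null_annotations_are_tolerated(self):
        self.assertEqual(build_patch({"metadata": {"annotations": None}}), [])

    def test_env_list_is_created_when_missing(self):
        pod = {"metadata": {"annotations": {ANNOTATION: "secret/app"}},
               "spec": {"containers": [{"name": "app", "image": "x"}]}}
        paths = [op["path"] for op in build_patch(pod)]
        self.assertEqual(paths, ["/spec/containers/0/env", "/spec/containers/0/env/-"])

    def test_existing_env_is_only_appended_to(self):
        pod = {"metadata": {"annotations": {ANNOTATION: "secret/app"}},
               "spec": {"containers": [{"name": "app", "image": "x", "env": []}]}}
        self.assertEqual(len(build_patch(pod)), 1)
```

Each test corresponds to a case where a careless implementation would raise an exception, and an exception in the handler is exactly the kind of 500 error that stops pods from launching.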

Bart: You mentioned the blog post by Mac Chaffee on secrets. In a previous episode, our guests Miguel and Thibault used a custom mutating webhook to change the container image architecture and deploy workloads on ARM nodes. They suffered an outage when the webhook timed out. What kind of advice would you share in those scenarios?

Gordon: That's a really interesting application. It's fun seeing the creative ideas that people come up with using webhooks. I extend my sympathy to them because I've had cases in my job where other engineers were trying to launch pods and there was some error that I hadn't foreseen in the webhook. It just grinds things to a halt, and then you have to focus on it quickly. As I said, it's about planning ahead and thinking of every different scenario. In that particular case, I wonder if you can fall back to temporarily disabling the webhook while debugging, and then change your deployment specs manually to use the ARM architecture instead. It's something that's really hard to foresee, so my heart goes out to those guys.

Bart: In your article, you drew a parallel between webhooks and Aspect-Oriented Programming, AOP. Before we address that, what is AOP, and can you give an example of when you would use it?

Gordon: Aspect-Oriented Programming (AOP) is a concept developed in the 90s in the Java community. It's about taking common cross-cutting concerns in application code and abstracting them to a simpler annotation-based aspect layer. In the blog post, I gave the example of Spring Security, a popular framework that works with Java Spring Boot, allowing you to set Java annotations to authorize functions. Aspects can also define common logging traits. This approach takes any cross-cutting concern where you would otherwise be re-implementing similar logic across different concerns and separates that concern using an annotation-based system. Kubernetes naturally supports this paradigm because it has a system of annotations built-in when defining deployment or pod specs. There's a field for labels, which you mentioned earlier, and also for annotations. Normally, these annotations don't do anything; they're just text decorating your deployment. You can search for them, but Validating Webhooks and Mutating Webhooks give power to that system. When inspecting the JSON definition of a pod, you can look for certain annotations, and if they're present, take action. This perfectly encapsulates what AOP is all about.
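
Gordon's example is Java and Spring Security, but the same idea can be shown compactly with a Python decorator, which plays a role similar to a Java annotation backed by an aspect. This is an analogy under assumptions, not Spring's API: `requires_role`, `delete_account`, and the shape of the `user` dictionary are all invented for illustration.

```python
import functools


def requires_role(role: str):
    """Cross-cutting authorization check, declared once and applied by annotation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user: dict, *args, **kwargs):
            if role not in user.get("roles", ()):
                raise PermissionError(f"{func.__name__} requires role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_role("admin")
def delete_account(user: dict, account_id: str) -> str:
    # The business logic stays free of authorization code.
    return f"deleted {account_id}"
```

The function body never mentions authorization, just as a pod spec carrying an annotation never mentions the webhook that acts on it. The decorator here and the webhook in Kubernetes are where the shared behavior lives.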

Bart: And what does Aspect-Oriented Programming have to do with Kubernetes webhooks?

Gordon: The power behind annotations really comes into play with Webhooks. A particular example I had in my blog post was taking secrets from HashiCorp Vault. We would begin by specifying an annotation on the pod to indicate where to look in Vault. Then, the webhook service would inspect the pod to see if it has this annotation. If so, it would take the specified action and update the pod accordingly. This approach allows for a huge benefit. If we had done it another way, such as embedding a client to speak to HashiCorp Vault directly into the Docker container, the container would become bloated and heavy. By doing it at the Kubernetes infrastructure layer, we achieve a much cleaner separation of concerns.

Bart: In your article, you mentioned, and you said previously in the podcast, that Kubernetes resources are dumb and don't always provide all the information a validating or mutating webhook might need. Can you elaborate on this challenge and how it affects webhook implementation?

Gordon: So, all I meant by that, and as I said earlier, is that declarative languages like HTML are straightforward. The particular objects involved in the Kubernetes ecosystem for webhooks are the MutatingWebhookConfiguration and ValidatingWebhookConfiguration, which are simply YAML definitions. The configuration is actually the easiest part of webhooks. It just says that a webhook service, located at a specific URL or running as a service in the cluster, should be called when a pod is updated or created, or whatever is configured. The object deployed with a Helm chart to define the webhook isn't inherently doing anything by itself. It's just telling Kubernetes where to look. The real heavy lifting takes place in the web service that's running. This web service receives a JSON body representing each pod and potentially mutates it. It can be written in any language, such as Go or Python. It's an ordinary web service hooked into the ecosystem by means of that configuration object. The configuration object itself isn't really doing anything; it's the service that's doing the actual action, the imperative code.
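
For readers who have not seen one, the configuration object looks roughly like the following, shown here as a Python dictionary that could be serialized to JSON or YAML. The field names come from the `admissionregistration.k8s.io/v1` API. The function name, webhook name, service name, namespace, and path are placeholders. The `failurePolicy` and `timeoutSeconds` fields are worth noticing in light of the outage discussed earlier, since they decide what happens to pod creation when the service errors or responds slowly.

```python
def webhook_configuration(service_name: str, namespace: str,
                          ca_bundle_b64: str, fail_open: bool = False) -> dict:
    """Build a MutatingWebhookConfiguration manifest as a plain dictionary."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "MutatingWebhookConfiguration",
        "metadata": {"name": "example-injector"},  # placeholder name
        "webhooks": [{
            "name": "example-injector.example.com",  # placeholder, must be a qualified name
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            # "Ignore" lets pods through if the webhook fails; "Fail" blocks them.
            "failurePolicy": "Ignore" if fail_open else "Fail",
            "timeoutSeconds": 5,
            "clientConfig": {
                "service": {"name": service_name, "namespace": namespace,
                            "path": "/mutate"},  # placeholder path
                "caBundle": ca_bundle_b64,
            },
            "rules": [{
                "operations": ["CREATE", "UPDATE"],
                "apiGroups": [""],
                "apiVersions": ["v1"],
                "resources": ["pods"],
            }],
        }],
    }
```

Nothing in this object contains logic. It only tells the API server which requests to forward and where, which is the "dumb object" point Gordon is making.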

Bart: Having gone through all this, are there any pitfalls that developers of Webhooks should be aware of?

Gordon: Besides the fact that you need to thoroughly battle test everything because there are many areas where things could go wrong, one thing I discovered is that when writing a mutating webhook that modifies the entry point of a running container, the entry point is not always visible to Kubernetes when launching a pod. This is because the entry point is often defined in the Dockerfile, and you're just using the default entry point. When your webhook service receives the JSON definition of the pod, it will specify the image to run and the environment variables, but it won't know what the entry point is until the pod is launched. To overcome this, your webhook service needs to be smart enough to connect to the container registry, perform a Docker inspect, and determine what the container image does. With this additional API call to the container registry, you can inspect more details of the pod that's trying to launch and make adjustments accordingly.
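
The fallback Gordon describes might be structured as below. The registry call is deliberately left abstract: `lookup_image_config` is a hypothetical callable that returns the image's config, with `Entrypoint` and `Cmd` keys as in the image configuration format, and a real implementation would have to talk to the registry. The function name `resolve_command` is also an assumption. The precedence follows Kubernetes' documented behavior for a container's `command` and `args`.

```python
from typing import Callable


def resolve_command(container: dict,
                    lookup_image_config: Callable[[str], dict]) -> list:
    """Work out what a container will actually execute."""
    command = container.get("command")
    args = container.get("args")

    if command is None:
        # The pod spec is silent, so the image's ENTRYPOINT applies.
        image_config = lookup_image_config(container["image"])
        command = image_config.get("Entrypoint") or []
        if args is None:
            # The image's CMD is used only when the spec sets neither field.
            args = image_config.get("Cmd") or []

    # When the spec sets command but not args, the image's CMD is ignored.
    return list(command) + list(args or [])
```

With the effective command resolved, a mutating webhook could prepend its own wrapper to it and patch the result into the container's `command` field.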

Bart: I know we talked a bit previously about secrets management, but do webhooks pose any possible security risks?

Gordon: So, this is something I've thought about, and the short answer is yes, but I think it's unlikely. In the Kubernetes community, there's long been talk about an attacker having access to your cluster and spinning up pods to perform crypto mining, burning through your CPU credits and everything. I actually think there's been at least one documented piece of malware, called Kinsing malware, that took advantage of a misconfigured cluster to do exactly that. Webhooks, technically, could pose a similar risk because every pod launched through your cluster first goes through the webhook server. In theory, an attacker could write a webhook, install it, or modify an existing one to replace the web service you're trying to launch with their own crypto mining pod or launch pods directly in some namespace you're not aware of. However, if you're using a webhook that modifies existing pods, I think it raises the risk to the attacker that you'll notice what they're doing. If you're trying to launch a deployment and it doesn't work because they've supplanted it with a crypto mining scheme, you won't just say it doesn't work. You'll investigate why, and then probably immediately discover what's going on.

Bart: In terms of your process learning all this, how did you do it? Through sweat and tears, suffering, begging for help? What was it like?

Gordon: I'm an experiential learner. When I first got started, I installed an instance of Minikube on my MacBook. However, I quickly learned that the memory profile of Minikube was noticeable, slowing down other operations. I ended up purchasing four Raspberry Pis and installed Kubernetes on them. My first attempt, using kubeadm, was a failure. I eventually came around to using K3s, which was much easier to get on board with. I have a home lab with a cluster of four Raspberry Pis in my basement, which I use as needed. In addition to that, I had the privilege of working with talented DevOps engineers who were great mentors and tutors, guiding me when I had detailed questions. It was a lot of trial and error, but also a lot of good mentorship.

Bart: Speaking of trial and error and good mentorship, we read that you recently married your best friend and that you're a cat lover. Can you confirm?

Gordon: I can indeed confirm. I got married in July to my girlfriend of six years. We're having a great time as newlyweds. We currently own seven cats, so it keeps us busy - it's a bit of a zoo.

Bart: Any possibility of getting an eighth cat and you could have K8s cats?

Gordon: I like the pun, but absolutely not. This is too much as it is. I'm not sure if Jackson Galaxy is one of your regular listeners, but if he is and you're out there, Mr. Galaxy, I could really use your advice on some of these cat matters. Seven is enough.

Bart: And what's next for you?

Gordon: I'm actively interviewing with several companies right now. I'm also hoping to have my next technical blog post out in about two weeks. This one won't be on Kubernetes, but rather on Python. I consider myself the world's worst blogger. I'm similar to the Ents from Lord of the Rings - when I say something, it takes me a long time to say it, but I only say things if they're worth taking a long time to say. My previous Kubernetes blog post, which is the subject of this interview, is about 4,000 words. It just takes me a while to write.

Bart: But you get there.

Gordon: That's the important thing.

Bart: If people want to get in touch with you, what's the best way to do it?

Gordon: Good question. I'm on most social media. You can find me on Threads, where I follow a lot of developer folks. Follow me on GitHub. My GitHub may not look particularly active, but that's mainly because I have a lot of private repositories. I actually have a project I'm working on, a secret project that I'm hoping to make public in the next six months. So, follow me on Threads or GitHub.

Bart: All right, plenty of action going on there. Thank you very much for putting the blog out as well. We really enjoyed it, and we know our audience will too. I wish you the best of luck in the following steps, with your marriage and cats alike. I hope our paths cross somewhere in the future. Take care.

Gordon: This was a delight. Thank you very much.

Bart: Cheers.
