Infrastructure as code: beyond Kubernetes manifests
Eron Wright, software engineer at Pulumi, discusses infrastructure automation and Kubernetes resource management:
How Infrastructure as Code and policy enforcement work together to automate deployments while maintaining compliance and security.
Why combining Helm charts with cloud-native tools is essential for real-world deployments.
The evolution of Kubernetes automation through tools like the Pulumi Kubernetes Operator, enabling scalable multi-cluster deployments and GitOps workflows.
Transcription
Bart: Welcome to KubeFM. Can you tell me a little bit about who you are and what your role is?
Eron: Hi, and thanks for having me on the show. My name is Eron Wright. I am a software engineer at Pulumi, where my focus is on Kubernetes integration. Pulumi is a vast product that integrates with lots of different technologies and services. My goal is to create the best possible experience for Kubernetes users. I joined Pulumi a year ago from a cloud startup called StreamNative, which provides Apache Pulsar as a service across different clouds. We struggled with deploying those workloads across multiple clouds, but Kubernetes provided a reasonably portable foundation for building cloud services. We based our service on Kubernetes, using EKS, GKE, or AKS as necessary to deliver the workload. Pulumi was always a part of that, and when I was looking for another opportunity, my first stop was Pulumi. It's been a great time since then.
Bart: Fascinating. Now, tell me about three emerging Kubernetes tools that you're keeping an eye on.
Eron: I am pretty focused on Pulumi and how it works great for Kubernetes workloads. This means working with Kubernetes tools, with Helm being top of mind as the de facto package manager for Kubernetes. Watching what's going on with Helm and anticipating Helm v4 is really interesting, so that Pulumi can maintain its ability to deploy Helm charts well.
Kubernetes itself is also a key area of focus, as it continues to evolve and add features that round out the type of applications that can be run in a Kubernetes environment. For example, I'm personally interested in the recent support for image volumes, which allows attaching secondary Docker images to a pod to use them as data sources or content. This is really exciting.
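To make that concrete, here is a minimal sketch of what an image volume looks like when declared with Pulumi's Kubernetes SDK in TypeScript. It assumes a cluster with the ImageVolume feature gate enabled and a recent @pulumi/kubernetes package whose Volume type includes the image source; the dataset image reference is a placeholder.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Hypothetical dataset image; any OCI image whose filesystem you want to
// expose read-only to a container would work the same way.
const datasetImage = "ghcr.io/example/dataset:v1";

const pod = new k8s.core.v1.Pod("image-volume-demo", {
    spec: {
        containers: [{
            name: "app",
            image: "nginx:1.27",
            // The mounted image's contents appear here as a read-only filesystem.
            volumeMounts: [{ name: "dataset", mountPath: "/data", readOnly: true }],
        }],
        volumes: [{
            name: "dataset",
            // Image volume source (alpha): mounts the contents of an OCI image.
            image: { reference: datasetImage, pullPolicy: "IfNotPresent" },
        }],
    },
});
```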
Another area of interest is server-side apply, a strategy for supporting multi-party authoring of objects in Kubernetes. The Kubernetes API is sophisticated, making it possible for different parties to work together or collaborate on the objects that constitute a workload. For instance, a tool like Pulumi might manage the deployment of a particular set of pods, while an autoscaler handles the replicas, scaling as needed. This means multiple tools are editing the fields of a deployment, and making all these tools work well together is what server-side apply seeks to do. However, it's harder to integrate into tools than it seems, so this is an ongoing area of investment.
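As a rough illustration of that multi-party pattern, the Pulumi Kubernetes provider exposes Patch resources that use server-side apply, so one program can own just a slice of an object that another tool created. This sketch assumes an existing Deployment named web in the apps namespace, managed elsewhere, and a provider with server-side apply enabled; the names are hypothetical.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Take shared ownership of a single field of an existing Deployment.
// With server-side apply, this program becomes the field manager for
// spec.replicas only; other managers keep ownership of their own fields.
// (If another manager already owns replicas, a conflict must be resolved,
// e.g. by forcing the apply.)
const scale = new k8s.apps.v1.DeploymentPatch("web-replicas", {
    metadata: {
        name: "web",
        namespace: "apps",
    },
    spec: {
        replicas: 5,
    },
});
```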
Lastly, I'm interested in OCI, as OCI registries have become a great source for Docker images, Helm charts, and more. Supporting authentication and other features is an ongoing effort.
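For example (a sketch only), Pulumi's Helm Release resource can pull a chart straight from an OCI registry by passing an oci:// reference as the chart; the registry path and version below are placeholders for whatever your registry actually hosts.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Install a Helm chart published to an OCI registry. No repositoryOpts are
// needed; the oci:// reference identifies the chart directly.
const nginx = new k8s.helm.v3.Release("nginx", {
    chart: "oci://registry-1.docker.io/bitnamicharts/nginx", // placeholder path
    version: "18.2.2",                                       // placeholder version
    namespace: "web",
    createNamespace: true,
    values: {
        service: { type: "ClusterIP" },
    },
});
```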
Bart: Now, some of the questions we're going to look at next are ones that have come up as a result of different podcasts that we've done, different topics that our guests have mentioned. When it comes to the topic of automation and resource management, one of our guests, Alexander, expressed that having an automated mechanism is better than enforcing process. What automation tools or approaches do you recommend for managing Kubernetes resources?
Eron: I would definitely recommend Pulumi. Pulumi's Infrastructure as Code (IAC) is all about automation, specifically automating the deployment of workloads using your preferred programming language. It works by declaring the resources you need in your language of choice, such as TypeScript or Python, using a declarative approach. Pulumi then maintains state to ensure that what has been deployed to the cloud aligns with your desired configuration.
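A minimal sketch of that declarative style, using TypeScript and the @pulumi/kubernetes SDK, might look like this; the image and names are illustrative.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Declare the desired state; `pulumi up` diffs it against the stack's
// recorded state and applies only what changed.
const appLabels = { app: "hello" };

const deployment = new k8s.apps.v1.Deployment("hello", {
    spec: {
        replicas: 2,
        selector: { matchLabels: appLabels },
        template: {
            metadata: { labels: appLabels },
            spec: {
                containers: [{ name: "hello", image: "nginx:1.27" }],
            },
        },
    },
});

// Export a value so other stacks or scripts can consume it.
export const deploymentName = deployment.metadata.name;
```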
I think automation is very practical with a tool that allows you to work in a declarative way, and Pulumi is best in class in that regard. Regarding processes, some of that has to do with policy enforcement. Pulumi shines in this area with its policy-as-code subsystem, called CrossGuard. This lets you set up guardrails to enforce compliance, allowing developers to provision their own infrastructure within an organization while sticking to best practices and security compliance.
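As a sketch of what a CrossGuard guardrail can look like, here is a small policy pack written with @pulumi/policy; the specific rule, blocking LoadBalancer Services, is just an illustrative example of the kind of compliance check an organization might enforce.

```typescript
import * as k8s from "@pulumi/kubernetes";
import { PolicyPack, validateResourceOfType } from "@pulumi/policy";

// Illustrative guardrail: forbid directly exposed LoadBalancer Services so
// teams go through an approved ingress path instead.
new PolicyPack("kubernetes-guardrails", {
    policies: [{
        name: "no-loadbalancer-services",
        description: "Services must not be exposed directly via LoadBalancer.",
        enforcementLevel: "mandatory",
        validateResource: validateResourceOfType(k8s.core.v1.Service, (svc, args, reportViolation) => {
            if (svc.spec?.type === "LoadBalancer") {
                reportViolation("Use the shared ingress instead of a LoadBalancer Service.");
            }
        }),
    }],
});
```

A pack like this can then be applied during previews and updates, for example with `pulumi preview --policy-pack ./policy`, so violations surface before anything reaches the cluster.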
Bart: In terms of GitOps and platform engineering, another one of our guests, Hans, argues that GitOps is an excellent building block for platforms with a great developer experience. He mentioned the ability to merge, review, and discuss code changes and PRs, and the additional benefit of not having to grant developers direct permissions. Should all platforms use GitOps? And what's your experience?
Eron: When I think about deploying software to the cloud, I think of the many environments I'm going to deploy that software to, and I want consistency across them. I'm obviously going to use automation there. If I'm using a tool like Pulumi, I can use ordinary code to articulate exactly what I need. Pulumi works with GitOps on at least two levels. The first is the program itself. The source code of your program might say, "I need a pod, I need a service account, I need an IAM role." These are all parts of my program. All of that is under version control, and we're using pull requests to review the infrastructure code before it's deployed. We also get to review the configuration: you'll see there's a configuration change to the staging environment or a configuration change planned for the production environment. Both the code and the configuration are in Git, and you get all that beautiful workflow from systems like GitHub. If you're using that in combination with Pulumi Cloud, it's even better because of the really deep integration with GitHub and other providers. When you open those PRs, Pulumi will come in and annotate the PR with the stack resources that are going to change; it actually embeds previews into the PRs. You get a really visual experience of working with code: you can see exactly what is going to change if you merge this PR.
The second level is automating the stacks themselves. A stack, in Pulumi parlance, is an instance of a Pulumi program that's been run in a particular environment. My staging deployment and my production deployment constitute two stacks based on the same program. Deploying those stacks, say deploying the staging environment in a certain way, is something you can drive with some of the higher-level technologies we have. We have a brand new Pulumi Kubernetes Operator that has been developed to run Pulumi deployments from your Kubernetes cluster. Pulumi is normally a tool you run yourself to effect these deployments; with this new operator, you can run Pulumi inside your Kubernetes cluster, and it can act as the control system for going out and doing those deployments, maybe to other clusters or maybe to the same cluster. All of that operator experience is also Git-centric. For example, the Kubernetes operator we provide can be told to monitor a Git repository that contains Pulumi source code. Whenever that source code changes, it will kick off an update to the stack and deploy those changes into your cloud. Pulumi is very much built for GitOps, and you can combine it with other Kubernetes technologies like Flux or Argo CD, which can then drive things top to bottom: Argo might say, "We want to get that Pulumi stack deployed into staging," so it creates a Stack object. That Stack object is consumed by our operator, which runs Pulumi, and Pulumi then figures out what resources need to be created or updated. It's turtles all the way down; the end result is a totally declarative system.
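To picture the Stack object Eron mentions, here is an illustrative Stack custom resource for the Pulumi Kubernetes Operator, declared here with Pulumi's own Kubernetes SDK; the repository URL, stack name, namespace, and secret name are placeholders, and the exact spec fields may differ between operator versions.

```typescript
import * as k8s from "@pulumi/kubernetes";

// A Stack object tells the operator which Pulumi program to run and where.
// The operator watches the repo and runs an update for the named stack
// whenever the branch changes.
const stagingStack = new k8s.apiextensions.CustomResource("staging", {
    apiVersion: "pulumi.com/v1",
    kind: "Stack",
    metadata: { namespace: "pulumi-operator" }, // placeholder namespace
    spec: {
        stack: "acme/platform/staging",                       // placeholder stack name
        projectRepo: "https://github.com/acme/platform-infra", // placeholder repo
        branch: "main",
        // Credentials for the Pulumi state backend, e.g. a Pulumi Cloud token
        // stored in a Kubernetes Secret (name is hypothetical).
        envRefs: {
            PULUMI_ACCESS_TOKEN: {
                type: "Secret",
                secret: { name: "pulumi-api-secret", key: "accessToken" },
            },
        },
    },
});
```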
Bart: Now, you mentioned Helm earlier, and one of our guests, Jacco, said that he disliked Helm's approach to templating, mentioning difficulties with multi-line strings and loss of strict schema enforcement, but he did acknowledge the usefulness of Helm packages. Do you see more tools and companies tackling the Helm package manager with new and innovative solutions? Are you happy with the status quo? How should we install third-party packages into Kubernetes?
Eron: I mean, it's a great question. Over the almost 10 years I've been working in the Kubernetes space, I have tried every tool that has existed at one point or another, and I just found Pulumi to be the general solution I was looking for. Helm is actually still very much a part of that. The chart ecosystem is vast, and often those charts are authored by the component authors themselves or at least can be thought of as semi-official charts in many cases. They tend to work well and track the underlying component and changes that might happen in the component. So, it's good to use those charts. I would not hesitate to recommend using a Helm chart where it's available, especially if it's an official one. However, Helm by itself is insufficient in the real world.
Let me give you a really easy example. Everybody needs to install cert-manager in their Kubernetes cluster. cert-manager is a component that provisions TLS certificates from Let's Encrypt. It's a great component, and everybody loves it. How do you install cert-manager? The Helm chart would be the right start, and a Pulumi program can very easily install a Helm chart for you using Helm resources. However, installing cert-manager is not just about applying the chart; you also need some cloud resources. For example, you need an IAM role, and if you're installing to EKS, you will create an IAM role for cert-manager, give it permissions to write to Route 53, and then link that IAM role to the Kubernetes service account for cert-manager. This way, when cert-manager provisions a certificate, it can create Route 53 DNS records, which is part of the DNS-01 challenge that Let's Encrypt uses to issue the certificate.
Deploying a given workload into Kubernetes often involves not just pure Kubernetes resources but a combination of cloud resources and Kubernetes resources, like that IAM role and those bindings. That's why Pulumi is so helpful: it's able to reason about that broader ecosystem of resources. In a compact program written in your favorite language, like TypeScript, you can write components that encapsulate all of this. For instance, here's that cert-manager component: it creates an IAM role, takes the role's ARN, puts it into one of the Helm values, and deploys the cert-manager chart. Boom, you have a working cert-manager end-to-end.
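Here is a rough TypeScript sketch of such a component, assuming an EKS cluster with IRSA (an IAM OIDC provider) already set up; the OIDC provider, account ID, hosted zone, and chart values are placeholders and may vary between cert-manager chart versions.

```typescript
import * as aws from "@pulumi/aws";
import * as k8s from "@pulumi/kubernetes";

// Assumed inputs: the cluster's OIDC provider (for IRSA) and the Route 53
// hosted zone. These values are placeholders you would normally pass in.
const oidcProviderArn = "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE";
const oidcProviderUrl = "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE";
const hostedZoneId = "Z0000000000000000000";

// IAM role that only the cert-manager service account may assume.
const role = new aws.iam.Role("cert-manager", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Principal: { Federated: oidcProviderArn },
            Action: "sts:AssumeRoleWithWebIdentity",
            Condition: {
                StringEquals: {
                    [`${oidcProviderUrl}:sub`]: "system:serviceaccount:cert-manager:cert-manager",
                },
            },
        }],
    }),
});

// Allow cert-manager to publish DNS-01 challenge records in Route 53.
new aws.iam.RolePolicy("cert-manager-route53", {
    role: role.id,
    policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
            { Effect: "Allow", Action: "route53:GetChange", Resource: "arn:aws:route53:::change/*" },
            {
                Effect: "Allow",
                Action: ["route53:ChangeResourceRecordSets", "route53:ListResourceRecordSets"],
                Resource: `arn:aws:route53:::hostedzone/${hostedZoneId}`,
            },
        ],
    }),
});

// Deploy the chart, wiring the role ARN into the service account annotation
// so IRSA links the Kubernetes identity to the IAM role.
new k8s.helm.v3.Release("cert-manager", {
    chart: "cert-manager",
    repositoryOpts: { repo: "https://charts.jetstack.io" },
    namespace: "cert-manager",
    createNamespace: true,
    values: {
        installCRDs: true,
        serviceAccount: {
            annotations: { "eks.amazonaws.com/role-arn": role.arn },
        },
    },
});
```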
Do I love Helm? Yes. Am I excited about Helm v4 when it comes out and what it contains? I'm sure I will be. Will I recommend users continue to use the vast ecosystem of charts? Absolutely. But do I recommend they just use Helm on the command line or even something like Argo CD? No, I would recommend people consider using a tool like Pulumi in the middle so that they can actually build that whole resource graph encompassing both those Kubernetes and cloud objects in a very nice way.
Bart: Kubernetes turned 10 this year. What do you expect in the next 10 years to come?
Eron: I don't have a lot of insight into that. It's probably entering a stage of maturity. Over the last five years, Kubernetes has become readily able to run stateful workloads. The move towards serverless is important, as it allows for more elastic environments, scaling down to zero, and combining the power of stateful workloads in Kubernetes with stateless scalability, which is a really exciting direction. Multi-cluster or cluster federation is also important. For example, in Google Cloud, we see fleets of Kubernetes clusters working together because Kubernetes clusters are regional objects. When you go beyond that to deliver a global service, you have to stitch those things together, which is another exciting area.
Bart: What about you, Eron? What's next for you?
Eron: Thanks for asking. I've been focused on delivering a new version of the Pulumi Kubernetes Operator. We just made our public beta available for people to try and give feedback. Over the next few weeks, we'll be bringing that to a 2.0 GA release. What I'm focused on right now is really listening to customers. I was pleased to have the opportunity at Pulumi to do a major architectural revamp of the operator, which is a technology we developed a few years ago. This new design is more scalable and isolated than ever before. Our Pulumi Kubernetes Operator now runs Pulumi commands inside separate pods, properly isolated with service accounts, making it ready for production use cases involving high-scale clusters and multi-tenancy. We've already seen it solve a lot of problems, and people have been waiting for this for a long time. Now it's time for us to listen again and see what our next moves should be.
To give you a couple of examples, we want to integrate better with Pulumi Cloud. For instance, Pulumi Cloud, which is one of the ways you can store the state of a Pulumi program, now supports OIDC login. You can use the security tokens that Kubernetes clusters automatically generate for pods as an identity when reaching out to a service like Pulumi Cloud. We're excited to enable our operator to deploy Pulumi programs without the need to configure credentials or tokens, providing a great out-of-the-box experience. Now that we have this new architecture in place, there are lots of things on our roadmap. What's next for me is doubling down on this operator investment and doing right by our customers.
Bart: How can people get in touch with you?
Eron: Thanks for asking. You can catch me in a few different ways. My email address is [email protected], and I'm happy to take emails. You can also catch me in the Pulumi Community Slack workspace, where there's a Kubernetes channel. If anyone wants to discuss how Pulumi and Kubernetes work together, reach out to us there - we'll be happy to talk to you.
Bart: Thanks a lot, Eron. We'll speak soon.