Declarative configuration and the Kubernetes Resource Model

Host:

  • Bart Farrell

Guest:

  • Brian Grant

This episode is sponsored by StormForge – Double your Kubernetes resource utilization and unburden developers from sizing complexity with the first HPA-compatible vertical pod rightsizing solution. Try it for free.

This episode offers a rare glimpse into the design decisions that shaped the world's most popular container orchestration platform.

Brian Grant, CTO of ConfigHub and former tech lead on Google's Borg team, discusses the Kubernetes Resource Model (KRM) and its profound impact on the Kubernetes ecosystem.

He explains how KRM's resource-centric API patterns enable Kubernetes' flexibility and extensibility, and how they have influenced the entire cloud native landscape.

You will learn:

  • How the Kubernetes API evolved from inconsistency to a uniform structure, enabling support for thousands of resource types.

  • Why Kubernetes' self-describing resources and Server-side Apply simplify client implementations and configuration management.

  • The evolution of Kubernetes configuration tools like Helm, Kustomize, and GitOps solutions.

  • Current trends and future directions in Kubernetes configuration, including potential AI-driven enhancements.

Transcription

Bart: It's not often that I get to speak to someone who worked with Kubernetes before it was even called Kubernetes. In this episode of KubeFM, I get a chance to speak to Brian Grant, the CTO of the upcoming startup, ConfigHub. Brian will share his knowledge about the Kubernetes Resource Model (KRM), the architectural foundation behind Kubernetes' flexibility and extensibility. In today's conversation, we explore how KRM's resource-centric API patterns enable Kubernetes to manage infrastructure, reconcile desired and observed states, and maintain interoperability across hundreds of tools and thousands of resource types. We'll unpack Kubernetes' evolution from its early API inconsistencies, the powerful spec and status substructures, and how tools like Helm, Kustomize, and emerging GitOps solutions like Flux and Argo CD build on these foundations. You'll get to understand more about why KRM is essential for operators and developers extending Kubernetes and learn how this model shaped the entire cloud native landscape. Stay with us as we explore the future of Kubernetes, including AI-driven enhancements and the ever-growing ecosystem of CNCF projects. This episode is sponsored by StormForge Optimize Live. StormForge Optimize Live continuously rightsizes Kubernetes workloads to ensure applications are cost-effective and performant while removing developer toil. As a vertical rightsizing solution, Optimize Live is autonomous, tunable, and works seamlessly with the Kubernetes HPA at scale. Free developers from the toil and complexity of setting CPU and memory requests with Optimize Live's fully configurable machine learning-based recommendations and intelligent automation. Start a free trial or play around in the sandbox environment with no form fill required. Now, let's get to the episode. Brian, can you tell me a little bit more about what you do and who you work for?

Brian: Hi, I am CTO at a startup called ConfigHub that I co-founded with Alexis Richardson, the former CEO of Weaveworks, and Jesper Joergensen, formerly a product lead at Heroku. We're just getting started on developing an alternative to infrastructure as code.

Bart: Which three emerging Kubernetes tools are you keeping an eye on?

Brian: One is Dagger, mostly because Solomon is working on it. It's kind of interesting. BotKube is another one. I was a fan of chat ops and interacting with team members and systems through an interactive interface, even before AI. Now they've added AI to it. Then there's Glasskube. I'm surprised it's taken so long for a tool like Glasskube to emerge. We had an issue from early on: managing what we called add-ons on Kubernetes clusters. It's similar to package managers on Linux systems, where you specify the packages you want installed, and they install dependencies and figure out the versions automatically. Helm doesn't exactly do that; it doesn't have the same model. Glasskube seems to be a little closer. It's designed to manage components that you want as fixtures installed in your cluster. This seems to have been a gap for quite a long time.

Bart: And how did you get into Cloud Native?

Brian: I joined Google in 2007 and, after about a year and a half, I transitioned to the Borg team, which was their internal container platform. I was a tech lead on the control plane side of the Borg team and started an R&D project to redesign Borg because it was being used in ways it wasn't really designed for. Once cloud became a priority, the Kubernetes project kicked off in about 2013. I was involved from the beginning, even before we decided to open source it, before it was called Kubernetes.

Bart: And I believe also in the initial phase of the CNCF, you had a role there too, what was that about?

Brian: We started thinking about creating a foundation to host the Kubernetes project in February or March of 2015. We had an early discussion with Brian Stevens about the idea of doing it with the Linux Foundation, and discussions progressed from there. Once the foundation started to get going and the structure began to take shape, it was decided that there needed to be a body of technical people to decide what projects to add to the foundation beyond just Kubernetes. We envisioned that there needed to be projects complementary to Kubernetes to flesh out more of what people would actually need to manage applications. Those things needed a home, and one didn't really exist - the Apache Software Foundation at the time was very heavily centered around the Hadoop ecosystem.

When the technical oversight committee was formed, Craig McLuckie put in an application for me, and I was elected to the committee. I think I sponsored more projects than anyone else in the inaugural committee. I was even involved in ones like Prometheus, which was created by some ex-Googlers. There were some issues around that, which I helped sort through.

I think it was a good experience to figure out what we needed to flesh out the ecosystem of things you would need to run a system like Kubernetes. The synergy between Prometheus, which was the second project we brought into the foundation, and Kubernetes worked out really well. We used Prometheus to instrument and monitor Kubernetes in the ramp-up to 1.0, which made other people who were using Kubernetes aware of Prometheus and made it a natural choice. I also worked on some of the principles of the foundation, such as how we would decide what projects to bring in, the idea of the sandbox, so we could bring in more nascent projects. I co-authored the new definition of what cloud-native was for the foundation, based on my experience from Google's internal ecosystem and what was required in that sort of dynamic infrastructure environment.

Bart: Very few people started their careers in cloud native. What did you do before cloud native?

Brian: Before cloud native, I was mostly a systems kind of person. I worked on operating systems, networking, supercomputing, compilers for many years. I worked on three different dynamic compilers and high-performance computing on GPUs, way too early.

Bart: You've seen the Kubernetes ecosystem since its infancy, and it moves very quickly. How do you stay updated with all the changes that are going on? What resources have worked best for you when it comes to staying on top of things - blogs, podcasts, what works best?

Brian: Blogs and podcasts both work pretty well for me. I have Google alerts set up for certain keywords, so I get notified every day of new content. Now that I'm posting to Medium, I also glance through Medium articles. What I really like about blog posts is that they're often concrete, providing an understanding of how people use these systems and proceed with them, as well as their understanding of the concepts and mechanisms. As a person who's more of a builder rather than a user, it's helpful for me to see the user perspective. Podcasts are also useful, both audio and video, because I can listen to them while doing other things, which helps with time utilization.

Bart: If you could go back in time and share one career tip with your younger self, what would it be?

Brian: I think it takes time to get to know yourself. Take time to pay attention to that. I think that can help you better understand what you like, what you don't like, what your bad habits are, what you're good at, and what you're not good at. If you wait until life events make you realize that, sometimes it can be too late.

Bart: Definitely, I agree. Never too early to start. In terms of our topic for today, as part of our monthly content discovery and learning Kubernetes, we found this series that you've been writing, "Infrastructure as Code and Declarative Configuration." From it, we selected a couple of different articles for today's episode. To start out, can you explain the Kubernetes Resource Model (KRM)?

Brian: The Kubernetes Resource Model (KRM) describes an architectural pattern of controllers reconciling the observed state and the desired state, and reporting that back to the API. Almost everything in Kubernetes works this way. Even the kubelet is effectively a controller that watches relevant resources in the Kubernetes API, takes action on them, and reports back the status. This pattern turned out to be very extensible and robust. The model also describes the standardized structure and behavior of the Kubernetes API operations on resources, including mechanisms like resource types, names, labels, and spec and status. For the controller model, the details of the API are described in the API convention document, but the Kubernetes Resource Model (KRM) describes it at a higher level, outlining the purpose of the model and how it was intended to be utilized in the system.
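
To make the reconciliation pattern concrete, here is a minimal Go sketch of a controller loop. The types and values are made up for illustration; real controllers watch the API server and typically use frameworks like controller-runtime.

```go
package main

import "fmt"

// Resource mirrors the KRM shape: a name plus desired state (spec)
// and observed state (status). Real resources are much richer; this
// is an invented type for illustration.
type Resource struct {
	Name   string
	Spec   int // desired replica count
	Status int // observed replica count
}

// reconcile drives the observed state toward the desired state and
// reports progress back: the loop every Kubernetes controller runs.
func reconcile(r *Resource) {
	for r.Status != r.Spec {
		if r.Status < r.Spec {
			r.Status++ // e.g., start one more replica
		} else {
			r.Status-- // e.g., stop one replica
		}
		fmt.Printf("%s: status=%d (desired %d)\n", r.Name, r.Status, r.Spec)
	}
}

func main() {
	reconcile(&Resource{Name: "web", Spec: 3})
}
```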

Bart: And why did you coin a new term for Kubernetes Resource Model (KRM)?

Brian: At the time, there were many people and teams within Google building or trying to build new controllers for Kubernetes. Custom Resource Definitions (CRDs) were relatively new, and people were exploring what they could do with them and trying to use that extension mechanism. We had been trying to explain the Kubernetes Resource Model (KRM) to contributors individually, one-on-one, and there was the original operator document, but the principles of the system were not well documented. We wanted to make it easier to explain and avoid pitfalls so that operators and their CRDs would work as intended with client-side tooling like kubectl and other declarative tools like Helm. We were starting the kubebuilder project around the same time to provide a framework for building operators, but there seemed to be a need to explain the model in more detail so that teams could avoid common mistakes and understand how to approach it.

The term used at the time was "Kubernetes-style API," which focused attention on operations rather than resources. This led people astray, especially Google teams, since Google's APIs were built with RPC. The operation-centric model was mostly what people understood and how they approached the problem. However, this wasn't going to enable them to build operators in the right way. Operators purely interact through resources stored in the system. There is a place for a more operation-style API, such as OpenAI's chat completion API, where you send a request with parameters and receive a result in a stateless manner. But for the Kubernetes model, it's all centered around objects, the resources, and controllers don't interact directly through other types of APIs.

We needed to establish the right mental model and explain details of the mechanism, such as what happens when a resource creation request is received. It goes through a sequence of specific stages, like authorization, admission control, defaulting, and version conversion. At the time, this wasn't documented anywhere, and you had to read through the API server code to understand it.
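
As a rough illustration of that sequence, here is a toy Go sketch of a create request passing through stages in order. The checks are invented stand-ins for the real authorization, admission, and defaulting logic, not actual API server code.

```go
package main

import (
	"errors"
	"fmt"
)

type object map[string]any

// stage is one step in the request pipeline: it may reject the object
// (authorization, admission) or mutate it (defaulting).
type stage func(object) error

func authorize(obj object) error {
	if obj["user"] != "admin" {
		return errors.New("authorization: forbidden")
	}
	return nil
}

func admit(obj object) error {
	if obj["kind"] == nil {
		return errors.New("admission: kind is required")
	}
	return nil
}

func applyDefaults(obj object) error {
	if _, ok := obj["replicas"]; !ok {
		obj["replicas"] = 1 // fill in a default, as the API server would
	}
	return nil
}

// handleCreate runs the stages in order and persists only if all pass.
func handleCreate(obj object, stages ...stage) error {
	for _, s := range stages {
		if err := s(obj); err != nil {
			return err
		}
	}
	fmt.Println("persisted:", obj)
	return nil
}

func main() {
	obj := object{"user": "admin", "kind": "Deployment"}
	if err := handleCreate(obj, authorize, admit, applyDefaults); err != nil {
		fmt.Println("rejected:", err)
	}
}
```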

Bart: Now it sounds like Kubernetes Resource Model (KRM) has evolved significantly. I've heard there was a major overhaul of the Kubernetes API before version 1.0. Can you tell us more about this overhaul and what motivated it?

Brian: When we open-sourced Kubernetes, the API was effectively just the API from the prototype, and it was really an alpha-quality API. It wasn't consistent at all. Only the pod had a current state field, and it was structurally identical to the desired state field, with just different parts of the structure populated in the desired state and the current state. This was really confusing, and that notion of desired state versus observed state wasn't really present across the other resource types. Originally, I think there were just three types: pod, replication controller, and service. There wasn't even a node type originally. I knew I wanted to support bulk operations in the CLI, and I knew we wanted to have a declarative model based on the serialization of the resources, so we needed a clear structure. We were also on the ramp toward 1.0, and we knew we wanted a stable API, so we wanted to make any significant, non-backward-compatible changes before 1.0. Additionally, as part of that, we were cleaning up the architecture. The architecture we had when we open-sourced Kubernetes wasn't clean, because it was just whatever code we had from the prototype. There were things like the API server calling out to the kubelet on the node to get state about the pods and nodes synchronously when you made an API call to the API server. We needed to implement a way for the controllers to report status back so that that interaction could be asynchronous. All these factors led us to look at what the API really needed to be, both for the clients and for the controllers, which were a different kind of client. Clayton Coleman took up the task of pulling together a set of issues that had been accumulating and starting a proposal for the API cleanup. Clayton, Tim Hockin, and I iterated on that, and it turned out really well. We made very few changes between v1beta3 and v1. v1beta2 was actually created as just a copy of v1beta1, to test the versioning mechanism itself, because we also added API versioning as part of the changeover from v1beta1 to v1beta3. Daniel Smith built that.

Bart: It sounds like that was quite a significant change. How does this consistency in the Kubernetes Resource Model (KRM) specifically benefit developers and tools in the Kubernetes ecosystem?

Brian: I think there are a couple of different dimensions to that. One is implementing an API client and interacting with one resource, which is pretty straightforward. We have something called the dynamic client, simple.go, which is about a page of code per operation - create, update, get, and delete. It's pretty understandable and simple, as it's a simple HTTP API. This made it easier to implement clients in other languages besides Go.

Early on, I wanted a mechanical description of the API to facilitate this. Initially, I hand-wrote a description of the API in RAML. Then, Swagger became more popular, and I found a REST API framework, go-restful, that would generate Swagger API definitions automatically. So, I integrated that into the API server to avoid writing the description by hand. Eventually, we used Swagger to generate clients in other languages like Python.

Individual operations for an individual resource are simple, but the real power comes when you want to operate on multiple resource types. Even among the built-in resource types in Kubernetes, there are probably a few dozen. However, with Custom Resource Definitions (CRDs), there are more than 10,000 unique types in the ecosystem. This diversity of types would be impossible for clients to support if they weren't completely uniform.

The Kubernetes API is unique in that it has a completely uniform structure across different resource types. You can use the simple dynamic client for arbitrary resource types. If you want to integrate with multiple types, the Kubernetes API makes it constant work instead of linear work in the number of resource types. Some clients, like controllers, interact with a very small number of types - the kubelet interacts with just pod and node types, and maybe a few others like config maps and secrets.

However, kubectl interacts with arbitrary types, including built-in types and CRDs. It would be impossible to build that sort of client without this type of uniformity.
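
For a sense of what that uniformity buys, here is a short sketch using client-go's dynamic client: the same few lines list pods, deployments, or any CRD just by swapping the group/version/resource triple. The kubeconfig location and the namespace are assumptions for the example.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from the default kubeconfig location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The same generic code path works for built-in types and CRDs alike:
	// constant work, not linear in the number of resource types.
	for _, gvr := range []schema.GroupVersionResource{
		{Version: "v1", Resource: "pods"},
		{Group: "apps", Version: "v1", Resource: "deployments"},
	} {
		list, err := client.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			panic(err)
		}
		for _, item := range list.Items {
			fmt.Println(gvr.Resource, item.GetName())
		}
	}
}
```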

Bart: I can see that consistency here would probably be beneficial. The article also mentions that Kubernetes doesn't require fat client libraries like Terraform providers, especially with Server-side Apply. Can you explain the difference and what advantages it brings?

Brian: It's very much related to the issue I was just discussing. Terraform has a plugin framework for the so-called Terraform providers, and that plugin framework enables you to implement on the client side those standard operations: create, update, delete, and get. Every resource type has different operations that may have different REST paths, require different parameters, or have different endpoint structures. There are all kinds of characteristics that can differ. If those things are different, then you end up having to write that by hand, and you lose the ability to easily write one of these generic clients. You have to hand-integrate the create, update, delete, and get operations for each individual resource type. So, if you have 500 or 1,000 resource types, like in the case of the cloud providers, or 10,000 resource types in the case of Kubernetes, that would just be an immense amount of work. It would be decades of work. Having a thin client that can work against these kinds of uniform resource APIs drastically reduces the amount of work. Not by a factor of two or a factor of ten: it's constant time instead of linear time. So, it can make it feasible to implement a client in a day or a week, depending on how complicated a client you're building, as opposed to years.

As a result of the complexity of the cloud APIs and the inconsistency in the cloud APIs, integrating new infrastructure as code tools with those APIs is an almost insurmountable task. That's why you see new tools like Pulumi support using the existing Terraform providers that have already been written. And that's effectively a de facto standard cloud library for these cloud APIs.

Server-side Apply is what diffs the previous desired state and the new desired state and constructs a patch to send to the API server. Doing the diff and constructing the patch requires knowledge of the schema of the resource type. There are structures within the Kubernetes APIs, and not just the Kubernetes APIs, other APIs as well, like what I call associative lists, where the list elements aren't strictly ordered but you need to match them up based on properties within the list elements, or unions, things like that. Those things are not expressible in JSON directly, and they're not even necessarily all currently expressible using OpenAPI. So, client-side apply originally had, and most likely still has, hard-coded knowledge of those structures and how it should merge them. The different structures of the resources leaked into this client-side code, so we lost that complete uniformity of just the simple CRUD operations. Server-side Apply took that knowledge and moved it to the server side, so your client could be simple again: you could basically just do apply and let the server figure out the differences. That unblocked clients not only in other languages, like Spinnaker, which is written in Java and uses Server-side Apply, but it also simplified integration. Integrating with the kubectl code to do that was pretty cumbersome even for a client written in Go, like the HashiCorp Terraform plugins. So, Terraform ended up using Server-side Apply in its Kubernetes provider, via the generic manifest resource type, and that's what enabled Custom Resource Definitions (CRDs) to be applied with Terraform, for example.
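
Here is a minimal sketch of what that looks like through the dynamic client: the client just sends the full desired state as an apply patch, and the server computes the diff. The resource, image, and field manager name are invented for the example.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// desired is the full desired state; no client-side diffing is needed.
const desired = `{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {"name": "web"},
  "spec": {
    "replicas": 3,
    "selector": {"matchLabels": {"app": "web"}},
    "template": {
      "metadata": {"labels": {"app": "web"}},
      "spec": {"containers": [{"name": "web", "image": "nginx:1.27"}]}
    }
  }
}`

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}
	// An apply patch: the server merges it with the stored object and
	// tracks which fields this manager owns.
	_, err = client.Resource(gvr).Namespace("default").Patch(
		context.TODO(), "web", types.ApplyPatchType, []byte(desired),
		metav1.PatchOptions{FieldManager: "example-applier"},
	)
	if err != nil {
		panic(err)
	}
}
```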

Bart: You mentioned that Kubernetes resources are self-describing at rest. How does this feature complement the Server-side Apply approach, and what advantages does it offer?

Brian: Kubernetes uses the wire format of the API resources for all serialization cases. Originally, it used this format for storing resources in etcd and in kubectl. For example, when you run kubectl create -f and specify a deployment.yaml file, or when you run kubectl apply on a whole directory of YAML files containing API resources, you need enough information to construct the API calls. This includes parameters like API version, type, resource name, and namespace.

We decided to make the resource representation include all this information, even though it seems redundant with the REST path. This way, if you serialize resources to disk for etcd or disaster recovery, you can read the resource and understand what API endpoint to call to operate on that resource. This approach is convenient and powerful, as it means users of declarative tools or client implementers don't have to learn two different concepts. It also eliminates the need to design an envelope structure to wrap the resource and include additional information.
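
As a sketch of why that works, the fields every serialized resource carries are enough to derive the endpoint to call. The pluralization below is a deliberate simplification; real clients use the discovery API for the kind-to-resource mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointFor derives the REST path for a resource from the fields that
// every serialized Kubernetes object carries: apiVersion, kind, and
// metadata. Naive pluralization stands in for discovery here.
func endpointFor(apiVersion, kind, namespace, name string) string {
	prefix := "/apis/" + apiVersion
	if !strings.Contains(apiVersion, "/") {
		prefix = "/api/" + apiVersion // core group, e.g. "v1"
	}
	resource := strings.ToLower(kind) + "s"
	if namespace != "" {
		return fmt.Sprintf("%s/namespaces/%s/%s/%s", prefix, namespace, resource, name)
	}
	return fmt.Sprintf("%s/%s/%s", prefix, resource, name)
}

func main() {
	fmt.Println(endpointFor("apps/v1", "Deployment", "default", "web"))
	// /apis/apps/v1/namespaces/default/deployments/web
	fmt.Println(endpointFor("v1", "Node", "", "node-1"))
	// /api/v1/nodes/node-1
}
```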

Many other APIs don't have this property, so if you serialize resources in an inventory system, it needs to have an envelope that specifies additional information. For example, Terraform glues together multiple pieces of information in plugins, including provider information and resource attributes, to figure out how to call the APIs. This approach is very ad hoc. We wanted a consistent mechanism to perform bulk operations based on serialized representations of resources. This approach also makes all operations symmetric, which simplifies clients. For instance, it's not the case that get and put have different structures or resources. It's entirely consistent.

With some APIs, the create body is different from the get body, which is a nuisance if you want to do a get, diff, and update. Updates are very inconsistent across many APIs. Having something like Server-side Apply, where you can do a consistent operation and the server figures out how to perform the update, drastically simplifies things for clients.

Bart: It seems like this self-describing nature has had a significant impact on the Kubernetes ecosystem. How has it influenced the development of configuration management tools?

Brian: From very early on, in PR 1007, I described a model for how tools could be built on top of the serialized resource representation. This was before pretty much any tools had been implemented, before Helm and before the v1beta3 API. This helped us understand where we wanted to go with the v1beta3 API and the API overall, but it also let the ecosystem know how they should think about building these tools. Very soon, we saw a pretty big diversity of tools. People could build tools using whatever languages or mechanisms they were familiar with. They could use Mustache or Jinja to replace values in resources. They could use domain-specific languages like Jsonnet, Starlark, or Dhall. They could use existing tools they were already familiar with, like Ansible. We saw people using general-purpose languages to generate resources or templating systems from their favorite language. Helm used Go templates, but others used ERB and built tools that way. This lowered the barrier for people to get started with tools they were already familiar with as they were migrating to Kubernetes. It created a clean separation between the templating or resource generation step and the apply step, making it easier to build these tools. In contrast, Terraform had a heavyweight framework with client-side plugins and a DAG orchestrator.
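
To show how thin that generation step can be, here is a Go sketch in the spirit of Helm's Go templates: render a manifest from values, then hand the output to a separate apply step. The manifest and values are invented for illustration, not from any real chart.

```go
package main

import (
	"os"
	"text/template"
)

// A template over YAML, the same general shape Helm charts use.
const manifest = `apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Name }}
spec:
  replicas: {{ .Replicas }}
`

func main() {
	tmpl := template.Must(template.New("deploy").Parse(manifest))
	// Render to stdout; in practice the output would be piped to
	// something like `kubectl apply -f -` as the apply step.
	if err := tmpl.Execute(os.Stdout, map[string]any{"Name": "web", "Replicas": 3}); err != nil {
		panic(err)
	}
}
```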

Bart: Now, you've been observing and navigating the space for quite a while now. You originally wrote about this in 2017, but how have Kubernetes configuration tools evolved since then?

Brian: In 2017, when I wrote that original article, I was also coming up with the idea behind Kustomize. Kustomize was created after I wrote that, and it eventually became the second most popular tool after Helm. Helm matured quite a bit, developing v3, which addressed a lot of common Helm user pain points. The chart ecosystem really exploded. We worked together with the Helm community to create the community chart repo, a deliberate effort to provide an easy way to get packages of applications that can be installed and run. We looked at other systems like Terraform, Chef Supermarket, Puppet Forge, Ansible Galaxy, DCOS Universe, and Docker Hub, which all had easy ways to get packages of applications.

In 2017, there were quite a number of users of domain-specific languages like Jsonnet and Starlark. Over time, new languages have been developed, such as CUE, Nickel, and KCL. However, Starlark is not as widely used anymore, possibly because it is a Python dialect, and there are now more tools that use full general-purpose languages like Python, such as CDK8s and Pulumi. Data on the popularity of these tools is hard to come by, but there seems to be interest in them.

The entire GitOps category of tools emerged shortly after I wrote that article. Weaveworks launched Flux, one of the earliest tools, and kube-applier from Box was launched around the same time. A year later, Argo CD was open-sourced. These tools are at a higher level and do not define configuration formats themselves; they are controllers that deploy configuration using existing tools or formats like Helm or Kustomize. Since Server-side Apply was created, Terraform and Pulumi have leveraged it to integrate Kubernetes into those infrastructure as code tools. This allows users to manage resources inside Kubernetes using the same tools they use for provisioning their infrastructure. CDK8s is also in this category, as an extension of the AWS CDK, allowing users to manage Kubernetes resources using the same patterns as their AWS resources.

Kompose is also in this category, helping people from the adjacent Docker Compose ecosystem generate resources to deploy to Kubernetes. We created this tool to facilitate using Docker Compose in development and moving to Kubernetes for production, or migrating to Kubernetes from Docker.

There has been quite a bit of consolidation, with many tools people were experimenting with no longer being maintained, at least not as open source. The other tools have matured enough that people do not feel the need to build their own as much. However, new tools like Timoni, which uses CUE as the configuration format, are still emerging. The subject of YAML and Helm has also come up.

Bart: We've had several guests on the podcast express their frustrations with configuration tools and YAML. One of them, Jacco, compared Helm to PHP. Another guest, Alexander, suggested that templating YAML as strings isn't necessarily the best idea; to put it mildly, he expressed it in a different way. Are we reaching a point where the solutions that worked in, let's say, 2017 are not necessarily working right now?

Brian: I think nothing has really changed regarding the weaknesses of YAML and Go templates on top of YAML; these issues existed from the beginning. YAML seems to be somewhat more approachable than some of the other domain-specific languages for most users. Some of the other options may be less error-prone in how the configuration can be described or written, but it still seems to be a matter of preference. Helm v2 existed before almost all the other alternative tools were created, and yet Helm still continues to be the most popular tool as far as I can tell. There have always been people who preferred a different approach, whether it was a different templating mechanism, a domain-specific language, or a general-purpose language. But, for the most part, Helm works. For what people do with it, or the off-the-shelf Helm charts, where they are parameterizing pretty much every attribute, I don't think it would be a lot prettier in any other language. So, I don't know how much difference that makes.

Bart: Looking ahead, do you see room for new tools or approaches in the Kubernetes configuration space, and what might drive adoption of new solutions?

Brian: I think it depends on which part of the ecosystem. The Helm chart ecosystem is very sticky, creating a high bar for tools serving a similar purpose. Pretty much all the alternatives to Helm, like the Jsonnet- and Starlark-based tools, CUE, Nickel, KCL, and the CDKs, which employ general-purpose languages, focus on user applications instead of off-the-shelf components like cert-manager and Prometheus Operator. This is because those off-the-shelf components have Helm charts, and getting the producer of that software to include some other format besides plain Kubernetes YAML is an uphill battle. Most people use Helm to install those off-the-shelf components, even if they use another tool to deploy their own applications. So, I think that's where experimentation and development continue.

After Helm and Kustomize, the most popular tools are from adjacent ecosystems, like CDK8s and Docker Compose. It's hard to get good numbers on tools like Terraform and Pulumi, but presumably, there are some users of those tools that use them on Kubernetes. Something interesting we're starting to see now is LLMs doing configuration generation. LLMs are trained on public data, and Kubernetes has a lot of public YAML. I haven't tried it with Helm charts, so I have no idea how well or poorly that works. However, for regular Kubernetes YAML, it seems to be pretty decent in any of the models, just because there's so much content available. With new tools, it will be interesting to see how well the models can handle a new syntax when they don't have a lot of examples to learn from. I think that's a new area of exploration.

Bart: To wrap up, are there any specific areas you think that Kubernetes Resource Model (KRM) itself could be enhanced or improved?

Brian: I think the biggest omission in the Kubernetes Resource Model (KRM) is the lack of a standard universal status indicator. We originally had an operation API, inspired by Google Cloud's operations API, where you perform an operation and, if it's asynchronous, you get a handle to query the status. There were a couple of reasons we eliminated that. Some were mechanical, like creating a list of operations for every resource type in a standard way, which was cumbersome. Also, the way Kubernetes works, it persists the resource immediately in etcd and then the controllers operate on it asynchronously. So, if we declared operation success at the time the resource was persisted, there wasn't much point to it. You get back the HTTP response, and it tells you at that level whether it succeeded or failed. However, since the controllers are convergent, success was tricky to define. We did eventually define a pattern on the client side, which we called kstatus, because many deployment tools were trying to figure this out. For instance, if they updated a deployment, was it successful or not? Of course, it can get tricky if you increase the number of replicas and get some more replicas but not as many as you asked for. Do you consider that success or not? Or, if you deploy a new container image and it doesn't crash, and the image doesn't go into ImagePullBackOff or CrashLoopBackOff, is that enough for success? Or do you need some indication from your monitoring system that it's behaving correctly? We punted on that, and by the time we created Custom Resource Definitions (CRDs), we didn't have a standard. So, that was unfortunate because you can apply a directory of arbitrary resource types, dozens of different CRDs, and it will just work, but figuring out whether it worked or not is tricky. I do think that's an area deserving of more work. Something could be standardized, especially for built-in types, even at this point. Putting forth a recommendation for people building operators would be useful for the whole ecosystem.
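
To illustrate the kind of heuristic involved, here is a simplified Go sketch of a readiness check in the spirit of kstatus; the real library handles many more resource types, conditions, and edge cases.

```go
package main

import "fmt"

// deployment holds just the fields a readiness heuristic needs,
// mirroring a Deployment's metadata, spec, and status.
type deployment struct {
	Generation         int64 // metadata.generation, bumped on spec changes
	ObservedGeneration int64 // status.observedGeneration, set by the controller
	DesiredReplicas    int32 // spec.replicas
	ReadyReplicas      int32 // status.readyReplicas
}

// ready reports whether the controller has both seen the latest spec
// and converged on it.
func ready(d deployment) bool {
	if d.ObservedGeneration < d.Generation {
		return false // controller hasn't processed the latest update yet
	}
	return d.ReadyReplicas == d.DesiredReplicas
}

func main() {
	fmt.Println(ready(deployment{Generation: 2, ObservedGeneration: 1, DesiredReplicas: 3, ReadyReplicas: 3})) // false
	fmt.Println(ready(deployment{Generation: 2, ObservedGeneration: 2, DesiredReplicas: 3, ReadyReplicas: 3})) // true
}
```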

Bart: Now, I think you've seen an awful lot of Kubernetes in the last few years, even before Kubernetes was a thing, you were doing it. I noticed that you shared your thoughts on that at the Kubernetes anniversary in Mountain View. So I won't ask you to repeat that. We'll just put the link in the description if folks want to check that out. Instead, we'd like to know more about the story behind the people that made a difference to the project during those years. Is there anyone in particular that you'd like to give a shout out to?

Brian: I'd like to give a shout out to the folks from Red Hat OpenShift who worked with us pretty early on. I think Clayton Coleman sent PRs to Kubernetes within about a week after our announcement at DockerCon. Clayton, Derek Carr, and Jordan Liggitt, who were all at Red Hat at the time, made a pretty big impact on Kubernetes, especially with features for large organizations like Custom Resource Definitions (CRDs) and higher-level application concepts like Deployment. The perspective they brought from OpenShift was really valuable to the project, and having some non-Google participants in the project gave a lot of credibility and helped us build the Kubernetes Community. So that was super impactful. They weren't able to travel for the event in the Bay Area, so I'd like to give them a shout out.

Bart: On a similar subject, how do you see the Kubernetes Community evolving in the future, and what do you think is holding it back?

Brian: The project is like 11 years old now, so the software is really large and complex. This makes it harder for new contributors to understand and get started. Also, there's not as much active development, nor is there much capacity for engaging new contributors and reviewing complex change proposals and PRs. This can be quite challenging, especially with the progression from writing a Kubernetes Enhancement Proposal (KEP) and getting it approved, then going through alpha, beta, and GA. It can take a year or more to actually get a new major feature through. I haven't been contributing recently, so I don't know how that is going, but it has become more and more difficult over the years. Now that there is a major new category of workloads, AI workloads, LLM training and inference, that is really important. Maybe that will catalyze the Kubernetes Community to focus on making sure that Kubernetes can run those critical workloads, rather than get replaced by some other new system that is more greenfield and easier to tailor for that scenario.

Bart: And what about you? What's next for you?

Brian: For me, I'm focused on a new venture where we're developing a new paradigm for configuration of applications and infrastructure, including Kubernetes, but not just Kubernetes. Most Kubernetes users, especially in the cloud, don't just have Kubernetes resources; they also have cloud resources. Terraform is about the same age as Kubernetes, around 10 years old, but the concept of infrastructure as code is much older, roughly 30 years. The current paradigm has pretty much reached the point of diminishing returns. I think we need to revisit the fundamental assumptions and do something differently if we want a different outcome.

Bart: And how can people get in touch with you?

Brian: I am bgrant0607 on LinkedIn, Twitter, Medium, and GitHub. I'm also on the Kubernetes Slack and the CNCF Slack, so you can reach me there. It should be pretty easy to find me.

Bart: Well, Brian, thanks for sharing your time, knowledge, and experience with us today. I'm looking forward to crossing paths with you in person and hearing more about the advances in your next project.

Brian: Thank you for putting this together. The questions were really well thought out.

Bart: I did enjoy it. We'll speak to you soon. Take care.

Brian: Thank you. Bye.