Building better database tooling for Kubernetes
A practical exploration of Kubernetes ecosystem maturity through the lens of tooling, stateful workloads, and operator development.
Rotem Tamir, CTO and Co-founder of Ariga, discusses:
The evolution of the Kubernetes tooling landscape, sharing hands-on experience with K9s, Grafana Loki, and Argo CD and their impact on daily operations
Why running stateful applications in Kubernetes remains challenging, explaining the fundamental conflict between declarative management and stateful resources
How to build effective Kubernetes operators by focusing on developer experience and exposing meaningful status information through well-designed Custom Resource Definitions
Transcription
Bart: The host is Bart Farrell. The speaker is Rotem Tamir (works for Ariga).
Rotem: Hi, my name is Rotem. And for the past four years, I've been the CTO and co-founder of a company called Ariga.
Bart: What are three Kubernetes emerging tools that you are keeping an eye on?
Rotem: So, everything in Kubernetes is still emerging. I will take the liberty of talking about tools that may have been around for a bit. One is K9s, a terminal-based UI for Kubernetes. I tried to think why I like it so much. When I was five, my dad got me my first computer, a DOS box, which had an application called Norton Commander, a file manager. I guess K9s reminds me of that, so I am really fond of it. The second tool is Grafana Loki, an alternative to the ELK Stack. It has saved my team a lot of money and is much more stable than tools we have used in the past. I like infrastructure tools when they fade into the background and you do not think much about them. So, a shout-out to the Grafana people: it's really awesome. The last tool, which I noticed at KubeCon, is Argo CD, which is everywhere but still emerging. It's really cool to see how many people are adopting it, and it's a great solution for continuous deployment on Kubernetes.
Bart: Relating to GitOps and platform engineering, one of our guests, Hans, argues that GitOps is an excellent building block for building platforms with great developer experience. He mentioned the ability to merge, review, and discuss code changes in pull requests, and the additional benefit of not granting permissions. Should all platforms use GitOps? What's your experience?
Rotem: I think GitOps is a very sensible conclusion if you commit to the Kubernetes philosophy of declarative resource management. We needed a concrete strategy for taking the content in Git and making it appear and be managed continuously in our cluster. However, it's still a new world and many topics are still not solved. For example, security and permissions are areas where we don't have good controls over who can do what or manage which resource in our cluster. Stateful resources have been a big pain for many people, and there seems to be a clash between the declarative philosophy and stateful resource management. GitOps makes a lot of sense, but it's still not perfect.
Bart: One of our guests, Steven Sklar, shared that you can and should run a database on Kubernetes. The tooling and practices have matured since Kubernetes began, and you should run stateful applications there. What's your experience and advice with running stateful applications in Kubernetes?
Rotem: Kubernetes is amazing for managing stateless resources. The Deployment controller was implemented based on the immutable infrastructure philosophy, where a new resource can be provisioned, verified as healthy, and then used to replace the old resource. This approach works well for compute resources. However, it does not work for databases, as stateful components cannot simply be replaced without risking data loss. There is a conceptual clash that needs to be resolved. I presented a talk about GitOps, database schema changes, and rollbacks, arguing that current tooling does not work well with GitOps. Classic schema management solutions do not fit into the reconciliation loop, especially when it comes to rollbacks. I have committed to solving this problem by building a Kubernetes operator for declarative schema management, allowing stateful resources to be managed declaratively. However, if mature tooling does not exist for a particular resource, it should not be managed from within Kubernetes. During an outage, the last thing you want is an immature tool that adds to the problem. Although the industry has evolved and better tools are available, stateful resources should only be managed with good tooling designed specifically for Kubernetes.
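To illustrate the clash described above, here is a minimal Python sketch of one reconciliation pass for a database schema. Instead of replacing the resource, it computes the difference between the current and desired state and plans in-place changes. This is an illustration under stated assumptions, not how Atlas or its operator works: the `Schema` shape and the `plan_changes` and `reconcile` names are hypothetical, the "database" is an in-memory dictionary, and only additive changes are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model: table name -> {column name -> column type}.
Schema = Dict[str, Dict[str, str]]


def plan_changes(current: Schema, desired: Schema) -> List[str]:
    """Plan the statements that move `current` toward `desired` in place.

    Only additive changes are planned. Drops and type changes are
    deliberately left out because they risk data loss.
    """
    statements: List[str] = []
    for table, columns in sorted(desired.items()):
        if table not in current:
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for name, ctype in columns.items():
            if name not in current[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {ctype}")
    return statements


def reconcile(current: Schema, desired: Schema) -> Tuple[Schema, List[str]]:
    """Run one pass: plan the changes, then apply them to a copy of the state."""
    statements = plan_changes(current, desired)
    updated: Schema = {table: dict(cols) for table, cols in current.items()}
    for table, columns in desired.items():
        updated.setdefault(table, {})
        for name, ctype in columns.items():
            # Existing columns are kept untouched; only missing ones are added.
            updated[table].setdefault(name, ctype)
    return updated, statements


if __name__ == "__main__":
    current = {"users": {"id": "int"}}
    desired = {"users": {"id": "int", "email": "text"}, "orders": {"id": "int"}}
    state, plan = reconcile(current, desired)
    for statement in plan:
        print(statement)
    # A second pass over the updated state plans nothing more.
    print(reconcile(state, desired)[1])
```

The point of the sketch is the shape of the loop: a stateless workload can be converged by swapping in a new copy, while a schema has to be converged by planning a migration from whatever state already exists.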
Bart: You mentioned building operators. One of our guests, Steven, shared some simple but effective advice on building Kubernetes operators: keep it simple and use multiple CRDs. Do you have any advice on operators?
Rotem: One of the best choices we made as a company about a year and a half ago was to bring Atlas, our schema management tool, into the Kubernetes world by introducing the Atlas Kubernetes Operator. We have a few learnings from that process to share with the community. The first is to be really focused on developer experience. As an operator provider, someone building an operator, you're making a promise to your users to offload the complex stuff and take care of everything. To do that, you need to think about both halves of the API. The first half is the spec of the custom resource definition: what it will look like. It should make sense and represent the different knobs that the user should have when managing a database schema. The second half is the status, which is what the operator exposes to the user. Since many users see operators as black boxes and don't understand what's going on inside, it's really important to expose the status. For example, if your operator stops the reconciliation loop due to some edge condition, you need to give the user an informative error message. So, think about the inputs, think about the outputs, and make life easier for your users. The second learning is about having the expertise, because operators are about codifying operational knowledge into a program that can replace a human operator. For the Atlas operator, our promise is to provide the expertise of the best DBA in the world to manage database schema changes. You tell them what you want, and they take care of it. We work hard to codify operational knowledge from the industry into a program that can carry out this process.
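The spec and status split described above can be sketched in a few lines of Python. This is a hedged illustration only: the `SchemaSpec` fields, the `DestructiveChangeError` type, and the `reconcile_with_status` function are hypothetical and are not the Atlas Operator's actual CRD or code. The condition fields (`type`, `status`, `reason`, `message`) follow the common Kubernetes status-condition convention.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SchemaSpec:
    """Inputs: the knobs a user sets (hypothetical field names)."""
    database_url: str
    desired_schema: str


@dataclass
class Condition:
    """One status condition, in the usual Kubernetes shape."""
    type: str
    status: str  # "True", "False", or "Unknown"
    reason: str
    message: str


@dataclass
class SchemaStatus:
    """Outputs: what the operator reports back to the user."""
    conditions: List[Condition] = field(default_factory=list)


class DestructiveChangeError(Exception):
    """Hypothetical edge condition that makes the operator stop."""


def reconcile_with_status(
    spec: SchemaSpec, apply: Callable[[SchemaSpec], None]
) -> SchemaStatus:
    """Apply the spec and always return a status that explains the outcome."""
    try:
        apply(spec)
    except DestructiveChangeError as err:
        return SchemaStatus([
            Condition(
                type="Ready",
                status="False",
                reason="DestructiveChangeBlocked",
                message=f"Reconciliation stopped: {err}",
            )
        ])
    return SchemaStatus([
        Condition("Ready", "True", "Applied", "Schema matches the desired state.")
    ])


if __name__ == "__main__":
    def refuse_drop(spec: SchemaSpec) -> None:
        raise DestructiveChangeError("plan would drop column users.email")

    spec = SchemaSpec("postgres://example/db", "CREATE TABLE users (id int);")
    print(reconcile_with_status(spec, refuse_drop).conditions[0].message)
```

The design choice worth noting is that the failure path still produces a status: the user reads a reason and a message on the resource instead of having to search the operator's logs.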
Bart: Kubernetes turned 10 years old this year. What can we expect in the next 10 years to come?
Rotem: I think Kubernetes will go through a process similar to Linux, where it fades into the background. When I started developing, I had to think a lot about the operating system and what was available, including installing device drivers. Today, all of that is abstracted away from us. We think about Kubernetes and building applications using basic Lego blocks, which are still at a relatively low level. I assume that as our industry builds higher and higher order abstractions, in 10 years, when my junior developers become CTOs themselves, their junior devs won't have to think about Kubernetes. It will be an implementation detail that exists in the background.
Bart: What's next for you?
Rotem: So, I'm four years into building Atlas and Ariga, and I think we've only just begun. We have a lot of traction from many companies and many users, but I think we still have a long way to go to get to the point where we can say that database schema management on Kubernetes is a solved problem. And I'm really looking forward to working with our users, with our community, to fulfill that promise.
Bart: How can people get in touch with you, Rotem Tamir from Ariga?
Rotem: The best way to get in touch with me is by visiting the Atlas website, where we have an Atlas Discord server for our community. My team and I are constantly available there to answer questions. You can also send me a direct message. I'm on Twitter as well. I prefer not to refer to it as X. Feel free to send me a direct message there; I'm happy to answer any questions.