Beyond monitoring: the path to autonomous Kubernetes

Guest:

  • Laurent Gil

In this interview, Laurent Gil, Co-founder and Chief Product Officer at CAST AI, discusses:

  • Why manual resource management in Kubernetes leads to significant over-provisioning, with data showing clusters typically running at just 13% CPU utilization due to the impossibility of accurate manual CPU requests.

  • How intelligent automation combines bin packing, workload rightsizing, and auto-scaling to optimize cluster resources without service interruption, addressing resource management's non-linear complexity.

  • The vision of autonomous Kubernetes orchestration where infrastructure becomes invisible, enabling developers to focus solely on application development without managing underlying resources.

Transcription

Bart: Who are you? What's your role? And where do you work?

Laurent: Hello, my name is Laurent. Very nice to meet you. I run product at CAST AI, and I'm a co-founder of the company.

Bart: What are three emerging Kubernetes tools that you're keeping an eye on?

Laurent: I like that question. We spoke about it just a few minutes ago. To be frank, I haven't found any. I'm still frustrated with runtime security, for example. I don't feel we have a good grasp of runtime security in Kubernetes. We have some great examples, including companies here at the show, but I'm still frustrated because I think people don't understand how to do good runtime security in Kubernetes. I haven't found the innovation that will break that market open yet. There are a lot of "me-too" platforms that do a little bit of it, but not what I wish we could find in terms of runtime security. That's an example.

Bart: That's a good example. Take that a little bit further. Based on the runtime security challenges that you're noticing end users are having with Kubernetes, what tooling should be in place to respond to issues like these?

Laurent: I'll give you a couple of examples, but these examples are so trivial, and yet nobody's enforcing them. One of them, for instance, is how long you keep your nodes in your cluster. We all know that as nodes get old, vulnerabilities are discovered in their OS, and you need to patch them. The good way to patch them is to rotate them. And yet, across the 4,000 or 5,000 applications that we look at before they become clients, we see that those nodes are old. Why is that? It's so easy and simple to rotate nodes, and yet nobody is doing it. That's the frustration. There are a lot of things that are easy and simple to do with a little bit of automation, and it's not very common to see great automation in this area in particular. Runtime security is the same idea. Anything that happens in real time, you want to be able to run anomaly detection on runtime workloads. It's a great example of where innovation should come in.
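To make the rotation idea concrete, here is a minimal sketch, assuming the official `kubernetes` Python client and reachable cluster credentials, that flags and cordons nodes older than a threshold. The 14-day cutoff is an arbitrary illustration, not a recommendation; draining and replacing the node would follow in a real rotation:

```python
from datetime import datetime, timedelta, timezone

from kubernetes import client, config

MAX_NODE_AGE = timedelta(days=14)  # illustrative threshold, not a recommendation


def cordon_stale_nodes() -> None:
    """Cordon nodes older than MAX_NODE_AGE so they can be drained and replaced."""
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    now = datetime.now(timezone.utc)
    for node in v1.list_node().items:
        age = now - node.metadata.creation_timestamp
        if age > MAX_NODE_AGE and not node.spec.unschedulable:
            print(f"{node.metadata.name} is {age.days} days old; cordoning")
            # Marking the node unschedulable is only the first step of a
            # rotation; draining and deletion would follow, ideally automated.
            v1.patch_node(node.metadata.name, {"spec": {"unschedulable": True}})


if __name__ == "__main__":
    cordon_stale_nodes()
```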

Bart: On the subject of over-provisioning, one of our podcast guests, Alexandre, wrote an article about it. What strategies have you found effective in controlling over-provisioning in large Kubernetes clusters?

Laurent: That's a big subject. I'll give you the most common ones that we see. Over-provisioning doesn't occur because we're not good practitioners at managing these clusters; it occurs because it's impossible to do it right. Let me give you two examples. One is: imagine you have a hundred CPUs in your cluster. Let's say it's a small cluster. What are the most appropriate nodes for this cluster to run on? How many of those could be spot instances? How many could be part of a reserved instance plan or a savings plan? These questions are so hard and non-linear that it's very rare to find a good answer when we try to manage these things manually. This is where automation can help. So that's one source of over-provisioning.

The other source, which I think is much deeper, is that in the world of Kubernetes there's something called CPU requests. When we deploy a new container, we need to estimate how many CPUs that container or workload may need. Very often, developers will look at this, maybe do some simulation in staging, and say, "It looks like two CPUs is great." And then I ask, "Why did you put two CPUs?" And I'm not kidding, the answer is, "It's less than three, and it's more than one." Meaning we have no clue. It's impossible for humans to get this number right. And it's not just that you need to look at real production numbers; it's that this number may change over time. And if you don't catch the change, then, of course, as humans, we take no risk. I say, "Okay, let's do two, because even if it's really 0.1, two will work anyway." But that's the source of our over-provisioning. I'll give you one stat on this. Based on the few thousand apps we have, we measured the ratio between CPU requested, the amount of CPU we think we need, and CPU utilized, the real number. That ratio is 13%. This means the over-provisioning is a factor of eight, not 1.3. And that's an average. This isn't because we don't know how to do it; it's because it's impossible to do.

The beauty of Kubernetes is that it gives you all these great tools that we love. We use Kubernetes because we can scale, add content, duplicate, change sites, and do a lot of things on the fly. But it's also the issue with Kubernetes. It's almost like being a kid in a toy store with all these toys: you want to choose one, but you don't know which one to take. I feel the same when I'm looking at a cluster. And that's the difference between trying to do it yourself and using clever automation that monitors these things and adjusts them as needed.
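As a rough sketch of how that 13% figure can be measured, assuming the `kubernetes` Python client and a cluster with metrics-server installed (live usage comes from the `metrics.k8s.io` API), one could compare total CPU requested across all pods with the CPU actually in use:

```python
from kubernetes import client, config


def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ("2", "500m", "250000000n") to cores."""
    suffixes = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
    if quantity[-1] in suffixes:
        return float(quantity[:-1]) * suffixes[quantity[-1]]
    return float(quantity)


def requested_vs_used() -> None:
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Sum the declared CPU requests over every container in the cluster.
    requested = 0.0
    for pod in v1.list_pod_for_all_namespaces().items:
        for c in pod.spec.containers:
            req = ((c.resources and c.resources.requests) or {}).get("cpu")
            if req:
                requested += cpu_to_cores(req)

    # Live usage from metrics-server via the metrics.k8s.io API.
    metrics = client.CustomObjectsApi().list_cluster_custom_object(
        "metrics.k8s.io", "v1beta1", "pods"
    )
    used = sum(
        cpu_to_cores(c["usage"]["cpu"])
        for pod in metrics["items"]
        for c in pod["containers"]
    )

    if requested:
        print(f"requested={requested:.1f} cores, used={used:.1f} cores, "
              f"utilization={used / requested:.0%}")


if __name__ == "__main__":
    requested_vs_used()
```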

Bart: More on the subject of automation and resource management, Alexandre expressed that having an automated mechanism is better than enforcing processes. What automation tools or approaches do you recommend for managing Kubernetes resources?

Laurent: The one we have is not the only one; there are a few of those. Bin packing is a great source of optimization. Bin packing is necessary because your application's usage grows during the day, and then most of the time at night you have fewer users, so you have a smaller need for compute. It's like a curve: low during the night, going up during the day, and going low again at night. We all know how to scale up; we just add machines. That's a simple mechanism. The difficulty is how to eliminate them. How do you follow the curve of utilization? Which machine is getting kind of empty but not really, meaning you still have some containers to move around so you can empty that machine and then delete it from your cluster because you don't need it?

You have to be very clever in that decision. Which of these workloads can I move? Are they stateful or stateless? Critical or highly replicated? If they're highly replicated, you probably don't take a lot of risk if you move 10% of them; the other 90% are still running, so there's no interruption of service when you do this. Bin packing is probably one of the most complex mathematical functions to solve because it's non-linear. You have to do permutations and simulations. Do you start with the most expensive machine? Do you start with the one that is the most empty? Starting with the most expensive may be clever because it's the most expensive, but if it's busy, then you have too many things to move and you may risk an interruption of service.

This is where a typical non-linear AI engine works very well, because you can train an engine to identify and fix a problem that has many variables, running a lot of permutations to find the model that wins the battle of bin packing. And the bin-packing answer is never the same; it depends on the moment of the day and the type of application you have. So it's really something that applies very closely to your app itself.
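As an illustrative sketch of the consolidation question Laurent describes (not CAST AI's actual engine), a first-fit-decreasing heuristic can answer "can this node's pods be repacked onto the others?"; real engines run many such simulations rather than trusting one greedy pass:

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu: float      # requested cores
    movable: bool   # e.g., stateless and sufficiently replicated


@dataclass
class Node:
    name: str
    capacity: float  # allocatable cores
    pods: list


def can_drain(candidate: Node, others: list) -> bool:
    """First-fit-decreasing check: do the candidate's pods fit on the other nodes?"""
    if not all(p.movable for p in candidate.pods):
        return False  # stateful or critical pods pin the node
    free = {n.name: n.capacity - sum(p.cpu for p in n.pods) for n in others}
    # Place the biggest pods first. Greedy first-fit is a cheap approximation
    # of an NP-hard packing problem, which is why real engines simulate many
    # permutations instead of relying on a single heuristic.
    for pod in sorted(candidate.pods, key=lambda p: p.cpu, reverse=True):
        target = next((name for name, f in free.items() if f >= pod.cpu), None)
        if target is None:
            return False
        free[target] -= pod.cpu
    return True


def pick_node_to_remove(nodes: list):
    """Try the emptiest node first: it has the fewest pods to move."""
    for node in sorted(nodes, key=lambda n: sum(p.cpu for p in n.pods)):
        rest = [n for n in nodes if n is not node]
        if can_drain(node, rest):
            return node
    return None
```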

Bin packing is a great place for automation. I have another one that is probably as important: workload rightsizing. As I said earlier, you have a container that we think needs two CPUs, but in fact it uses only 0.1. How do you identify this, and how do you adjust the size of the container accordingly? If you make it too small, it may trigger horizontal pod auto-scaling. The VPA (Vertical Pod Autoscaler) and HPA (Horizontal Pod Autoscaler) are going to collide if you play one without playing the other. So that's another non-linear problem that you can solve with clever non-linear solutions like an AI engine.

Clever workload rightsizing will reduce or increase the size, but it also talks to the HPA so it doesn't artificially trigger instability by creating more replicas when you reduce the CPU too much. And if you reduce the CPU a lot, for good reasons, just because the workload doesn't need it, then you want to trigger a new round of bin packing, because that's where the cost optimization comes from. If you have a lot of nodes and they are empty, you still pay for them. So if you empty the nodes, you want to cleverly trigger bin packing so you can eliminate those nodes from your cluster.
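Here is a hedged sketch of that coordination, with hypothetical parameter names and values: derive a new CPU request from observed usage plus headroom, but floor it so the HPA's target utilization isn't breached, which would otherwise spawn replicas nobody asked for:

```python
def rightsize_cpu_request(
    usage_samples: list,
    hpa_target_utilization: float = 0.7,  # illustrative HPA target
    headroom: float = 1.2,                # safety margin above observed p95
) -> float:
    """Suggest a CPU request (in cores) from observed usage, HPA-aware.

    Illustrative logic only: take the 95th percentile of real usage plus
    headroom, then floor the result so that p95 utilization stays below the
    HPA's scaling threshold on the new, smaller request.
    """
    samples = sorted(usage_samples)
    p95 = samples[int(0.95 * (len(samples) - 1))]
    proposal = p95 * headroom
    # Guardrail: shrinking further would push utilization past the HPA
    # target and trigger horizontal scaling as a side effect.
    hpa_floor = p95 / hpa_target_utilization
    return max(proposal, hpa_floor)


# Example: a container requesting 2 cores that mostly uses ~0.1.
current = 2.0
samples = [0.05, 0.08, 0.10, 0.09, 0.12, 0.10, 0.11]
print(f"request {current} -> {rightsize_cpu_request(samples):.2f} cores")
```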

You see how all these things work together? Workload rightsizing works extremely well when you connect it, or synchronize it, with a smart autoscaler that, as soon as it sees a node getting empty, will simulate which one it can eliminate first. And consider that you may need to do a hundred of those evaluations per second across the ten thousand pods you may have in your cluster, and you start to realize automation is probably the only way to solve the problem.

So that's how I treat automation. A lot of people are sometimes reluctant: "Oh, wait a minute, you're going to touch my cluster. What's happening there?" Yes, we're touching your cluster to make it run much better and much more efficiently. Efficiency can translate into cost optimization. It can also translate into performance. If a workload doesn't have enough memory, you're going to have what we call OOM kills - out-of-memory kills. This is a disaster in our world because the workload restarts when it happens. Great! So let's ask workload rightsizing to also increase resources when they need to be increased.
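A small sketch of the detection side, again assuming the `kubernetes` Python client: scan for containers whose last termination reason was OOMKilled, the signal that memory should be raised rather than reduced:

```python
from kubernetes import client, config


def find_oom_killed() -> list:
    """Return (namespace, pod, container, restarts) for OOM-killed containers."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    hits = []
    for pod in v1.list_pod_for_all_namespaces().items:
        for status in pod.status.container_statuses or []:
            term = status.last_state.terminated
            if term and term.reason == "OOMKilled":
                hits.append((pod.metadata.namespace, pod.metadata.name,
                             status.name, status.restart_count))
    return hits


# Each hit is a workload whose memory request should be raised, not lowered;
# an autonomous rightsizer would bump the request and let bin packing rebalance.
for ns, pod, container, restarts in find_oom_killed():
    print(f"{ns}/{pod}/{container}: OOMKilled, {restarts} restarts")
```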

You see how interconnected all these things are, and that's why there are so many variables: the autoscaler side, the workload rightsizing side, whether you have enough or too much compute - CPU, memory, network, storage. Combine all of this in a smart automation engine, and you get something that is so much more efficient than what you had before. That's why I'm excited. I work for CAST AI, and we do this kind of automation.

Bart: On the topic of observability and monitoring, our podcast guest Miguel explained that while monitoring deals with problems we can anticipate, for example a disk running out of space, observability goes beyond that and addresses questions you didn't even know you needed to ask. Does this match your experience adopting observability in your stack?

Laurent: Monitoring is probably an old concept. Observability is a more clever way of monitoring things: you observe instead of just monitoring. However, in my mind, it's useless if you don't do anything about it. This is where automation comes in; I think it's the way to go. Why monitor or observe if you have a machine that can do that for you and fix things as needed? My goal in life is not to have a dashboard with a lot of red alerts. My goal is to have an entity that makes the red alerts green as soon as they happen. This is the example I gave earlier about workload rightsizing and OOM kills. I understand it's great to see them on a beautiful dashboard, but it's a lot better if you don't have any. That's what automation fixes. For me, monitoring and observability are kind of yesterday's thing; they're after the fact. You can say observability is clever, but feeding a smart engine with this observability and applying automation is, I think, the next level. That's why we do what we do at CAST AI. That's why I started the company.

Bart: Kubernetes turned 10 years old this year. What do you think we can expect in the next 10 years?

Laurent: Kubernetes will be a success when we stop talking about it. What I mean is: as soon as this orchestration platform becomes fully autonomous, it stops being a thing, because it manages itself. Remember, 10 years ago we used to say, "We need to manage these VMs. If one disappears, we need to add another one." Then AWS came in and said, "You don't need a data center anymore. If you need another machine, you'll have one in two minutes." It stopped being a thing. I wish the same for Kubernetes. I wish that all of us at this conference would be so successful that we would automate and make these systems completely autonomous. From that moment on, the developer would only say, "This is my app. Manage it for me, deploy it the way you like, in the place you like, however you like. Make it efficient, make it performant. This is what I want the app to be; do it for me." That's where I wish the industry would go. I think we're starting to see it. We're not the only ones in that space; we see other companies doing this autonomous automation thing, like CAST AI. The more of us there are, I think, the better we will all be as a community.

Bart: What is your least favorite Kubernetes feature?

Laurent: StatefulSets. Why? Because I hate them - they can't move.

Bart: What's next for you?

Laurent: You're going to see a lot more automation. With CAST AI, you should come and see us at re:Invent; we'll make a few announcements there. We made a few here, including something called live migration. I think we're just at the beginning of this innovation step. We're leaving the monitoring world and entering the autonomous world, and I can't wait to see what people will come up with. That's it. My name is Laurent, at CAST AI.
