Making autoscaling dead simple in Kubernetes: KEDA

Oct 2, 2023

Hosts:

  • Bart Farrell
  • Daniele Polencic

Guest:

  • Jorge Turrado

How do you scale your pods on queue length?

In this episode, you will learn:

  • How KEDA simplifies autoscaling in Kubernetes thanks to its vast collection of metrics collectors (i.e. scalers).

  • Jorge's journey in tech: how he levelled up from passing wires as an electrician to learning Go and becoming a KEDA maintainer.

  • Two must-try KEDA scalers: the HTTP add-on to scale to zero and the Carbon-aware scaler for reducing your carbon footprint.

You will also dive into what it takes to maintain a successful graduated CNCF project.

And lastly, what three Cloud Native tools would Jorge take to a desert island?

Spoiler alert: KEDA wasn't his first choice.

Transcription

Bart: Welcome to KubeFM, a podcast where you can expand your Kubernetes knowledge, follow the latest tools and trends, and learn directly from the experts. For a project to become graduated, it has to go through a series of phases to show that it's fully mature and ready to be adopted by many, many end users out there in the world. Kubernetes Event-driven Autoscaling, better known as KEDA, recently became a graduated project. It started out in 2019 as a collaboration between Microsoft and Red Hat before being donated to the CNCF. Daniele and I got a chance to talk to Jorge Turrado, who is a KEDA maintainer and also a CNCF ambassador. Let's hear what he had to say. So, in today's episode of KubeFM, we are joined by someone who's much more than a person. We could say a sort of superhuman: a maintainer, a wonderful colleague, a great speaker. He's not wearing his bathrobe right now, but normally you can find him wearing it. He's an incredible person, Jorge Turrado. Luckily, we live just an hour away from each other. But more importantly, he has an interesting background that started in something quite different. So, Jorge, do you want to tell us about who you were before you became the KEDA maintainer that you are today?

Jorge: Yeah, for sure. I worked passing wires because I was an electrician. Then I developed robotics, and then I decided to work in software because it's really nice work, better work than being on the floor with the wires. No, I'm kidding. I love software. I started developing my own stuff when I was a teenager, and basically I kept growing and changing until I reached my current position. Currently I work as a Staff SRE at SCRM, the Lidl Digital Hub (Lidl, the supermarket). Before that, I also worked as an SRE at DocPlanner (DocPlanner Tech), and before that I worked for Plain Concepts as a software developer. So I have touched all the phases of software development: I started as a developer, I switched to cloud engineer, more or less, and then SRE slash…

Dan: So what's your background? Is it Java?

Jorge: Yeah, I have been on all sides, except the front end. I have a serious problem: I can't understand how the browser works. Maybe that's why I prefer the CLI.

Dan: Okay. So how did you move from Java to cloud stuff? I mean, you know, Java is quite, you know, verbose and, you know, sort of enterprise-y. And then on the other side, there is a lot of configuration. So how did that happen?

Jorge: I didn't start with Java. I started with C++ and then I moved to C#. But the path was almost natural, because at Plain Concepts, as a consultancy, I worked on different projects and I had the option of moving myself to more infrastructure-related stuff. I started with Kubernetes about six years ago. I was lucky, because there was a project I could work on and learn Kubernetes with, and Kubernetes was like, okay, this is the future. I don't know if in 10 or 20 years this will still be the future, but nowadays this is the future, this is the way to go. We started with typical traditional servers, and they don't scale. The other option is virtual servers. I remember when we switched from physical servers to virtual machines, and that was a pain. Kubernetes, and container engines in general, is a nice mix between them, and when I started with Kubernetes I decided to…

Dan: What version was it? I was about to say, were PetSets still there?

Jorge: I think PetSets had just been removed, or it was around then. Now it's known as StatefulSet. In the beginning it was named PetSet, because that word is more fancy and you love it like your pet.

Bart: I mean, I think it is somewhat endearing, in particular when you're talking about things that aren't so tangible, bringing those concepts to life with something that's more relatable. I mean, look at all the mythology that has grown up around things, you know, whether it's kube c-t-l or kube cuddle, taking that to a different level. But it's interesting to see how OG someone is based on whether they still call it a PetSet by accident. That's good. But in your experience, then, getting involved in contributing and then becoming a maintainer, tell us more about how that got started. And then we want to hear more about how you got involved in KEDA, what it is, and what you've been doing there.

Jorge: Okay, basically, I started using KEDA before joining as a maintainer, obviously. I started with KEDA maybe three or four years ago because I was a developer at Plain Concepts, and we work really closely with Microsoft technologies, like Azure Functions, Microsoft's technology for event-based serverless. KEDA came to fill that gap in Kubernetes, being able to scale based on those events. I started using it, and when I joined DocPlanner, we had a use case where we needed a RabbitMQ feature. And I'm not the guy who just opens an issue crying, please add this feature for me. If it's in a language that I know, or that I at least know a bit about, I will give it a try and try to do it myself, because I understand that maintainers are busy and they don't have time. It's not a commercial product. I started contributing some features to KEDA and, to be honest, I started to contribute more frequently to improve my Go skills. There is another maintainer, Zbynek Roubalik, who taught me a lot of stuff related to Go. After several months of contributing frequently, they proposed that I join the team, and I have been there since then.

Dan: Back to RabbitMQ: at the time, you must have had the choice between KEDA, something new that not many people were using, and, on the other side, the usual option, what I would basically consider the standard: Prometheus plus the adapter, which integrates with Kubernetes. Was that not an option at the time? Why did you go with KEDA? You know, it must have been so new.

Jorge: It depends, because at the end of the day, if you are using Prometheus, you are scraping the metrics. So all the information that you have for scaling is the information you can get from Prometheus, and that's not always super useful. For instance, I had a case of a guy asking in the Slack channel: "We have slow responses when scaling. We are scaling after three or four minutes. That does not work for us." Yeah, but that's the price you pay when you are scaling based on monitoring tooling. If you have the option of going directly to the metric source, as KEDA does by going to RabbitMQ or Kafka or whatever the scaler targets, you save the scraping or observability time there.
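To put rough numbers on Jorge's point, the sketch below adds up the worst-case delay a change in queue length can accumulate before the HPA sees it, since each stage may wait up to one full interval. Every interval and stage name here is an assumption chosen for illustration (only the 15-second HPA sync period matches the Kubernetes default), and the "direct" path assumes the metrics server queries the broker on each HPA request; real clusters will differ.

```python
# Illustrative worst-case staleness of a scaling signal. Each stage can add up to
# one full interval before the next stage sees a change, so the delays add up.
# All values are assumptions for illustration, not measurements.

def worst_case_staleness(stage_intervals_s: dict[str, float]) -> float:
    """Sum the per-stage intervals along the path from metric source to the HPA."""
    return sum(stage_intervals_s.values())

# Hypothetical monitoring-based path: scrape -> recording rule -> adapter -> HPA.
via_monitoring = {
    "prometheus_scrape": 60.0,
    "recording_rule_eval": 60.0,
    "adapter_cache": 30.0,
    "hpa_sync": 15.0,
}

# Hypothetical direct path: the HPA asks a metrics server that queries the broker.
direct_to_source = {
    "hpa_sync": 15.0,
}

for name, path in [("via monitoring", via_monitoring), ("direct", direct_to_source)]:
    print(f"{name}: up to {worst_case_staleness(path):.0f}s before the HPA reacts")
```

Pod startup time comes on top of either figure, which is how a monitoring-based pipeline can drift into the "three or four minutes" range mentioned above.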

Dan: I think at this point, maybe it's worth taking a step back and explaining how KEDA works. We sort of jumped straight to the two options that you usually have: KEDA versus Prometheus and the adapter. But like you said, KEDA is a lot quicker and gives you a lot more options. So what's the architecture of this scaling technology? Can you do a quick summary for us?

Jorge: Yeah, for sure. Basically, KEDA is a group of three different components nowadays: the operator, the metrics server, and the admission webhook, and this last component is optional. KEDA deploys its own CRDs: ScaledObjects, ScaledJobs, trigger authentications. Basically, KEDA supports four different custom resources for different stuff. And when you deploy your ScaledObject... Yeah, ScaledObjects are basically a wrapper on top of the HPA. So the operator will deploy an HPA with some external metrics configured, and we configure the metrics server, KEDA's metrics server, to serve those metrics. So KEDA relies totally on the HPA controller, because it's a well-tested piece of Kubernetes. For us, that's better than reinventing the wheel. And how each metric is fetched depends totally on the scaler. We support Azure, we support AWS, GCP.
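Because KEDA hands the actual replica math to the HPA, it helps to see roughly what the HPA does with a metric that KEDA serves. The following is a simplified sketch of the documented HPA rule, desiredReplicas = ceil(currentReplicas * currentValue / targetValue), for an external metric with an average-value target such as messages per replica. It leaves out stabilization windows, scaling policies, and readiness handling, and the function name is hypothetical.

```python
import math

def hpa_desired_replicas(current_replicas: int, metric_total: float,
                         target_average: float, min_replicas: int,
                         max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an external metric with an average-value target.

    Ignores stabilization windows, scaling policies and pod readiness handling.
    """
    if current_replicas == 0:
        # The HPA does not scale from zero on its own; KEDA's operator handles the
        # 0 <-> 1 activation step before the HPA takes over.
        return current_replicas
    current_average = metric_total / current_replicas
    ratio = current_average / target_average
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # within tolerance: leave the replica count alone
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))

# 2 replicas, 250 messages in the queue, target of 50 messages per replica.
print(hpa_desired_replicas(2, 250, 50, min_replicas=1, max_replicas=10))  # 5
```

With an average-value target the calculation reduces to ceil(total / target), which is why a queue-length trigger maps so directly onto a replica count.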

Dan: I had a look at the list and it's just never-ending. I was about to ask, how do you manage such a huge list? I remember that at the beginning of the project it was small and manageable, but now it's just a never-ending list.

Jorge: Over the years, we have learned some lessons. And one of those lessons is that each new maintainer must have a scaler, sorry, each scaler must have end-to-end tests.

Dan: Okay.

Jorge: That's a strong requirement. If you don't develop the end-to-end tests, we won't merge the scaler. Why? Because, as you said, how can we maintain them if we don't know the technology? You can know about two different technologies, but you can't know about 60. Over the years, we have learned those lessons the hard way. And now we have, in my opinion, a really complete end-to-end suite where we test, I don't remember, 80 or 90 different end-to-end cases, because we test the scalers, but we also test the secret providers and some internal stuff.

Bart: Jorge, as someone who's very conscious and aware of the environment, I want to know more about the Carbon-Aware KEDA operator: what it's doing, how it came about, who was behind it, and what people should know about it.

Jorge: Microsoft presented this scaler at KubeCon, in the keynote on the first day. We weren't working on it directly, but we try to contribute to a SIG, no, it's not a SIG, a group, I don't remember the name, about sustainability. Sorry. But we were there because in KEDA, our goal is to make things more efficient, and usually that means better for the environment. And this is the next step. If you have a lot of non-critical workloads that can be delayed to another moment, this scaler allows limiting the scale-out, or limiting the workload size, to help the environment and sustainability in general. With this carbon-aware scaler, basically, Microsoft extended KEDA to reduce the number of concurrent instances based on a carbon API. I didn't know this one year ago, but there are public APIs that say: in this region, the carbon footprint is X, and they give you the carbon footprint for a region at any moment. So you can use that API to say: okay, in this region the carbon footprint is too high, and I have non-critical workloads, so let's move them to another moment when the carbon footprint is lower. And basically that's the goal of that operator. It's nice. We are talking internally about the best way to integrate it, because right now it's an external component, but we are working to integrate that functionality into KEDA itself so you don't have to deploy another operator. Because, to be honest, having Azure in the name is a problem for some people.
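The idea behind the carbon-aware operator, capping how far a non-critical workload can scale out while the grid's carbon intensity is high, can be sketched as a lookup from intensity bands to a replica ceiling. The band values, the `CarbonBand` type, and the function name below are hypothetical; they illustrate the threshold idea and do not reproduce the operator's actual configuration schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CarbonBand:
    # Hypothetical configuration shape: at or above this carbon intensity
    # (gCO2eq/kWh), cap the workload at max_replicas.
    threshold: float
    max_replicas: int

def max_replicas_for(intensity: float, bands: list[CarbonBand], default_max: int) -> int:
    """Return the replica ceiling for the current carbon intensity.

    Uses the band with the highest threshold that the intensity reaches;
    below every threshold, the workload keeps its default ceiling.
    """
    cap = default_max
    for band in sorted(bands, key=lambda b: b.threshold):
        if intensity >= band.threshold:
            cap = band.max_replicas
    return cap

bands = [CarbonBand(200, 8), CarbonBand(400, 4), CarbonBand(600, 1)]
for intensity in (150, 450, 700):
    print(intensity, "->", max_replicas_for(intensity, bands, default_max=20))
```

The workload still scales on its normal triggers; the carbon signal only lowers the ceiling, so deferrable work simply drains more slowly while the grid is dirty.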

Dan: I can see that. There is another project related to KEDA which, as someone using KEDA, I find very valuable, and that's the HTTP add-on. As far as I know, it's currently advertised as beta. It's not officially part of the project, but it really is, and it's, one, extremely interesting and, two, useful. Can you give us a little bit of a summary of it?

Jorge: Yeah, the HTTP add-on is a component that provides first-class support for HTTP workloads. If you want to scale based on HTTP using, for instance, Prometheus, that's not enough for all cases. In our case, we use Prometheus for scaling based on HTTP traffic, but with that, we can't scale to zero. We guarantee that we will have at least one instance, or two or three for high availability, because otherwise we can't ensure that a server will be there to answer a request. But going further with this approach: okay, we want to reduce costs, we want to be more cost-efficient, so why can't I scale to zero? To scale to zero, we need a component in the middle that holds the request until the backend is ready. And the HTTP add-on is that component. The HTTP add-on is a separate component that we didn't integrate into KEDA core, basically because it's too different and it has its own problems, totally unrelated to KEDA's problems. At the end of the day, the add-on is an interceptor in the middle: you send all the traffic to it, and it checks whether the target workload has at least one instance. If it doesn't, it holds the request until there is at least one instance, and when the target workload is ready, it passes the request on. This feature allows a cold start. So you can start up your pod for just one request and, after the request, the pod can be killed automatically. So you can do efficient scaling based on HTTP. The problem is that it's very different from the common KEDA problems, and we are actively looking for help with it, because the HTTP protocol itself and HTTP scenarios change a lot. We have just integrated support for HTTP/2, and now there is HTTP/3 that we need to integrate. It needs really, really hard work.
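The core trick Jorge describes is holding a request while the target scales up from zero instead of failing it. This asyncio toy models only that hold-until-ready behaviour; the class and method names are hypothetical. The real add-on is a separate Go component which, as far as we understand, also reports pending-request counts to KEDA and handles routing and protocols far beyond this sketch.

```python
import asyncio

class ColdStartInterceptor:
    """Toy interceptor: hold each request until the backend has a ready replica.

    A hypothetical illustration of the pattern, not the HTTP add-on's code.
    """

    def __init__(self, timeout_s: float) -> None:
        self._ready = asyncio.Event()
        self._timeout_s = timeout_s
        self.scale_up_requested = False

    def backend_ready(self) -> None:
        # Called when at least one replica of the target workload is ready.
        self._ready.set()

    async def handle(self, request: str) -> str:
        if not self._ready.is_set():
            # No replicas yet: record demand (so the workload can be scaled from
            # zero) and hold the request instead of failing it immediately.
            self.scale_up_requested = True
            try:
                await asyncio.wait_for(self._ready.wait(), self._timeout_s)
            except asyncio.TimeoutError:
                return "503 backend did not become ready in time"
        return f"200 forwarded {request}"

async def main() -> None:
    interceptor = ColdStartInterceptor(timeout_s=2.0)
    # Simulate a pod that becomes ready 0.1 s after the first request arrives.
    asyncio.get_running_loop().call_later(0.1, interceptor.backend_ready)
    print(await interceptor.handle("GET /"))
    print(interceptor.scale_up_requested)

asyncio.run(main())
```

The timeout matters: a held request cannot wait forever, so a cold start that takes longer than the client is willing to wait still surfaces as an error.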

Dan: And on top of that, at least the last time I used it, the plan was: when you deploy the interceptor, the interceptor looks at all the requests passing through and decides whether it is time to scale or not, right? And then there was a very small note saying, by the way, this might not be needed in the future, because we have OpenTelemetry, right? Today I had a look at the roadmap and found the OpenTelemetry scaler as well. So what is the future of this HTTP add-on? Is it going to be OpenTelemetry?

Jorge: They are totally different things. I mean, OpenTelemetry is just for observability. The problem of holding the request during a cold start is still there, whether or not you use OpenTelemetry. OpenTelemetry, at the end of the day, is an observability protocol for transferring telemetry: recording, sending, and processing it. The OpenTelemetry integration is for sending the metrics using OpenTelemetry too, instead of only exposing them through Prometheus. But it's totally unrelated. The problem with the HTTP add-on is that until we can find at least two, let's say, full-time maintainers (well, not full-time in the sense of investing their whole day, but at least two people engaged with the project for solving issues, helping newcomers, and those kinds of things), we can't say that it's a production-grade component.

Bart: Yeah. And this is the part of the podcast where we let everyone know that Jorge has been a Microsoft MVP for how many years?

Jorge: Five. I don't remember. Five.

Bart: Something I actually want to touch on a little bit too: you've now taken this from sandbox to incubation to graduation, getting other organizations to participate along the way, as well as contributors. Like you mentioned, there are folks out there who would like to help out with the HTTP add-on. What challenges have you faced in getting people involved? How has it been, through the time you've spent first as a contributor and then as a maintainer, making the project attractive for other organizations to get involved? As advice for any open source project, what has your experience been like there, and what would you recommend?

Jorge: It's a good question, and this time I don't have an answer, a good answer, for you, because…

Bart: We will accept your bad answer.

Jorge: The public image, or the public stuff, or how to engage the community: I think we have failed a bit on that topic, maybe due to maintainer capacity. We are five maintainers, and we focus on improving KEDA more than on marketing or doing other stuff. And I know that other projects have conferences. There are conferences about Argo CD, IstioCon. There are a lot of conferences.

Dan: You built such a nice product that doesn't need any sort of marketing.

Jorge: Our goal is to make autoscaling that simple. And maybe it's because we don't have any competitor. Let's say "competitor"; I don't think competitor is the right word in an open source ecosystem, but let's use it for the sake of understanding. Argo CD has Flux, Linkerd has Istio. KEDA hasn't got any competitor nowadays, because we don't try to do a lot of things. We try to make autoscaling dead simple in Kubernetes, and I hope that we are achieving that goal. I guess we have talked about this sometimes, but we don't do marketing: anyone who has needed something based on events in Kubernetes knows KEDA.

Dan: Which is basically all developers using Kubernetes.

Bart: But it's worked. So with that in mind, what's on your roadmap for the next steps? What can we expect from KEDA?

Jorge: Soon, probably in the next release, we are going to introduce what is, in my opinion, a game changer for autoscaling. Nowadays, since we rely totally on the HPA controller, the HPA controller basically requests all the metrics and performs a max operation across them. This is not efficient in all cases. If you have a database and a queue, RabbitMQ for example, and the queue grows, maybe it's because the database is down, and having just one measure, okay, scale based on the queue, usually isn't enough. Imagine having an option inside KEDA for applying a custom formula that says: go to the database and get the usage, and if the usage is higher than 60%, reduce the RabbitMQ queue length proportionally so as not to overload the database. I mean, adding custom formulas to the information that KEDA exposes to the HPA controller is a feature that, for me, is a game changer for autoscaling in Kubernetes.
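To make the difference concrete: with several metrics, the HPA computes one replica proposal per metric and keeps the largest, so a growing queue wins even when the database behind it is struggling. The second function below is one hypothetical reading of Jorge's example (proportionally shrinking the queue signal once database usage passes 60%); it is not KEDA's implementation, and the names and numbers are illustrative.

```python
import math

def hpa_combine(proposals: dict[str, int]) -> int:
    """HPA behaviour with several metrics: one replica proposal per metric, keep the max."""
    return max(proposals.values())

def queue_replicas_db_aware(queue_length: int, msgs_per_replica: int,
                            db_usage_pct: int, threshold_pct: int = 60) -> int:
    """Hypothetical formula: damp the queue-driven demand when the database is busy.

    Below the threshold the queue signal is used as-is. Above it, the signal is
    scaled down linearly: no reduction at the threshold, zero at 100% usage.
    """
    effective_queue: float = queue_length
    if db_usage_pct > threshold_pct:
        headroom = max(0, 100 - db_usage_pct) / (100 - threshold_pct)
        effective_queue = queue_length * headroom
    return math.ceil(effective_queue / msgs_per_replica)

# 1000 queued messages, 50 messages per replica, CPU-based proposal of 3 replicas.
queue_only = math.ceil(1000 / 50)
print(hpa_combine({"rabbitmq": queue_only, "cpu": 3}))    # 20: the queue wins
print(queue_replicas_db_aware(1000, 50, db_usage_pct=80))  # 10: halved by DB load
```

With the max rule alone, the only way to protect the database is to cap maxReplicaCount statically; a formula lets the cap depend on live database load instead.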

Dan: How is that implemented? Do you write the formula in the YAML? Do you have a small engine that I can program in WASM? What is it?

Jorge: The PR is open, so we are reviewing it. But the original idea we are discussing is that you can extend the ScaledObject with another section, "formula", "modifiers", whatever, it's still under discussion, where you can say: okay, apply this formula. As each trigger can be named, you can just introduce, and we are introducing, library packages for processing formulas on the fly. So you can say: I have three triggers, RabbitMQ, Database, and Prometheus. Okay, I name them RabbitMQ, Database, and Prometheus, and I apply the formula, and the formula is, quote, Database plus Prometheus minus whatever. And the engine will process that formula in real time.
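A formula over named triggers, like Jorge's "Database plus Prometheus minus whatever", can be evaluated with a small whitelist-based expression walker. This Python sketch only illustrates that idea: the trigger names and the `evaluate` function are hypothetical, and KEDA's own engine and syntax (still under discussion at the time of recording) may look quite different.

```python
import ast
import operator

# Only these arithmetic operators are allowed in a formula.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(formula: str, triggers: dict[str, float]) -> float:
    """Evaluate an arithmetic formula whose variables are trigger names."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return triggers[node.id]  # raises KeyError for an unknown trigger name
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        # Anything else (calls, attributes, comparisons...) is rejected.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(formula, mode="eval"))

# Hypothetical current values reported by three named triggers.
values = {"rabbitmq": 120.0, "database": 40.0, "prometheus": 30.0}
print(evaluate("database + prometheus - 10", values))  # 60.0
```

Whitelisting node types, rather than calling Python's `eval`, keeps a user-supplied formula from executing arbitrary code, which is the same concern any engine embedded in a cluster-wide operator has to address.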

Bart: Jorge, when do you sleep?

Jorge: I don't know. Well, when I'm dead.

Bart: Sorry, sorry, Dan, I cut you off.

Dan: No, it's alright. It's alright. I was about to say, there are only five of them, and they do all of this. And then they also brag that they are not able to do any sort of advocacy for the project. I mean, sure.

Jorge: Well, I have to say that we have amazing, awesome contributors who are doing the majority of the work. Obviously, as five people, we don't have the capacity to address everything; we don't do everything ourselves. We have a nice community, which is growing: every week, or every month, a new contributor appears. I have to say that after the graduation, I personally have noticed that the interaction has grown: more people in the Slack channel, more issues, more PRs. That is good, and also a problem, because we are five maintainers. We need to think about how we organize the work, and how we can manage this situation so we don't die of success.

Bart: So you have an electrician, a plumber, a painter, a mason: one for each area. I think there's a lot to be learned from this. It's incredible to see how far the project has come, particularly, like you said, without some of the outreach or public-facing visibility that other projects have had. Jorge, for people who want to get in touch with you, what's the best way to do it?

Jorge: I am in a lot of different Slack workspaces: Kubernetes, CNCF.

Bart: Microsoft Teams.

Jorge: I'm in Teams.

Bart: I heard the alerts. Yeah, I heard the notification.

Jorge: Yeah, yeah. Microsoft… what? You can try your own Slack workspace; I'm there as Jorge Turrado, maybe I will be there. Let's try.

Bart: Who knows? You're also a CNCF ambassador. We're hosting meetups in Bilbao.

Jorge: X, for me, is still Twitter. It's still Twitter.

Bart: It's still PetSets and it's still Twitter. That's the title of this podcast.

Jorge: On Twitter, the handle is Jorge Turrado. Or just type Jorge Turrado into Google: there are a lot of communication channels, and I have all of them open.

Dan: Get in touch and become a maintainer. That's it, that's the goal.

Bart: Jorge, do you know which three tools you would take with you to a desert island?

Jorge: Alright, so Prometheus is one.

Jorge: Okay, how do you want to monitor your cluster without Prometheus? Well, unless you are rich, I guess. Because Azure Monitor, AWS CloudWatch, GCP monitoring, all of them are super expensive. You would have to be rich.

Dan: That's a good point, to be honest. Well, you can always switch the storage, right? And just store the last couple of days.

Bart: So apart from Prometheus, what are the other two tools?

Jorge: Is it too ugly if I say KEDA?

Dan: No. I think that's fair, right?

Bart: It's not surprising. I mean, yeah, I wouldn't be surprised. You would be a bad maintainer if you didn't.

Dan: Yeah, I would have expected it to be the first. To be honest, I'm a bit disappointed now.

Jorge: And the third one would probably be Argo: Argo CD, or, more precisely, Argo Rollouts. We are exploring the integration of Argo Rollouts because we have integrated Argo CD, and the feedback from our development teams is really good. Really, really nice feedback. At the beginning I thought, okay, another CD tool that I have to learn. And after two or three weeks of using Argo CD, my mind changed: how could I live without Argo CD before?

Bart: Perfect. Well, Jorge, thank you very much for your time today. And yeah, we look forward to seeing the next steps. Very action-packed roadmap. Folks that want to get involved in the project, check it out. It's an amazing space, very dynamic, with a lot of things going on there. So yeah, thank you very much for your time. We'll be seeing you soon.

Dan: You're welcome.
