Solving Cold Starts: Using Istio to Warm Up Java Pods
Host:
- Bart Farrell
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
If you're running Java applications in Kubernetes, you've likely experienced the pain of slow pod startups affecting user experience during deployments and scaling events.
Frédéric Gaudet, Senior SRE at BlaBlaCar, shares how his team solved the cold start problem for their 1,500 Java microservices using Istio's warm-up capabilities.
You will learn:
Why Java applications struggle with cold starts and how JIT compilation affects initial request latency in Kubernetes environments
How Istio's warm-up feature works to gradually ramp up traffic to new pods
Why other common solutions fail, including resource over-provisioning, init containers, and tools like GraalVM
Real production impact from implementing this solution, including dramatic improvements in message moderation SLOs at BlaBlaCar's scale of 4,000 pods
Transcription
Bart: In this episode of KubeFM, I'm joined by Fred Gaudet, Senior Engineer at BlaBlaCar. He recently wrote about a problem every Kubernetes team hits sooner or later: slow pod startups. In this episode, we dig into how his team used Istio to warm up pods before they hit production traffic, avoiding cold starts and stabilizing user experience. Along the way, Fred shares the design trade-offs, how sidecars can help smooth rollout strategies, and why observability was key to proving the approach.
We're excited to partner with Testkube for today's episode. Testing applications in Kubernetes doesn't have to be complicated or slow. Testkube is your go-to platform for cloud-native continuous testing. With Testkube, you can effortlessly orchestrate, scale, and automate tests right within Kubernetes, catching issues early and deploying with confidence. Try it out today and see why engineering teams trust Testkube to keep their testing simple and scalable. Visit testkube.io to get started.
Now let's get into the episode with Fred. Hi Fred, welcome to KubeFM. What three emerging Kubernetes tools are you keeping an eye on?
Fred: Hi Bart, I'm happy to be here. One really interesting area in Kubernetes is the updates around the Vertical Pod Autoscaler (VPA). For example, we've had issues sizing DaemonSets; it's always been challenging. With the latest promising updates to the VPA, we should be able to have a different sizing configuration for our DaemonSets, enabling better sizing.
We're also looking closely at the multi-cluster SIG. We aren't a primary target for this because we only want a couple of Kubernetes clusters, but in the future, it could be a challenge. So I'm keeping an eye on this area to see whether we could embrace multi-clustering in the future.
Bart: Two out of three ain't bad. For people who don't know you, what do you do and where do you work?
Fred: I'm working for BlaBlaCar, a French company that offers travel mobility. Our base market is carpooling. For example, someone wants to reach a destination in Spain, and another person offers a seat in their car. This helps reduce CO2 emissions by having two people travel in one car. We now also sell bus tickets and, more recently, train tickets. We started in Spain with IRYO and another company.
In this context, I'm part of the infrastructure team. We manage our Kubernetes platform, which is GKE and hosted in GCP. Our service mesh is Istio, and we use various GCP components like load balancers, DNS, KMS, and other services.
Bart: As someone who lives in Spain, I have used BlaBlaCar several times over the last few years. It's a great way to meet people, with plenty of time to talk and a comfortable way to get to places that might be more difficult with other forms of transportation. As you mentioned, it's also more eco-friendly by reducing CO2 impact. So how did you get into Cloud Native?
Fred: I've been a Linux enthusiast since I was a student. Professionally, I started my cloud journey with OpenStack back in 2014, a decade ago. Since then, I've moved through different products, but for the last five years I've been working mostly with Kubernetes and Istio.
Bart: Very good. We also track content that's trending in the Kubernetes ecosystem. We have a monthly report that comes out where people can see what they're most interested in and what they're having the hardest time learning. You've obviously written articles, but how do you keep updated with the Kubernetes and Cloud Native ecosystem? Where are your go-to places for learning something new?
Fred: It changes over time depending on the subject. Usually, I read articles on platforms like LinkedIn, Medium, and some Substack newsletters. I also follow tech blogs from companies like Cloudflare. A long time ago, I used to follow GitHub and Dropbox tech blogs. They used to have really good content, but I'm not working on that part anymore.
I also follow some individual blogs. For example, for Istio, Jimmy Song has a really good blog with detailed and precise articles, which is a great source of information for me.
Bart: If you could go back in time and share one career tip with your younger self, what would it be?
Fred: That's a tough question because it depends on everyone's personality. I would advise one thing: work on your soft skills. It may sound counter-intuitive because we all work in a tech ecosystem where being more technical seems like the path to recognition. But in my opinion, we are all technically good. There are many skilled tech people around, and what makes the difference are soft skills. Focus on your ability to listen, communicate, share your work, and collaborate with people. That, to me, makes the difference and allows you to improve your career path.
Bart: That's a really good point. If you think about all the courses there are for public speaking—and I've given and continue to give courses on public speaking—listening is so much more than just being quiet when someone else talks. It's understanding where the information is coming from, how to pay attention, and how to ask questions. There are lots of different ways to listen. Is there anything you've done to develop this, whether through reading books or other techniques?
Fred: I read books, but nothing very specific. Mostly it comes down to listening to people and understanding their issues and how I can help solve them. To me, that's the key thing, because sometimes I want to solve something that would have no impact on the company or the team. In such cases, it's not worth pursuing, even if the subject is a really fascinating thing I'd want to work on. Instead, I focus on: how can I help people and make their lives better? That's key.
Bart: Starting with empathy. I really like that. So as part of our monthly content discovery, we found an article that you wrote titled, "Warm Up Your Pods Using Istio". We wanted to dive into this topic further. Before looking more closely at the problem you solved, could you give us some context about BlaBlaCar's infrastructure and the scale you're operating at?
Fred: At BlaBlaCar, we operate medium-sized Kubernetes clusters: around 4,000 pods in production across roughly 100 nodes, split between our production clusters. We mainly run Java microservices, about 1,500 services in production, most of which are stateless. We use spot instances extensively and follow GitOps principles, which means we roll out frequently: as soon as you merge something in your code, a new Docker image is built and deployed to production. And when Google reclaims a spot instance, all pods on it are killed and respawned elsewhere. These two factors make smooth rollouts crucial for us in such a rapidly evolving environment.
Bart: And can I ask a bonus question? Since you mentioned that you're running stateless workloads, have you had any experience running stateful workloads, perhaps with an operator?
Fred: Yes, we've got some stateful workloads like databases, Elasticsearch, and Kafka. We run them on a node pool that doesn't use spot instances, which is more stable; they're only killed when GKE rolls out a new data plane version. Usually that's fine. Sometimes we have to make some tweaks, but most of the time it's okay. They can't land on spot instances, though, as those are really too volatile.
Bart: Java applications are known for needing warm-up time when they start. Can you explain what happens when a new JVM-based pod comes online and why this causes problems?
Fred: When the process first starts, the initial requests are slow because the code is interpreted, not compiled. It's bytecode, and by definition that's slow. Then the Just-In-Time (JIT) compiler kicks in, compiling the code and optimizing it using profiling data. This process consumes CPU, which means the workload doesn't have all the resources it needs to process HTTP requests normally.
Once this initialization is complete, execution becomes much faster and everything runs smoothly. At startup time, however, request latency is really bad. This phase typically lasts about one to one and a half minutes, depending on CPU power and node capabilities. We noticed this issue many years ago, but it hadn't been prioritized, so it took us some time to address it.
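If you want to see this phase for yourself, here is a minimal, hypothetical Pod spec. The image, jar path, and names are placeholders rather than anything from BlaBlaCar's setup, but `-XX:+PrintCompilation` is a standard HotSpot flag that logs each method as the JIT compiles it:

```yaml
# Hypothetical demo Pod: watch the JIT warm-up in the pod logs.
apiVersion: v1
kind: Pod
metadata:
  name: jit-warmup-demo
spec:
  containers:
    - name: app
      image: eclipse-temurin:21-jre          # any HotSpot-based JRE image
      # -XX:+PrintCompilation logs every method the JIT compiles; the
      # burst of output right after startup is the warm-up phase.
      command: ["java", "-XX:+PrintCompilation", "-jar", "/app/service.jar"]
      resources:
        requests:
          cpu: "1"   # the JIT competes with request handling for this CPU
```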
Bart: And before finding the Istio solution, you tried several approaches that didn't work out. What were some of these attempts and what made them unsuitable?
Fred: We investigated several potential solutions. Technically, projects like GraalVM or CRaC are interesting, but in a Kubernetes context, we found them complicated to set up. CRaC performs snapshots of your Java application and takes a memory image, which would require moving the image to another node or sticking pods to the node where the image lives. This seemed quite hard to maintain and operate at scale.
Including GraalVM and CRaC, we explored four potential approaches in total. The other three were:
Providing more CPU power: While this could help applications start faster, it's costly and represents a waste of resources. Once reserved by the Kubernetes scheduler, the CPU can't be reused for other applications. This didn't fit our strategy of efficiently packing pods on nodes.
Creating a startup container (init container): This approach had significant drawbacks. It was an additional tool to maintain, consumed more resources, and required developers to declare and update specific endpoints to warm up. Developers typically want to focus on their application, not on warming up infrastructure. Moreover, we could only perform simple GET requests and couldn't warm up endpoints requiring user interaction like POST requests.
Istio's slow start mode (released in Istio 1.18): Initially, this had only one parameter—the duration window time. The pod would receive a fraction of traffic during a set time window, gradually increasing until full traffic. However, even the initial percentage was too high for our service with around 2,000 requests per second, causing slow requests that broke our SLOs.
After investigating the documentation, I discovered that while Istio's API was limited, Envoy (which Istio is based on) offered two additional interesting parameters: the minimum traffic percentage and the ramp-up curve profile to reach 100% traffic. This led us to contribute improvements to the Istio API.
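The result of those contributions is a `warmup` block on the destination rule's load balancer settings. Here is a minimal sketch, assuming a recent Istio release that includes the updated API; the host name and values are illustrative:

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: my-java-service            # illustrative name
spec:
  host: my-java-service.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN          # slow start requires ROUND_ROBIN or LEAST_REQUEST
      warmup:
        duration: 90s              # ramp-up window for a new pod
        minimumPercent: 10         # floor on the traffic share a cold pod receives
        aggression: 1.0            # 1.0 = linear ramp-up curve
```

Older releases expose only the deprecated `warmupDurationSecs` field, which corresponds to the single-parameter behavior Fred describes above.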
Bart: And what exactly did you add to make the warm-up configuration more flexible?
Fred: Istio has two main repositories. The first one is the API repository, which is mostly protobuf definitions and CRDs. The second is the code repository itself. I made a first contribution to the API to update it, adding the parameters from Envoy and deprecating the old ones. Then I submitted a second pull request against the code repository to implement the modifications I had made in the API repository.
For anyone interested in contributing, I advise preparing these two pull requests in sequence. As soon as the API is updated, the Istio CI breaks if you don't implement an API test. I experienced this last year when I thought I'd complete the work later. I realized that every contributor in the repo was stuck, and I thought, "My god, I've reached Istio's limits."
Bart: The aggression parameter introduces non-linear traffic scaling. How does this work, and when would teams want to adjust it?
Fred: This is the curve used to ramp traffic up from zero to 100%. Plotted over time, it can be a straight line, or its shape can range from polynomial to exponential. We actually stick with the linear curve. If anything, we'd prefer a reverse curve that starts slower and then accelerates, rather than an aggressive one with a steep initial rise that flattens out.
Our main concern is having a minimum amount of traffic to warm up the pod. The current settings provide that minimal baseline, so we don't use anything else.
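For reference, Envoy's slow-start documentation describes the weight of a new endpoint roughly as follows (paraphrased from the Envoy docs; treat the exact form as an assumption rather than a spec):

$$
w(t) \;=\; w_{\text{base}} \cdot \max\!\left(\frac{\text{minimumPercent}}{100},\; \left(\frac{t}{\text{duration}}\right)^{1/\text{aggression}}\right)
$$

With aggression = 1 the ramp is linear. With aggression > 1 the exponent 1/aggression drops below one, so traffic rises steeply at first and then flattens; with aggression < 1 it starts slower and accelerates, which is the "reverse curve" Fred mentions.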
Bart: You conducted extensive testing with Fortio before production. What scenarios did you test, and what surprised you in the results?
Fred: I tested all the scenarios in our cluster that match what happens in production: traffic spikes, rollouts, and pods getting killed at any time. That's why I conducted these tests.
Regarding rollout strategies, it wasn't really a discovery for us because we use maxSurge or maxUnavailable by default in our deployments. I was curious to see how the warm-up would behave with different parameters. It was mostly curiosity on my part, and I had it in mind to write an article. I thought it could be interesting for everyone, especially those who don't use these kinds of parameters.
I was also curious to check the relationship with the Horizontal Pod Autoscaler (HPA). When a pod rolls out with a slow ramp-up, the traffic share for that specific pod is spread over the other pods, which means the other pods get more requests than usual. At scale, it's okay and doesn't really trigger the HPA; it takes about five minutes for the HPA to react. If you set your warm-up to two minutes, the existing pods see a small CPU overhead from the extra traffic, but it doesn't really trigger scaling. The new pod, with its own fresh metrics, will also show a small spike. To keep things stable, keep your warm-up window shorter than your HPA settings.
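A hedged sketch of that relationship, assuming a stock autoscaling/v2 HPA (all names and numbers here are illustrative, not BlaBlaCar's settings): the idea is simply that the warm-up window from the destination rule stays shorter than the window the HPA needs before it reacts.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-java-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-service
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # brief warm-up spikes stay under this on average
  behavior:
    scaleUp:
      # Illustrative: wait out short CPU spikes (e.g. a 90s warm-up)
      # before adding replicas, instead of reacting immediately.
      stabilizationWindowSeconds: 120
```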
Bart: Speaking of deployment strategies, you found that Kubernetes rollout settings significantly impact warm-up effectiveness. What's the relationship there?
Fred: Without a proper strategy, all pods are killed at the same time, and the traffic share can't be calculated correctly because no warmed pods remain in the cluster. That doesn't work at all: every pod would receive the same amount of traffic from the start, which is exactly what we don't want during a warm-up. It's crucial to have a rolling update strategy with the maxUnavailable or maxSurge parameters. A proper strategy is mandatory for a working warm-up configuration; otherwise, it's worthless.
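A minimal sketch of such a strategy, with illustrative names and values: new pods are surged in alongside the old ones, so warmed pods keep serving while their replacements ramp up.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-service              # illustrative name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-java-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%                  # add new (cold) pods before removing old ones
      maxUnavailable: 0              # never take warmed pods away early
  template:
    metadata:
      labels:
        app: my-java-service
    spec:
      containers:
        - name: app
          image: example/java-service:latest   # placeholder image
```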
Bart: So when you deployed this to production, what kind of improvements did you see in your service-level objectives (SLOs)?
Fred: It was really good, especially when we conducted the first test on a specific endpoint. I didn't mention it in the article, at least not this specifically. The endpoint belongs to a moderation service: if the service doesn't answer fast enough, the message goes through and is delivered, which is not what we want. We want to moderate all messages and protect our users from scammers.
Before the warm-up, without sharing all the details, a lot of messages could go through and be delivered without passing through the moderation service. I don't mean they were all scams, but we knew many messages weren't moderated. After this improvement, only a really small fraction of messages are delivered without being moderated. The improvement was huge technically, and for our users at BlaBlaCar it's significantly better.
Bart: And you mentioned this doesn't completely eliminate slow requests. What are the remaining limitations, and why is this solution still worthwhile?
Fred: The very first requests are still slow: even though a cold pod doesn't receive as much traffic as it would without the warm-up, it still receives a share. This is acknowledged by our product managers. As long as we meet our SLAs and SLOs, that's fine; if we don't, we'll find another solution. Everyone in the company is aware that we are not 100% perfect in this area. The objective of SLOs is to find the right balance between being valuable and effective at a reasonable cost and with reasonable toil, and that's acceptable for us right now.
Bart: And for teams wanting to implement Istio warm-up, how should they approach setting the three parameters for their specific services?
Fred: It's very simple because it's only a couple of parameters. The first is the duration window: check your Horizontal Pod Autoscaler (HPA) settings first and pick a number just below them; between one and two minutes should be okay. Regarding the curve, we stick with the linear one, which gives good results. But if people want to iterate, they can start with that and then try other curves to see if they perform better.
Bart: This seems like a relatively simple configuration change that had a massive impact. What made this solution particularly elegant compared to other approaches?
Fred: It's simple, and sometimes simple solutions are really nice. As with any tool, there are trade-offs to make. The previous approaches weren't great because they required code to maintain, wasted resources, and made developers configure something. This solution just works out of the box with only a couple of parameters, and the defaults are really good for everyone. We don't tweak the warm-up for any particular service; we have it enabled by default for every Java application at BlaBlaCar with the same settings. So there's no maintenance, no code to maintain, and people eventually forget it's even there. That's perfect for infrastructure professionals like me. It's really cool when developers just forget they're running on a service mesh.
Bart: Looking back at this journey, what advice would you give to other teams that are facing similar challenges with service warm-up during deployments?
Fred: When facing such issues, my advice is that it's crucial to qualify the problem, and to remember that a service mesh can be the best candidate to solve these challenges. That was our case: warm-up was an infrastructure problem, and solving it is exactly part of the mesh's role and purpose. Picture the whole problem, discuss possible solutions with different people in the company, developers, DevOps, and infrastructure engineers, and then determine the right solution and target approach.
Bart: Now, for folks watching, you'll see that Fred has a guitar behind him; if you're just listening, you won't see it. Let's think about the problem of learning guitar. Walk me through your experience: when you started playing guitar, what challenges did you face?
Fred: The main problem was getting a good knowledge of the whole stack. Without a comprehensive understanding, you can't improvise or be autonomous. Your brain needs to let your fingers follow the music naturally. When you lack that comprehensive foundation, you have to think about technique instead of focusing on the music.
Bart: When did you start playing?
Fred: About 20 years ago.
Bart: What was that process like? Did you take lessons? Did you study in a more formal way? Or how was your learning process?
Fred: First, I started on my own. Then I took lessons when I was a student, studying jazz guitar and music theory. I stopped for 10 years because I had kids and was focused on something else. Now my kids are older, so I have more time. In the last six years, I started taking lessons in music schools and playing with bands around the place.
Bart: What kind of music are you playing with your band?
Fred: I used to play funk for a couple of years: James Brown, Maceo Parker, Tower of Power. It's really hard to play, especially for your arms. I did that for three years, and now I'm more focused on composing music with my band. We write rock and pop-rock music.
If you want to check out our music, we've recorded some stuff on SoundCloud, but I think it's private. I'll have to ask my bandmates if we can share it. We have three tracks.
Bart: If you decide to open-source some of your music and share it so we can listen and enjoy it, that'd be great. I think it's really cool seeing the parallels in the learning journey, getting resources, and troubleshooting aspects. In my case, I started learning on my own and then took lessons, but mostly I've done things independently.
I agree that if you don't map out the neck and know where all the notes are, it'll be difficult to play the notes you want. Guitar has taught me that making mistakes is not something personal, but an opportunity. Often, you'll make a mistake and realize, "I really like that mistake, I'm going to keep making it" — now you can sound more like Catfish Collins or whatever guitarist you want to emulate.
There's a lot you can learn from this approach, and it can be applied just about anywhere. Now my last question is: why a Les Paul and not a Fender Stratocaster?
Fred: I actually have a Stratocaster as well.
Bart: That's good.
Fred: It's just on the other side of the camera.
Bart: Okay, that's good to know. So you appreciate different guitars that provide different sounds. A couple of our listeners, Eamon and Orlin, are also really into playing guitar, and we often talk about it. It's a pleasure to discuss this in a podcast recording. Well, Fred, what's next for you? Are you going to be writing more articles? Are you going to be exploring more elements of Istio? What do you want to do?
Fred: Regarding Istio, I would like to work more on multi-mesh settings. We currently have several meshes in different clusters, always with a multi-cluster approach. The security between meshes is not ideal. I want to improve this with more request authentication, authorization policies, and east-west gateways between meshes. This could smooth the communication between clusters and improve security. I will work on this in the coming months.
Bart: Fantastic. How can people get in touch with you?
Fred: LinkedIn, or by email at frederic.gaudet@BlaBlaCar.com.
Bart: That's it.
Fred: It's simple.
Bart: LinkedIn worked for us, so I could definitely speak to that. Fred, thanks so much for sharing your time and knowledge with us. I really enjoyed speaking to you and look forward to talking to you in the future. Take care.
Fred: Thank you, everyone.