AI-driven testing, GitOps strategies, and configuration management

Guest:

  • Ole Lensmar

In this interview, Ole Lensmar, CTO at Testkube, discusses:

  • AI-powered testing workflows - How artificial intelligence is being used to generate test scaffolds, analyze failure patterns across distributed Kubernetes applications, and help remediate issues.

  • Performance testing strategies beyond staging - A practical approach that combines soak testing, spike testing, and parallel test execution.

  • Testing in production as standard practice - The surprising finding that 80-90% of the practitioners interviewed on Ole's podcast actually test in production, despite academic recommendations against it.

Transcription

Bart: So, first things first: who are you, what's your role, and where do you work?

Ole: I'm Ole Lensmar, CTO at Testkube, working from my home in Stockholm, Sweden.

Bart: Fantastic. What are three emerging Kubernetes tools that you're keeping an eye on?

Ole: Three emerging Kubernetes tools: I'm keen on Kubernetes itself, because the product we build is very low-level and deeply integrated with Kubernetes. I'm keeping an eye on AI-related tooling in the space, because I think that's going to move things forward, however skeptical you might be. I also find tools around configuration management, like Helm and Kustomize, interesting, and ephemeral environment tooling, which is super interesting and very adjacent to testing. Actually, that's four, so I apologize for the discrepancy.

Bart: You mentioned AI-related tooling. How is AI actually being used in testing today?

Ole: Testing and AI come in a lot of flavors. Many initial approaches use AI to generate tests, which makes sense because tests are code. Most modern test tooling is very code-oriented: you can test with Playwright or Cypress, by writing unit tests, or with k6, which is essentially scripting. This is one way people are creating more tests, because getting initial test scaffolds running is easier with AI. Obviously, someone still needs to add nuance and ensure the right things are being tested.
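To make that concrete, here is a minimal sketch of the kind of test scaffold such tooling might produce, written as a Playwright test in TypeScript. The URL, page structure, and labels are illustrative placeholders, not details from the conversation.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical scaffold: verify that a login page renders its sign-in form.
// A human would still refine the assertions to cover what actually matters.
test('login page renders the sign-in form', async ({ page }) => {
  await page.goto('https://example.com/login'); // placeholder URL

  await expect(page.getByRole('heading', { name: 'Sign in' })).toBeVisible();
  await expect(page.getByLabel('Email')).toBeVisible();
  await expect(page.getByLabel('Password')).toBeVisible();
  await expect(page.getByRole('button', { name: 'Sign in' })).toBeEnabled();
});
```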

Other interesting areas include helping schedule the right tests and remediate failed tests. An AI can help figure out why a test failed by examining logs, code changes, and configuration in your Kubernetes clusters. It can deduce the cause of test failures, which can be cumbersome, especially in Kubernetes, where you might have distributed applications spread out across different systems. There are often hidden details in APM solutions or traces, and AI is great at reading through these and helping narrow down potential issues.

These are the initial areas where people are approaching AI in testing, but it's still an early field that's exciting to watch develop.

Bart: One of our podcast guests, Steven, believes you can't really expect that testing your application in a dev or QA environment prepares you for production, because production is always different. What's your approach to performance testing?

Ole: Performance testing is a vast field. My approach is to conduct testing early in the life cycle, understanding that the infrastructure for pre-production testing differs from production. In production, you'll encounter different capacity, security constraints, and potential throttling limits from third-party tools.

While pre-production testing can be challenging, it's still crucial. Soak testing, for instance, involves applying continuous load over extended periods, not necessarily at massive levels. It's important to run performance tests simultaneously with other tests like security, functional, compliance, and acceptance tests.

Try running a basic load (e.g., 100 users) while simultaneously conducting other tests. This approach can reveal how load impacts system behavior, potentially affecting functionality and security. You can also combine multiple load testing strategies, such as:

  • Soak testing with basic continuous load

  • Spike testing, where you suddenly introduce a high load (like 1,000 users for 10 seconds) and observe system recovery

The key is to experiment with different load profiles and run tests in parallel. Remember, successful performance in staging doesn't guarantee the same performance in production, which typically has more constraints and complexities.
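As a rough illustration of combining those profiles, here is a minimal k6 script (k6 scripts are plain JavaScript/TypeScript) that holds a modest soak load and injects a short spike. The durations, user counts, and endpoint are placeholder values, not recommendations from the conversation.

```typescript
import http from 'k6/http';
import { sleep } from 'k6';

// Combined load profile: a baseline soak with a short spike in the middle,
// so you can watch how the system absorbs and recovers from the burst.
export const options = {
  stages: [
    { duration: '5m', target: 100 },    // ramp up to the baseline load
    { duration: '1h', target: 100 },    // soak: hold ~100 virtual users
    { duration: '10s', target: 1000 },  // spike: jump to 1,000 users for 10 seconds
    { duration: '5m', target: 100 },    // observe recovery back at the baseline
    { duration: '2m', target: 0 },      // ramp down
  ],
};

export default function () {
  http.get('https://example.com/api/health'); // placeholder endpoint
  sleep(1);
}
```

The idea is to run your functional, security, and compliance suites in parallel while a profile like this is active, so load-dependent behavior surfaces before production.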

Bart: Another guest of ours, Andrei, suggested that FluxCD is more approachable to administrators, whereas Argo CD is more about interacting with developers. Do you agree? And what's your advice when it comes to implementing GitOps?

Ole: FluxCD seems more like a toolbox for building GitOps workflows, whereas Argo CD is a more opinionated approach to GitOps. I don't mean that in a bad way. Argo CD can appeal to people who want to get started with GitOps quickly, while someone more experienced might want to fine-tune exactly how things play out in a more complex deployment within a complex organization.

What's important with GitOps is understanding what you're getting into and the consequences for your development and deployment pipelines, and for your organization. This requires a mindset shift, making people think differently about deployment frequency.

With my background, I'm also considering how testing fits in. GitOps often involves an asynchronous deployment pipeline, versus the previously more synchronous, monolithic CI/CD approach. How do you tie test execution into this efficiently?

Even if you're an ops person who might think testing isn't your problem, you'll want to validate every GitOps sync. This could involve infrastructure testing, load testing, or functional testing. You'll need to determine where to inject these tests without slowing down or overloading the process.

I recently spoke with a customer who had a large GitHub repo syncing, and they were kicking off tests for every small sync—which they didn't want. They sought a more granular approach. There are definitely considerations around the consequences of adopting GitOps for your deployment pipeline.

Bart: On the subject of GitOps, because it's part and parcel of this, in our recent insights report we saw that configuration management was the number one area practitioners engage with. From a technical perspective, why do you think configuration management stands out so much in Kubernetes? And what challenges do teams face when trying to keep configuration reliable and consistent across environments?

Ole: The challenge with configuration management is that it's so easy to spin up infrastructure. How do you keep that consistent? How do you manage consistency across clusters, across nodes, across everything you're deploying? How do you manage the lifecycle of configuration together with the applications running in your infrastructure?

I can definitely see that this becomes a challenge as you grow, as you start generating ephemeral environments for testing, and as you build your platform engineering team, which provisions temporary infrastructure for your teams. Suddenly, you end up with multiple clusters running different versions of Kubernetes, because Kubernetes evolves quickly and customers can stay on older versions for a long time. Over time, this leads to configuration drift.

Configuration management is not new to Kubernetes. It's a practice that has been around for as long as the software development lifecycle, going back 30 or 40 years, just with different types of configuration and similar challenges. The introduction of GitOps, which uses Git as the source of truth, adds an interesting dimension to this.

However, people are still trying to figure out the best practices: How do we prevent configuration drift? When configuration drift happens, it can be really painful to unwind.

Bart: Kubernetes turned 10 years old last year. What should we expect in the next 10 years to come?

Ole: I think Kubernetes has been great at adapting to user needs, and my expectation is that it will continue to do so pretty aggressively. At a higher level, I'm sure it's going to keep following industry trends, which I think is great. It's a somewhat high-level, generic prediction; I don't have any specific predictions on CRDs or resource types, but at a strategic level, it's been great to see Kubernetes' evolution. We're still in a space that is rapidly changing. People are exploring new trends like GitOps, and now AI, and I'm sure we'll have something else new and shiny in the next two or three years. I'm expecting the community to keep up, and I'm excited to see how that unfolds.

Bart: Fantastic. Like you said, predictions can be tricky, but you do know one thing for sure: what's going to be happening at KubeCon?

Ole: KubeCon is going to be awesome, and we are organizing KubeJam. I have been practicing heavily on my guitar to get my tapping techniques into place. I hope to do a Van Halen-style solo this time around, with heavy distortion, and hopefully with you on the stage as well, Bart. That would be awesome.

Bart: I hope to be there. We will channel all the Van Halen energy possible in Atlanta with the other wonderful folks at KubeJam. Definitely check it out. We'll be sharing more news about that on Kube Events. Are there any other projects you have going on that we should know about in the next few months?

Ole: The most interesting thing in the Testkube world is that we've just launched our MCP server. We're seeing some amazing workflows from customers, especially around remediation. We're doing a webinar soon to show how that works. I'm super excited about it because I think it'll add a lot of value to our offering and make life much easier for our users.

Bart: I understand you have a podcast?

Ole: We have the Cloud Native Testing Podcast, with about 15 episodes now, and it's been really great. I'll share one super interesting thing: testing in production is something most people consider a no-no, but about 80-90% of the people we've had on the podcast actually do test in production and see it as key to their testing strategy. They believe you need to make sure things don't fall between the cracks, and it's a great complement to pre-production testing.

It was fascinating to hear from these professionals. While the academic world might say you should not test in production, in real life, that's actually what most people do, and it works perfectly fine.

Bart: Ole, if people want to get in touch with you, what's the best way to do that?

Ole: You can contact me on LinkedIn or email me at ole@testkube.io. I'm very open to chatting.

Bart: Ole, it's always a pleasure to talk to you. Looking forward to seeing you very soon at KubeCon. Best of luck with the podcast, and we'll be in touch. Take care.

Ole: Take care.
