The road to hyperscale: solving Kubernetes' database and cost challenges

Guest:

  • Matthew LeRay

In this interview, Matthew LeRay, Co-founder and CTO of Speedscale, discusses:

  • The evolution of observability and monitoring in cloud environments, highlighting how the separation of compute and storage has enabled comprehensive data collection and analysis.

  • How development environments account for nearly half of cloud costs, making them a prime target for cost optimization through tooling.

  • Why Kubernetes is projected to grow from 5% to dominant market share in the next decade, driven by the need to consolidate infrastructure management and solve persistent challenges like database management in containerized environments.

Transcription

Bart: Who are you? What's your role? And where do you work?

Matt: My name is Matt LeRay. I'm co-founder and CTO of Speedscale, based out of Atlanta, Georgia, and we solve the problem of environment replication.

Bart: What are three emerging Kubernetes tools that you are keeping an eye on?

Matt: I'd say there are a few trends that I'm excited about. The first one is eBPF, which is a way of looking inside the Linux kernel to inspect traffic and other things. I'm really excited about eBPF for security purposes. Istio and Envoy are moving to an eBPF-based sidecar, which I think will revamp the way we do sidecars and make them more accessible and easier to run.
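To make the eBPF idea concrete, here is a minimal sketch using the bcc Python bindings. The interview doesn't name a specific tool, so bcc and the clone() tracepoint are my own illustrative choices; the same mechanism of loading a small program into the kernel is what traffic-inspection and security tooling builds on.

```python
# Minimal eBPF sketch with the bcc Python bindings (illustrative; not from the interview).
# Requires bcc installed and root privileges. Prints a trace line whenever clone() is called.
from bcc import BPF

prog = r"""
int hello(void *ctx) {
    bpf_trace_printk("clone() called\n");
    return 0;
}
"""

b = BPF(text=prog)                                    # compile and load the program into the kernel
b.attach_kprobe(event=b.get_syscall_fnname("clone"),  # hook the clone() syscall
                fn_name="hello")
b.trace_print()                                       # stream the kernel trace pipe to stdout
```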

The second trend I find interesting is Virtual Clusters. One of the big trends in Kubernetes over the coming five years will be trying to hit hyperscale, handling the biggest production workloads, and Virtual Clusters play into that, along with the developments around API Gateways.

Lastly, I'd like to see more activity around handling databases and long-term storage on Kubernetes. I think there will be a lot of development and interesting trends in this area. Currently, you still can't put a database on Kubernetes and expect good things to happen.

Bart: Now, responding to some points that were made by guests on our podcast, when it comes to over-provisioning, Alexander wrote an article about over-provisioning. What strategies have you found effective in controlling over-provisioning in large Kubernetes clusters?

Matt: So, over-provisioning - we tend to focus on the big picture. At my company, Speedscale, we try to eliminate over-provisioning for development purposes. This may not sound like a big deal, but according to the most recent DORA report, 45% or more of cloud spend actually goes to quick, short-lived development environments, for example, someone testing something or standing up a full test environment. We find that targeting that half of the spend is one of the biggest ways to fight over-provisioning on Kubernetes platforms. Another way to address it is with centralized platforms like SpectroCloud, which give a platform team central control and help reduce waste, thereby reducing over-provisioning.
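As a small, generic illustration of keeping short-lived dev environments in check (this is not how Speedscale works, just one common guardrail), a platform team can cap each ephemeral dev namespace with a ResourceQuota. The sketch below uses the official kubernetes Python client; the namespace name and limits are made-up values.

```python
# Cap a short-lived dev namespace with a ResourceQuota so it can't over-provision.
# Illustrative sketch only; "dev-alice" and the limits are hypothetical values.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="dev-env-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.cpu": "4",       # total CPU requests allowed in the namespace
            "requests.memory": "8Gi",  # total memory requests allowed
            "pods": "20",              # hard cap on pod count
        }
    ),
)

client.CoreV1Api().create_namespaced_resource_quota(namespace="dev-alice", body=quota)
```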

Bart: In terms of observability and monitoring, Miguel explained that while monitoring deals with problems that we can anticipate, for example, a disk running out of space, observability goes beyond that and addresses questions you didn't even know you needed to ask. Does this statement match your experience in adopting observability in your stack?

Matt: Yes and no. I did observability for about 20 years, and the big shift I saw when people moved from monitoring to observability was a core technology change in the monitoring space. For most of my career in observability, we were limited in what we could collect because storage and data transmission were very expensive. Many of the new monitoring and observability tools, however, can ingest almost anything. This comes from the cloud data warehouse philosophy of tools like Snowflake and Databricks, which have separated compute from storage. I agree with the original statement that it can help you find things you weren't looking for, as that's what we were always trying to do with monitoring and observability systems. The ability to store enormous amounts of data really does constitute a paradigm shift. For us, it encourages logging everything, emitting events, and generating as much data as possible from our app. That approach gets more out of observability, whereas with monitoring, the focus was on keeping a tight set of data.
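As a small illustration of the "emit everything" approach Matt describes, here is a sketch (my own, standard library only; the logger name and fields are made up) that writes structured JSON events with arbitrary context, so a backend with cheap storage can later index fields you didn't know you'd need to query.

```python
# Emit structured JSON log events so an observability backend can index every field.
# Illustrative sketch; "checkout", "order_id", etc. are hypothetical names.
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        event = {
            "ts": time.time(),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Merge in any structured context passed via `extra={"ctx": {...}}`.
        event.update(getattr(record, "ctx", {}))
        return json.dumps(event)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)

# Log generously, including fields you don't yet know you'll need to ask about.
log.info("order placed", extra={"ctx": {"order_id": "o-123", "region": "us-east-1", "latency_ms": 42}})
```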

Bart: Cattle versus pets, infrastructure as code. Dan Garfield shared that your cluster only feels real once you set up ingress and DNS. Before that, it was just a playground where it didn't matter when stuff broke. What are your thoughts on this?

Matt: Everybody who works in technology knows you're going to have problems with that cluster, because the problem is always DNS. What's being expressed here is something we see a lot in large enterprises: Kubernetes is actually being used in production now. As one of my customers jokes, "We use Istio in anger because it's the only way to use Istio." What we're seeing now is that many folks are actually running the real deal, but that's also creating a problem for adoption. In enterprises, platform engineering teams are emerging, bringing ideas like Kubernetes, cattle vs. pets, and more. They're slowly trying to get into these organizations and build a Kubernetes platform. We've seen this all over the place. Then it's like, "Let's turn on the DNS. Let's actually get some traffic running through that." And that's creating a huge problem. We went through this before in the virtual machine era, when everybody was switching to things like VMware. It's going to take some time, but I think it's part of the normal growing pains. I don't think there's anything stopping the Kubernetes trend at this point.

Bart: Regarding the Kubernetes trend, Kubernetes turned 10 years old this year. What should we expect in the next 10 years?

Matt: So right now, I don't know the exact market share percentages, but Kubernetes is probably 5% of production apps. However, I think that will multiply a hundredfold. The reason is organizational as much as it is technical. Organizations struggle with needing a networking expert, a server expert, hardware experts, and people to handle power, air conditioning, and other disciplines. We've simplified that with the cloud, but not everything has been integrated into something that Kubernetes can manage. I think we'll see people trying to find solutions to the database problem in Kubernetes for the next 10 years, getting that last piece inside. The reason is that organizations don't want to have extra DevOps folks with specialty knowledge; they'd like to consolidate everything into one set of people. With Kubernetes, you can manage software, software-defined networking, containers, and more, all with one set of skills. I think organizations will do that, and 10 years from now, someone will probably have solved the data problem on Kubernetes, making it the default cloud platform and making cloud usable for many organizations.

What's next for me? At Speedscale, I'm a co-founder, and we do something called environment replication, along with a focus on cost cutting. We're working on solving the problem of moving data out of production into pre-production environments for testing, and on cutting costs, like reducing cloud bills. That's what I'll be focused on for the next few years.

Bart: How can people get in touch with you?

Matt: You can reach me on LinkedIn, Twitter, or via email. However, the best way to get in touch is to visit the Speedscale webpage and use the web forms. We'll be in touch.
