On-prem challenges, Service Mesh insights, and the role of AI

Guest:

  • Thomas Graf

Discover the insights of Thomas Graf, VP & CTO Cloud Networking & Security at Isovalent, as he explains the complexities and future of Kubernetes, both on-premises and in the cloud.

In this interview, Thomas will discuss:

  • Why managing on-premises infrastructure presents more challenges than the cloud and how emerging tools are starting to close this gap.

  • The trade-offs between using sidecar or sidecar-less architectures in service meshes.

  • The role of AI in automating Kubernetes operations and increasing sophistication in automation.

Transcription

Bart: Who are you, what's your role, and who do you work for?

Thomas: Hey, my name is Thomas Graf. I'm the CTO and co-founder of Isovalent, and I was also an original project creator of Cilium.

Bart: What are three emerging Kubernetes tools that you're keeping an eye on?

Thomas: Of course, Cilium and Tetragon are obvious choices. I'm also keeping a close eye on Dagger, which I think is very interesting. In general, I'm observing and monitoring all tools around on-prem Kubernetes use, because I see Kubernetes becoming increasingly applicable on-prem, and there is clearly a gap in tooling that needs to be closed.

Bart: What do you think is the primary reason that's driving people to on-prem?

Thomas: That's an interesting question. Part of it is definitely cost. Customers and users have had bad experiences in terms of cloud cost, but then also some customers have never managed to move to public cloud, so they're still on-prem and they need to bring Kubernetes to them. So it's a little bit of both, but it's usually one of these two reasons.

Bart: William Morgan said that eBPF is bad for service meshes, alluding to the idea that the technology doesn't work well for this use case. What are your thoughts on using service meshes, and what kind of features do you evaluate service meshes for?

Thomas: So I think whether you need a service mesh or not is a decision everyone should make. A service mesh includes many different things: encryption, authentication, resilience, load balancing. Not everybody will need a full-blown service mesh. Some people will be fine with OpenTelemetry, Cilium as a CNI, or just an API gateway. Whether you want a service mesh and what type of service mesh you want is up to you. I think eBPF is a great building block for solving many of those issues. It's not a silver bullet, and you shouldn't expect to solve everything with eBPF. You don't have to solve everything with eBPF, but it does have benefits in terms of performance, latency, and reduced overhead. So whenever we see a need or a use for eBPF to lower the overhead and improve performance, we definitely make use of it. But Cilium Service Mesh is also not dependent on implementing everything with eBPF. If something is not doable with eBPF, we go back to Envoy and take the regular proxy route.
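The split Thomas describes, eBPF where it lowers overhead and Envoy where eBPF isn't the right tool, can be sketched with a CiliumNetworkPolicy: the L3/L4 selectors are enforced in eBPF, while the `http` rules are handed off to Cilium's embedded Envoy proxy. The app labels, port, and path below are illustrative assumptions, not details from the interview:

```yaml
# Illustrative sketch: L3/L4 matching (endpoints, port, protocol) is
# enforced in eBPF; the HTTP rule triggers redirection to Cilium's
# embedded Envoy proxy for L7 filtering.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-only
spec:
  endpointSelector:
    matchLabels:
      app: backend          # assumed label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend   # assumed label
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"
```

Dropping the `rules.http` section would keep the whole policy on the eBPF fast path; adding it is what pulls Envoy into the data path.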

Bart: William explained that ambient mesh is a viable alternative to running many sidecar containers in a service mesh, but that it comes at the expense of losing the pod as a single independent unit. What are your thoughts on service meshes? Should you use one, and when?

Thomas: I think this multi-tenancy and to-sidecar-or-not-to-sidecar question keeps coming up. The wrong answer is to always demand and require sidecars, but there's also not a single right answer. A sidecar-less architecture, as Ambient Mesh or Cilium Service Mesh does it, is useful and valuable for many people, but it doesn't have to be the only answer. The argument on multi-tenancy, I think I've heard that over and over again, including when containers came around. Back when we had VMs, the statement was also made: hey, containers will fundamentally destroy the multi-tenancy guarantees that VMs introduced, because we're now sharing a single operating system. The trade-off is exactly the same with sidecar-less architectures: you're now sharing a proxy, whether per namespace or per node. This has benefits, for example in resource utilization, but it also has a couple of downsides. In the end, it's a trade-off. Look at what you need, and based on those needs, select either a sidecar or a sidecar-less service mesh.

Bart: Matthias believes that on-premise deployments require proper education and attention, especially regarding managing on-premise architecture versus cloud architecture. After spending a few months building an on-premise Kubernetes cluster, he shared this advice. What's your experience with bare metal clusters? And how does that differ from using Kubernetes in the cloud? What would you have liked to know before starting Kubernetes on bare metal?

Thomas: So I think bare metal is definitely harder, because you have fewer abstractions at your disposal. In the cloud, you already have an abstraction layer provided by the cloud provider: an API to deploy load balancers, addressing, VPCs, and virtual networks. On-prem, you need to build all of this yourself with tools. People coming from the data center, on-prem world are still used to legacy vendors, doing networking and storage the way we have been doing it for the last 25 or 30 years. They're not cloud native yet. But there are many emerging tools, including Cilium, that help hide that complexity on the on-prem side and give you a true cloud-native feeling while still implementing the traditional enterprise requirements you have on-prem. So it's a lack of abstraction that makes on-prem harder right now. As long as these abstractions are not present, we have to fill that gap with learning, with actually understanding BGP and all of the concepts needed for on-prem. Once the tooling is ready and we have closed that gap, I think the barrier will be much lower.
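As a concrete sketch of the on-prem tooling Thomas alludes to, Cilium ships a BGP control plane that can advertise pod networks to a top-of-rack router, covering ground that a cloud provider's VPC and load-balancer APIs would otherwise handle. The ASNs, node label, and peer address below are illustrative assumptions:

```yaml
# Illustrative sketch: peer the nodes in one rack with their
# top-of-rack router and advertise each node's pod CIDR over iBGP.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack0-peering
spec:
  nodeSelector:
    matchLabels:
      rack: rack0              # assumed node label
  virtualRouters:
    - localASN: 64512          # assumed private ASN
      exportPodCIDR: true
      neighbors:
        - peerAddress: "10.0.0.1/32"   # assumed ToR router address
          peerASN: 64512
```

This is the kind of abstraction gap he means: in the cloud, route propagation like this is invisible; on bare metal, someone has to understand BGP well enough to write and debug it.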

Bart: Kubernetes is turning 10 years old this year. What should we expect in the next 10 years to come?

Thomas: I don't know. We'll probably have AI, ChatGPT-like, prompt-based Kubernetes engineering in some way. But I think the more interesting question is: what's next with Kubernetes? Will Kubernetes just have disappeared into the layers, as so many layers in our infrastructure sandwich have before? Most likely there will be some level of automation on top of Kubernetes where AI may play a role, whether for security, troubleshooting, or operations. We'll probably have more automation in how we operate this infrastructure, maybe not in Kubernetes itself, because Kubernetes is getting boring for a good reason, but as a layer on top of Kubernetes.

Bart: What's next for you?

Thomas: We're joining Cisco, which will happen in a couple of weeks, and we will continue helping Cisco build cloud-native networking. We look forward to bringing Cilium, Tetragon, and our other tools to the broad Cisco customer base, which will definitely be a major milestone for our company and for our existing customers. It will be an exciting journey.

Bart: How can people get in touch with you?

Thomas: You can find me on Twitter, LinkedIn, and Slack. Slack is probably the easiest way to reach me: the Kubernetes, CNCF, and Cilium Slack workspaces. You can easily find me as Thomas Graf; just DM me.
