A Journey Through Kafkian SplitDNS in a Multitenant Kubernetes
Dec 2, 2025
Host:
- Bart Farrell
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
Fabián Sellés Rosa, Tech Lead of the Runtime team at Adevinta, walks through a real engineering investigation that started with a simple request: allowing tenants to use third-party Kafka services. What seemed straightforward turned into a complex DNS resolution problem that required testing seven different approaches before a working solution was found.
You will learn:
Why Kafka's multi-step DNS resolution creates unique challenges in multi-tenant environments, where bootstrap servers and dynamic broker lists complicate standard DNS approaches
The iterative debugging process from Route 53 split DNS through Kubernetes native pod DNS config, custom DNS servers, Kafka proxies, and CoreDNS solutions
How to implement the final solution using node-local DNS and CoreDNS templating with practical details including ndots configuration and Kyverno automation
Platform engineering evaluation criteria for assessing solutions based on maintainability, self-service capability, and evolvability in multi-tenant environments
Relevant links
Transcription
Bart: In this episode of KubeFM, we're joined by Fabián Sellés Rosa, a platform engineer at Adevinta and one of the people responsible for running SHIP, their internal multi-tenant Kubernetes platform that powers dozens of teams across more than 20 clusters and 2,000 nodes.
In this episode, we go deep into the realities of platform engineering at scale, exploring:
What it means to run a multi-tenant cluster with independent teams deploying workloads with very different requirements
How even simple services can cause unpredictable failures in shared environments
How one specific issue—Kafka plus DNS resolution in a multi-tenant Kubernetes setup—turned into a full engineering investigation
Fabián walks us through the entire journey of debugging this problem. We'll look at why Kafka's multi-step DNS resolution behaves differently for different tenants, how SHIP's infrastructure, networking model, and DNS hierarchy interacted with Kafka in surprising ways, the seven different approaches they tested, and what failed, why it failed, and what they learned along the way.
Last but not least, we'll explore how they finally arrived at a solution using node-local DNS and CoreDNS templating, plus a clean self-serve interface for tenants.
This is the kind of conversation we love at KubeFM: real failures, real debugging, real engineering trade-offs, and a behind-the-scenes look at how a large platform team actually solves problems. If you're working on a platform team dealing with DNS, Kafka, multi-cluster setups, or just want to hear a brutally honest engineering story, this episode is definitely for you.
This episode of KubeFM is brought to you by LearnKube. Since 2017, LearnKube has provided Kubernetes trainings for engineers worldwide. Courses are instructor-led and are 60% practical and 40% theoretical, given both in-person and online to groups and individuals. LearnKube students have access to course materials for life. If you want more information about how you can level up, check out LearnKube.com.
Now, let's get into the episode with Fabián. Fabián, welcome to KubeFM. What are three emerging Kubernetes tools that you're keeping an eye on?
Fabián: Right now, I'm looking into KRO (Kubernetes Resource Orchestrator), which I think has a lot of potential to create abstractions, and for a platform team, it's super interesting. I'm also deeply interested in eBPF observability, which is why I'm exploring Microsoft Retina. There are other eBPF observability tools like Hubble, but Retina has interesting features, such as not requiring deployment of the entire CNI, which can be useful.
vCluster is becoming a mature technology, though it's still evolving. I'm keen to understand what people are doing with vClusters and their production readiness. I'm also closely following the Gateway API. After recent news from Nginx Ingress, it has become mainstream, but I've been watching its development for a long time.
Lastly, Dynamic Resource Allocation (DRA) is fascinating in a post-AI, post-ML world. I think it's crucial to keep an eye on how we can better schedule these types of resources.
Bart: So for people who don't know you, Fabián, can you tell us a little bit about what you do and where you work at Adevinta?
Fabián: I'm a platform engineer at Adevinta and the tech lead of the Runtime team. The Runtime team is responsible for managing a Kubernetes-based platform as a service at Adevinta, named SHIP. You may have heard about it from previous episodes. SHIP is a multi-tenant, multi-region offering with more than 20 clusters across four regions and around 2,000 nodes worldwide. This is what I do, and I'm very happy to be here because many of my current and former colleagues, like Thibaut, Tanat, Zain, and Miguel, have been guests before.
Bart: And how did you get into Cloud Native?
Fabián: I started to get into Cloud Native at Adevinta about 10 years ago when we were building platforms. At the beginning, we were developing a Mesos cluster, which was more tailored to data jobs and machine learning. However, it was very hard to deploy services on top of Mesos. If you remember those days, you had Marathon, which was an Apache Mesos framework. At that moment, we were thinking about how to deploy services better. That's when my team was tasked with deciding whether to switch to Kubernetes, and that's how my Cloud Native journey began.
Bart: And Fabián, what were you doing before getting into Cloud Native?
Fabián: I've always been an SRE, DevSecOps, sysadmin—call it what you like—in both small startups and large organizations.
Bart: Okay. It's no secret that the Kubernetes ecosystem moves quickly. What works best for you to stay up to date in terms of resources?
Fabián: For the ecosystem and tools, I keep an eye on social media platforms like LinkedIn, Kubernetes subreddits, and KubeFM. For Kubernetes itself, I tend to follow the official Kubernetes blog and skim through the Kubernetes enhancements repository. This repo shows the KEPs (Kubernetes Enhancement Proposals) being proposed, which helps me understand what changes are coming to Kubernetes.
Bart: If you could go back in time and share one career tip with your younger self, what would it be?
Fabián: I recommend two things to myself. First, write more, but don't overthink it. I often spend too much time considering the best way to write an article or explain ideas. Writing is one of the best skills you can have. It helps you structure your thinking, share your learnings, and offload your brain, which is very beneficial.
The second learning I'll share is to do more and plan less. Experimenting with things is far better than discussing them. Sometimes, writing code will end the discussion.
Bart: And as part of our monthly content discovery, we found an article you wrote titled "A Journey Through Kafka and Split DNS: A Multitenant Kubernetes Offering". Today, we're diving into a fascinating DNS challenge in a multitenant Kubernetes environment. Can you start by giving us some context about SHIP and what prompted this entire journey?
Fabián: SHIP is a multi-tenant platform. Inside the platform, we run different marketplaces. Every marketplace has different workloads and needs, with different technologies. In SHIP, we need to support that variety: HTTP APIs, default workloads, Kafka consumers, and producers. Since our clusters are multi-tenant, we need to ensure that when we make a change, we don't break other tenants' workloads.
One day, one of our brands approached us wanting to switch to a new Kafka provider. They asked if we could make changes to accommodate this new provider. The change was driven by business strategy, costs, and other factors. This scenario is common in SHIP: a tenant requests a specific change. As a platform team, we then need to decide whether to integrate the change into our platform offering or create a temporary solution to enable it.
Bart: And you mentioned tenants wanting to use a third-party Kafka service. What specific technical requirements did this create, particularly around DNS?
Fabián: These managed Kafka services maintain their own brokers and Kafka servers in their infrastructure, and we need to connect to those. In terms of networking, to reach the provider infrastructure you need to create a private link. The private link serves two purposes: it keeps the data secure and the traffic costs affordable, and the provider uses it as an authentication method. With that private link targeting the provider infrastructure, you can effectively access the data. In terms of DNS, the goal is that Kafka clients can reach the provider infrastructure without knowing the exact details of where they're connecting.
Bart: So for those less familiar with Kafka's architecture, why is DNS so critical for Kafka connectivity specifically? What makes this different from other services?
Fabián: Kafka DNS resolution is quite peculiar. Unlike other clients where you use DNS to translate a hostname to an IP, in Kafka, you first fetch a bootstrap server. That bootstrap server provides a list of brokers that you also need to resolve to connect to the data.
This means DNS in Kafka is more complex. You may resolve the bootstrap server address, but if you cannot resolve the DNS for the brokers, you cannot connect to the data. This can lead to outages, delays, and connectivity problems. In our multi-tenant and multi-cluster environment, every Kafka client needs to be able to resolve its brokers, whether it points at a managed provider or at some alternative.
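For readers less familiar with Kafka clients, here is a rough sketch of what this looks like from the client side (the hostnames are invented for illustration): only the bootstrap address is configured, while the broker hostnames come back from the cluster metadata and must also be resolvable.

```properties
# Kafka client configuration (sketch): only the bootstrap address is known up front
bootstrap.servers=bootstrap.my-kafka-provider.example:9092
# After the metadata request, the client must also resolve every advertised
# broker hostname it receives, e.g. broker-0.my-kafka-provider.example and
# broker-1.my-kafka-provider.example, names it was never configured with.
```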
Bart: So Fabián, one of your key constraints was the multi-tenant nature of SHIP. How did this multi-tenancy aspect complicate finding a solution?
Fabián: In SHIP, we decided very early that we wanted to be a multi-tenant offering. Being multi-tenant creates many challenges. One of them is that when making changes, you cannot break other tenants. This is the core rule: you need to solve the problem in a way that also serves new and future tenants. You also don't want to increase the platform's operational burden or the load on your team. This makes things more challenging in a multi-tenant setup. In a single-tenant environment, it might be easier—you just need to change that cluster or piece of architecture, and it's contained. When it's multi-tenant, you need to be more cautious.
Bart: Now, let's walk through your journey of trial and error. Your first attempt was using Route 53 split DNS. Why did this approach seem promising initially, and what made it fail?
Fabián: This solution is probably one of the simplest you can think of. When considering split DNS, you look at what your cloud provider offers. In our case, we are hosted in AWS, so we use Route 53, which supports split DNS. We initially thought about implementing split DNS there, but it wouldn't work because we decided early on to share a single VPC to host multiple clusters.
We deploy our nodes in private subnets for security reasons, ensuring that nodes are not exposed to the public network. When nodes or pods need to connect to the internet, they go through managed NAT gateways provided by AWS. This is a pretty standard setup that many organizations use.
However, those NAT gateways have associated public IPs, and these IPs have been allowlisted in third-party and internal services, making it challenging to change them.
Since we have multiple clusters in a single VPC—some with Kafka clients using a managed Kafka service and some without—implementing split DNS in the VPC would serve the feature for some tenants while potentially breaking it for others. That's why we cannot use this solution.
Bart: Now, your second attempt, using Kubernetes native pod DNS config, sounds like it should have been a good fit since it's built into Kubernetes. What were the limitations here?
Fabián: We are a Kubernetes-native team that has been operating Kubernetes for a long time. We prefer to do everything at the Kubernetes level. When looking into the pod spec, we discovered the DNS config and host aliases fields.
Host aliases allow you to add a list of hostnames and resolve them manually and statically to given IPs. However, this approach becomes problematic with Kafka, where the brokers frequently change. As you scale up or down, you would need to continuously re-map hosts to static IPs, effectively overwriting the hosts file with the list of brokers.
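As a rough sketch of what this looks like in a pod spec (the IPs, hostnames, and image are invented), each broker has to be pinned to an IP by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-consumer
spec:
  hostAliases:                 # entries written statically into the pod's /etc/hosts
    - ip: "10.20.30.40"        # hypothetical private-link IP
      hostnames:
        - "broker-0.my-kafka-provider.example"
    - ip: "10.20.30.41"
      hostnames:
        - "broker-1.my-kafka-provider.example"
  containers:
    - name: consumer
      image: registry.example.com/kafka-consumer:latest   # placeholder image
```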
This method works temporarily until the broker configuration changes, which then breaks connectivity and forces the tenant to redeploy the pod to regain connection—an unideal scenario. Our next attempt was to point to a custom DNS server where we could have more control and map domain IPs to the correct target.
Bart: Okay. Now, attempts three and four both involved creating intermediary services: a Kafka proxy and a custom DNS resolver. These sound architecturally different, but you grouped them together as failures. What was the common problem?
Fabián: Having found host aliases too static, we looked into the DNS config field, which we thought could be useful. In DNS config, you can pass a list of name servers that the pod will use to resolve hostnames to addresses. That sounds promising and could fix the issue. However, the platform would need to maintain that custom DNS server, whether it's owned by the platform team or by the tenants. If the DNS server goes down, it becomes a single point of failure that could cause downtime and an outage. That was not ideal, which is why we decided to discard it.
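A minimal sketch of that dnsConfig variant (the nameserver IP and image are hypothetical): the pod is pointed at a custom DNS server that the platform would then have to run and keep available.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-consumer
spec:
  dnsPolicy: "None"            # bypass the cluster DNS entirely
  dnsConfig:
    nameservers:
      - 10.0.0.53              # hypothetical custom DNS server the platform would own
    searches:
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"
  containers:
    - name: consumer
      image: registry.example.com/kafka-consumer:latest   # placeholder image
```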
Then we thought, maybe it's not about DNS. Why not create a Kafka proxy? The idea is that instead of pointing to a different server for DNS resolution, we configure the clients to interact with a new Kafka proxy. This proxy would abstract the connection to the managed Kafka service, including the in-house service, and provide simplified interfaces. Kafka clients in SHIP would connect to the proxy, and the proxy would handle connecting to the right Kafka with some configuration.
This could work, but it would mean creating a single point of failure. My team would need to maintain the proxy, and it would require development, maintenance, and patching—adding more operational load that made us think it might not be the best solution.
Bart: Your fifth attempt got closer to a working solution using CoreDNS with hard-coded broker addresses. Can you explain how this approach worked and what the remaining challenge was?
Fabián: After discussing and discarding various ideas, we looked into our available options. We examined our in-cluster DNS architecture and saw that we have CoreDNS, the DNS server most Kubernetes clusters use for internal DNS resolution. Since we knew it was there, we investigated what we could do with CoreDNS, specifically the template plugin.
With the template plugin in CoreDNS, you can rewrite DNS responses. When someone asks CoreDNS for a host's address, you can answer with whatever you want. As we mentioned earlier, the brokers are dynamic. Since they change, we would need to continuously update the DNS configuration to match these changes, which is not an ideal solution.
This approach would cause delays for clients because there would be a time lag between discovering changes and deploying the corresponding configuration updates. It could have worked if the lists were static, but since they were dynamic, we couldn't use this method.
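To illustrate the approach that was just ruled out (the domain, IP, and zone are invented), the template plugin can answer a matching query with a fixed record, which is exactly why it only holds up while the broker list stays static:

```
# Hypothetical CoreDNS server block: answer a broker lookup with a hard-coded IP
my-kafka-provider.example:53 {
    template IN A {
        match "^broker-0[.]my-kafka-provider[.]example[.]$"
        answer "{{ .Name }} 60 IN A 10.20.30.40"   # breaks as soon as the broker moves
        fallthrough
    }
    forward . /etc/resolv.conf
}
```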
Bart: Now, your sixth attempt introduced the CoreDNS metadata plugin, which seemed to address the toil issue. It felt like you were getting very close. So what happened here?
Fabián: Essentially, we liked the CoreDNS approach. The only problem we had was that it was pretty static and cumbersome to maintain. So we started to think if we could create a similar solution that allows people to be more self-serve and dynamic.
What we did was keep the targets in the config, because those private links change, but not very frequently. We would allow people to label pods in a certain way. When you are in a pod running a service and want to contact the managed provider Kafka, you'll add a label saying, "I want to use Kafka 1 for this pod." Then CoreDNS would have the responsibility to rewrite the DNS resolution to target the right destination in the Kafka provider.
For the customer, this is quite interesting because they will only need to add a label, and everything will be managed for them. For the platform team, it would require maintaining the mapping, which is not ideal, but acceptable since the mappings don't change frequently.
However, when we deployed the first version of the solution into our real clusters, we ran into node-local DNS. Node-local DNS is an add-on that improves DNS latency, but it does not support the metadata plugin. The metadata plugin was the key piece that allowed us to read the label and change the configuration so CoreDNS could direct the applications.
Bart: And that brings us to your successful solution: node-local DNS with a template plugin. Can you walk us through how this approach works technically?
Fabián: The idea is that we almost had a working solution with CoreDNS: a metadata plugin and a template plugin driven by a label. However, we hit a blocker with node-local DNS because it didn't have the metadata plugin.
We looked into which plugins are available in node-local DNS and found the template plugin, which we had used several times before. We originally introduced node-local DNS after an incident where we discovered the in-cluster DNS architecture was heavily loaded. Back then, we used the template plugin to reduce load and discard garbage queries.
In our final solution, we decided to tell our tenants to deploy a service with a standard name targeting the right service. They can manage and are aware of the target, making it simple for them.
We also set a convention that the service needs a separate name to be discoverable in CoreDNS. In CoreDNS, we'll change the node-local DNS configuration to resolve "MyProvider.kafkaManageService" with a CNAME to the "KafkaServiceStandardName".
Given how Kubernetes DNS resolution works, Kafka clients in the tenant namespace will end up trying to resolve something like "MyProvider.kafkaHostName.<namespace>.svc.cluster.local". We added a configuration in node-local DNS that redirects queries for "MyProvider.kafkaManagedService.<namespace>" to the target of the standard-name service.
This approach ensures that configured clients will work, and clients that don't configure the service won't be broken.
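To make the moving pieces more concrete, here is a rough sketch under our own assumptions (all names, the provider domain, and the private-link hostname are invented, and ExternalName is just one way the standard-named Service could point at the target):

```yaml
# Tenant side, one-time setup: a Service with the agreed standard name that
# targets the private-link endpoint for the managed Kafka provider.
apiVersion: v1
kind: Service
metadata:
  name: kafka-managed-service            # the agreed standard name
  namespace: tenant-a
spec:
  type: ExternalName
  externalName: vpce-0123abcd.vpce-svc-0123.eu-west-1.vpce.amazonaws.com
```

On the platform side, the node-local DNS configuration could gain a template block that turns broker lookups, which arrive with the namespace search domain appended, into a CNAME pointing at that standard-named Service:

```
cluster.local:53 {
    template IN A {
        # Broker hostnames from the provider arrive as
        # <broker>.my-kafka-provider.example.<namespace>.svc.cluster.local
        # thanks to the search domains and the raised ndots value.
        match "^(?P<host>.*)[.]my-kafka-provider[.]example[.](?P<ns>[^.]+)[.]svc[.]cluster[.]local[.]$"
        answer "{{ .Name }} 60 IN CNAME kafka-managed-service.{{ .Group.ns }}.svc.cluster.local."
        fallthrough
    }
    # ... the usual node-local DNS plugins (cache, bind, forward) stay as they are.
}
```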
Bart: From a tenant perspective, what do they actually need to do to implement the solution? How much of the work falls on them versus the platform team?
Fabián: The final solution requires very little configuration from tenants. They need to deploy the service with the standard name to keep the target, which can be a one-time setup. We also need to increase the ndots configuration for their pods. Why? Because the provider's DNS records contain a lot of dots. With the default ndots configuration, such a name is treated as a complete record, which means the trick of appending the "<namespace>.svc.cluster.local" search domain to the managed Kafka hostname stops working: the resolver treats the name as complete and returns nothing.
Bart: Fabián, you mentioned tenants need to set ndots to 7 in their DNS configuration. For people in our audience who might be unfamiliar with this DNS parameter, what is ndots, and why specifically the value 7?
Fabián: The ndots parameter is part of the DNS resolver configuration. It controls how many dots a name can contain before it is considered a Fully Qualified Domain Name (FQDN). If a name has at least that many dots, the resolver won't append the search domains.
For instance, if ndots is one and you resolve a name like foo.bar, it is considered fully qualified and the resolver won't append the search domains configured in your DNS resolver. In Kubernetes, those search domains are typically "svc.cluster.local" or "<namespace>.svc.cluster.local".
The number seven is arbitrary and could be higher or lower. We chose seven as the threshold for the records used by the managed Kafka, but it can be configured and is also adjustable through an annotation.
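For illustration, this is roughly what the resolver configuration inside a tenant pod looks like once ndots has been raised (the namespace is an example, and the nameserver shown is the link-local address commonly used by node-local DNS; some setups keep the cluster DNS service IP here instead):

```
# /etc/resolv.conf inside the pod (sketch)
search tenant-a.svc.cluster.local svc.cluster.local cluster.local
nameserver 169.254.20.10
options ndots:7
# Any name with fewer than 7 dots is first tried with the search domains
# appended, so broker-0.my-kafka-provider.example is looked up as
# broker-0.my-kafka-provider.example.tenant-a.svc.cluster.local first,
# which is exactly the shape the node-local DNS template rewrites.
```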
Bart: Manually configuring DNS config for every deployment sounds tedious. You mentioned using Kyverno to automate this. How does that work?
Fabián: Among our tenants and client base, we have people who are very Kubernetes-savvy and others who don't want or need to know Kubernetes deeply. We don't want people to have to edit the pod spec and change the DNS config field by hand to add ndots, as that could be troublesome.
Instead, we created a user interface by defining an annotation called ndots. The annotation carries a number that should be higher than five (the default ndots configuration in Kubernetes). When we detect the annotation with a number higher than five, a Kyverno mutation policy mutates the pod spec to include the required DNS config field with the value from the annotation. This way, people can benefit from the CoreDNS trick without touching the pod spec themselves.
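A minimal sketch of what such a policy could look like (the annotation key and policy name are invented here; the real interface may differ): when a pod carries the annotation, a mutate rule injects the matching dnsConfig option.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: ndots-from-annotation            # hypothetical policy name
spec:
  rules:
    - name: inject-ndots
      match:
        any:
          - resources:
              kinds:
                - Pod
      preconditions:
        all:
          # Only act when the (hypothetical) annotation is present and non-empty
          - key: "{{ request.object.metadata.annotations.\"ship.example.com/ndots\" || '' }}"
            operator: NotEquals
            value: ""
      mutate:
        patchStrategicMerge:
          spec:
            dnsConfig:
              options:
                - name: ndots
                  value: "{{ request.object.metadata.annotations.\"ship.example.com/ndots\" }}"
```

In a real cluster you would likely also want to handle pods that already define a dnsConfig, but the shape of the rule is the gist.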
Bart: Now, looking at the big picture, you went through seven different attempts before finding the right solution. In retrospect, what were the key factors that made this final approach successful where others failed?
Fabián: The main idea of this article is to share with the audience that sometimes your first idea is not the best. When we were thinking about the feature, we set three evaluation criteria:
It needs to be maintainable
It needs to be self-serve, so people don't have to ask us to enable deployments
It needs to be evolvable and easy to maintain
If you run these criteria through every attempt, you can score them. This seventh attempt is the one that we think scores the best across all three areas.
Bart: Were there any unexpected lessons learned during this journey that might not be obvious from just looking at the final solution?
Fabián: The path to any good solution is iterative. You need to think through all the possibilities and options, and start to discover and confirm the things you like. Understanding the internals helps you make better choices. For instance, in our case, we were aware of our in-cluster DNS architecture and of what we can do with Kyverno and mutating webhooks. This also improves the user experience. If you just aim for a working solution, you might end up with a subpar one. If you're willing to increase platform toil, that's fine, but you'll still likely end up with a less-than-optimal approach.
Bart: For platform teams facing challenges with split DNS and multi-tenant Kubernetes environments, what advice would you give based on your experience?
Fabián: Essentially, define your success criteria so you can measure the different options and the trade-offs you make. At times, you need to relax those criteria, as long as doing so doesn't add toil to the platform or hinder the user experience. Sometimes it's the other way around. For instance, in SHIP, we aim to provide self-service interfaces with minimal toil for the platform team. The trade-off is that sometimes we need to invest more time analyzing the available options, and we may need to develop our own solution, which means more development time for the team.
Bart: Now Fabián, what's next for you?
Fabián: I still think there are interesting angles of SHIP at Adevinta to explore, which is incredible. I would like to write a couple of articles about the things we have up our sleeves. I also like to keep an eye on the community, especially after the NGINX case. It's important that we can continue to contribute back, and we'll see where I can keep doing that.
Bart: And if people want to get in touch with you, what's the best way to do that?
Fabián: I have accounts on various social media platforms, but I'm mostly active on LinkedIn.
Bart: Fantastic. Well, Fabián, congratulations on your first KubeFM podcast. I hope our paths will cross again in the future if you continue writing articles. It's been a pleasure having some of your teammates, like Thibaut, on our podcast. I wish you nothing but the best of luck and hope to talk to you soon. Take care.
