Running a Full Kubernetes Cluster for $2 a Month

Jan 27, 2026

Host:

  • Bart Farrell

Guest:

  • Varnit Goyal

Most developers assume Kubernetes requires an enterprise budget. Varnit Goyal proves otherwise — he built a full three-node Kubernetes cluster for $2.16/month using Rackspace Spot Instances.

The trick: pick non-default instance types, distribute nodes across low-demand regions, and let Kubernetes handle rescheduling when nodes get preempted. For service exposure, he replaced the $10/month load balancer with Tailscale Funnel — free.

In this episode:

  • How Spot Instance bidding works and which strategies keep costs and preemption low

  • Using Tailscale Kubernetes operator as a free alternative to traditional load balancers

  • Running real development dependencies (Kafka, Elasticsearch, Postgres) on a budget cluster

A practical walkthrough of what Kubernetes actually needs to function — and what you can strip away.

Transcription

Bart Farrell: What does a real Kubernetes cluster look like when you only have $2 a month to run it? In this episode of KubeFM, we're joined by Varnit Goyal, a senior software engineer at Clear Street. Varnit breaks down how he built a full multi-node Kubernetes cluster for just over $2 a month using Rackspace Spot Instances, including the bidding strategy, region selection, and instance types that make it stable enough to use. We dig into the operational side of spot capacity: handling preemption, monitoring node reclamation, and using a multi-region worker strategy to reduce downtime. This is a practical, experience-driven discussion about what Kubernetes really needs to function and which patterns still hold when you remove budget as a safety net. This episode is sponsored by LearnKube. Since 2017, LearnKube has helped engineers from all over the world level up their Kubernetes skills. Courses are instructor-led and 60% practical, 40% theoretical. Students have access to course materials for the rest of their lives. Courses are given in person and online, to groups as well as individuals. For more information about how you can level up, go to learnkube.com. Now, let's get into the episode with Varnit.

Varnit Goyal: (music) You're tuned in to KubeFM.

Bart Farrell: Varnit, welcome to KubeFM. What three emerging Kubernetes tools are you keeping an eye on?

Varnit Goyal: eBPF is something new that I'm keeping an eye on. It's an interesting technology, and it lets you experiment with removing the sidecar overhead in Kubernetes. So that's the number one tool I'm looking at. Other than that, I'm interested in MetalLB, because I do home labbing and MetalLB plays an essential role when you're building your own Kubernetes cluster at home. Then there is DevSpace. What I've found, in other companies and my company as well, is that there's always a shared development environment that a lot of developers use, and it keeps breaking. DevSpace lets every developer spawn their own environment, so it really helps with productivity. DevSpace is something I'm exploring right now. Those are the three tools I'm most interested in.

Bart Farrell: Very good. And Varnit, for people who don't know who you are, can you let us know a little bit more about what you do and where you work?

Varnit Goyal: Sure. I work as a senior software engineer, but I'm more of a full-stack developer. I don't really want to associate myself with one particular technology per se; I consider myself more of a problem solver than a developer for a specific technology. I currently work at a company called Clear Street, a prime brokerage platform based in the US. They help institutional clients and active traders trade in the US markets. That's what I do. Outside of work, I'm interested in exploring technologies, home labbing, and automation. I like to build tools that help other people and help automate my own life.

Bart Farrell: And how did you get into cloud native in the first place?

Varnit Goyal: Well, if you're in the software industry, it's really hard, almost impossible, to escape cloud native, because it's everywhere, right? So (laughs) the company I'm currently working at, and the companies I've worked at in the past, all run their workloads on Kubernetes, in a more or less cloud native way. It was always part of the software development life cycle. That's how I came across these cloud native tools and Kubernetes in general, and it raised my curiosity to explore more of what's inside Kubernetes and cloud native, tapping into the power of cloud applications.

Bart Farrell: And what were you before cloud native?

Varnit Goyal: I was basically born into the cloud native era, so (laughs) by the time I started development and joined companies, cloud native was already there. There's not much of a "before cloud native" for me; when I actually got into software development, it was already everywhere.

Bart Farrell: Now, the Kubernetes and cloud native ecosystem move very quickly. How do you stay up to date with all the changes that are going on? What resources work best for you?

Varnit Goyal: I don't want to name one particular resource. Medium is a good one; I read Medium articles, and Reddit has some really good threads. There are obviously multiple resources: I watch videos, read articles, listen to podcasts. There are definitely many ways to keep up with the community in the cloud native era. But what I prefer to do is, when a new technology comes out, I try to build some tool around it. I think when you build stuff with a new technology, you learn a lot more than by just reading something or watching a video. For example, I was recently exploring eBPF, so I'm trying to build an observability layer around eBPF and Kubernetes to get rid of the sidecar overhead.

Bart Farrell: And Varnit, if you could go back in time and share a career tip with your younger self, what would it be?

Varnit Goyal: Build more, consume less. (laughs) That's the number one tip I've learned. When you build stuff and solve real-world problems, you learn a lot more than by just consuming a video or reading an article. That's something I've learned.

Bart Farrell: As part of our monthly content discovery, we found an article you wrote titled Run Kubernetes Clusters Under $2 a Month. That sounds pretty exciting. Many developers out there assume that Kubernetes requires enterprise-level budgets, which often deters individuals and small teams from using it; from a cost perspective, they say, it's just a nightmare, or simply something they can't afford. What prompted you to explore ultra-low-cost Kubernetes solutions, and what was your initial goal?

Varnit Goyal: So like I said, I was always fascinated by Kubernetes, and it definitely looks like an enterprise tool, because in companies you see a lot of nodes, a lot of pods, everything is there. I always wanted to learn how these things work, how Kubernetes works in general, right? So I was exploring solutions around it. I explored Minikube, which lets you run a Kubernetes cluster on your own system, but it wasn't very satisfying, because it's a very cut-down version of what a real Kubernetes cluster looks like. Docker Desktop also lets you run a Kubernetes cluster, and I wasn't really satisfied with that either. I always wanted to experience full-fledged Kubernetes: a cluster running on multiple nodes, running multiple pods, with a network layer on top. So I started exploring AWS, DigitalOcean, Akamai, and other cloud providers where you can spawn a Kubernetes cluster, but they were mostly on the expensive side just to toy around with. And since I live in India, purchasing-power parity is also a thing; it's not very affordable for me to spawn a full-fledged Kubernetes cluster just to play around with it. That's where my curiosity came in: how to spawn a full-fledged Kubernetes cluster to play around with, at an affordable cost. That's where I started exploring.

Bart Farrell: Now, you mentioned Rackspace Spot as a game-changer for affordable Kubernetes. Can you explain what Spot Instances are and how Rackspace Spot differs from traditional Kubernetes hosting?

Varnit Goyal: One day I was reading some Reddit thread and someone mentioned, "Hey, Rackspace is providing really cheap machines in the form of Spot Instances." I went ahead and explored it, and found they really do provide Spot Instances. For the audience who aren't familiar with what a Spot Instance is, a brief explanation: every cloud provider out there has to keep some buffer capacity to support demand spikes. That buffer capacity often sits around empty. So what providers do is offer this capacity at a really low cost, on the condition that they can take it back when needed. Suppose, as a cloud provider, I have a hundred machines in buffer and only 10 are being used right now; the capacity of those other 90 machines is essentially wasted. So they rent you those 90 machines at a really cheap price, on the condition that they can take them back any time demand comes. That's what Spot Instances are. The proposition Rackspace Spot offers is that you can use these Spot Instances as the nodes of a Kubernetes cluster, and the interesting thing is that the cluster is managed by them. You don't have to spawn the Kubernetes control plane yourself; you just bid for some Spot Instances and connect them to their managed control plane. This way you can enjoy a full-fledged Kubernetes cluster at a really cheap cost; even a single-node cluster acts like a full-fledged Kubernetes cluster. That was a very interesting idea for me: finally I'd be able to enjoy the power of a full Kubernetes cluster, and the pricing would also be very affordable.

Bart Farrell: And affordable is the right word to use, as the price you achieved is remarkably low: two dollars and sixteen cents per month for a three-node cluster. Can you walk us through the specific bidding strategy and instance selection that makes this possible?

Varnit Goyal: Definitely. Spot Instances are obviously tricky to get right, because of course they can be preempted, right? That's the tricky part. The bidding strategy I used: in Rackspace Spot, bids start at $0.001, which is really cheap. You can start there, and you don't need to bid very high, because that shoots up the cost a lot. What you can do is choose instance types that are not the default. For example, I chose memory-optimized instances. For test workloads it doesn't really matter, so you can choose compute-optimized or memory-optimized instances. Why? Because they're not the default option in Rackspace Spot, so they're less in demand. That's what I found out after experimenting with multiple configurations. Other than that, I just bid from $0.01 up to something like $0.2. Set the highest amount to the budget you don't want to exceed; you don't need to choose a very high number. That's my bidding strategy. The other thing is how I distribute my nodes globally for further reliability. I think the trick is to put your Spot workers in regions that are not very popular, for example Hong Kong or Australia. They're not default regions, and demand there is low. I spawned a few instances in the Hong Kong and Australia regions, and surprisingly they never got preempted, because there was no demand at all in those regions. Since it's a test workload, latency doesn't matter, so it's better to spawn worker nodes in regions where competition is low: your pricing stays low, and the chances of your machine getting preempted are extremely low. For me, nodes in Hong Kong and Australia almost never got preempted. Those are a few tips you can use to keep the cost really low and also keep your cluster stable.

Bart Farrell: Spot instances come with the inherent risk of being preempted when someone bids higher. What immediate monitoring solution did you implement to stay aware of potential node reclamation?

Varnit Goyal: Yeah, sure. A simple way to do it is that Rackspace Spot provides a sort of Slack webhook alerting system. Once they're about to preempt your instance, they send you an alert on Slack; you configure the webhook where they'll send it. You get something like 5 to 10 minutes to take an action before the instance actually gets preempted. I configured this webhook on my personal Slack, so they send me the alert and I get time to take some action, maybe increase the bid or spawn a new instance in another region. What I personally did is configure a really simple script triggered by this Slack webhook: it spawns a new instance in some other region and deletes the instance from the region where it's going to be preempted. And since Kubernetes is really good at rescheduling workloads, it detects that the node is no longer healthy and reschedules the workload onto other nodes. That's how I'm handling it right now, and it's working pretty well for me.
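The automation Varnit describes, a webhook-triggered script that replaces a node about to be preempted, might be sketched roughly as below. This is a hypothetical illustration only: the alert payload shape, the region list, and the resulting "plan" stand in for whatever the real Rackspace Spot API and Slack payload look like, which the episode does not detail.

```python
# Hypothetical sketch of a preemption-handling script.
# The payload fields and region names are illustrative,
# not the real Rackspace Spot API or Slack webhook format.

# Regions ordered by observed preemption risk (low-demand first).
FALLBACK_REGIONS = ["hkg", "syd", "lon", "iad"]

def choose_fallback_region(preempted_region, regions=FALLBACK_REGIONS):
    """Pick the first low-demand region that isn't the one being reclaimed."""
    for region in regions:
        if region != preempted_region:
            return region
    raise RuntimeError("no fallback region available")

def handle_preemption_alert(alert):
    """Turn a preemption alert into a replacement plan.

    `alert` is assumed to carry the node name and region, e.g.
    {"node": "worker-2", "region": "syd"}. In the real script the
    returned plan would drive calls to the provider's API; Kubernetes
    then reschedules the evicted pods onto the replacement node.
    """
    target = choose_fallback_region(alert["region"])
    return {
        "delete": {"node": alert["node"], "region": alert["region"]},
        "create": {"region": target},
    }
```

For example, `handle_preemption_alert({"node": "worker-2", "region": "syd"})` plans a replacement node in the first remaining low-demand region.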

Bart Farrell: And beyond monitoring, you implemented a multi-region strategy for improved reliability. Can you explain how geographic distribution helps with Spot instance stability?

Varnit Goyal: Definitely. It works like this: say you have a few Spot worker nodes in the Australia region, and a popular event, a big match for example, is coming up in Australia. Demand suddenly increases in that region, and that's why your nodes get preempted. What multi-region distribution gives you is protection against those kinds of demand spikes. That's how it works.

Bart Farrell: Now let's talk a little bit more about a major budget concern. Load balancers typically cost around $10 per month, which would blow your $2 budget completely out of the water. What creative solutions did you find to expose services to the internet?

Varnit Goyal: Sure. The load balancer is definitely one of the blockers that has always been there. When I was exploring Kubernetes on other cloud providers as well, for example DigitalOcean, they also have an expensive load balancer. By expensive I mean $10, but for someone who's just trying to play with Kubernetes, that is expensive. Rackspace Spot also provides a load balancer, and there's no real concept of a "spot" load balancer; a load balancer is always a production-level component, so it costs more. In Rackspace it also costs around $10, which would obviously shoot up the budget, and I didn't want that. At the same time, I wanted to expose a few of my services to the internet. I'm a big fan of how Tailscale works, and of their very generous free tier. Exploring Tailscale, I found they have something called Tailscale Funnel, and they also have an operator for Kubernetes. I explored how it works, set up the Tailscale operator in my Kubernetes cluster, and was able to expose services over the internet. And if you're not interested in exposing a service to the internet, the simpler way is to expose it on your Tailscale network; then you can access the service from any device connected to Tailscale, because they provide a VPN mesh through which you can reach your private devices. They create a sort of virtual network layer on top of all your devices, so your devices can communicate as if on a private network. And if you really do want to expose something publicly, you can use Tailscale Funnel. Both options are available and pretty simple.

Bart Farrell: For developers that might be unfamiliar with Tailscale, this could be a new approach to service exposure. Can you walk through the basic setup process for integrating Tailscale with your Kubernetes cluster?

Varnit Goyal: Well, it was pretty easy. The Tailscale documentation is good on this topic. They have a Helm chart that sets up the Kubernetes operator, so you just install that chart in your cluster. Then you create an OAuth application to authenticate your Kubernetes cluster with Tailscale, and set up some auth rules in your Tailscale dashboard. Within 5 to 10 minutes you should be able to set it up. It's quite simple, not very difficult; if you're familiar with the basics of Kubernetes, that should be good enough.
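For reference, the setup Varnit outlines maps to roughly these steps. This is a sketch based on Tailscale's operator documentation at the time of writing; the chart location and `oauth.*` values come from those docs, the client ID and secret placeholders come from the OAuth client you create in the Tailscale admin console, and you should verify the details against the current documentation.

```shell
# 1. Create an OAuth client in the Tailscale admin console
#    (with the scopes the operator docs call for) and note
#    its client ID and secret.

# 2. Install the operator from Tailscale's Helm repository.
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
helm upgrade --install tailscale-operator tailscale/tailscale-operator \
  --namespace tailscale \
  --create-namespace \
  --set-string oauth.clientId="<oauth-client-id>" \
  --set-string oauth.clientSecret="<oauth-client-secret>" \
  --wait

# 3. Adjust the ACLs / auth rules in the Tailscale dashboard so the
#    devices the operator creates are allowed on your tailnet.
```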

Bart Farrell: You provided a complete example deploying NGINX with Tailscale exposure. Can you explain how the Tailscale ingress annotation works and what makes it different from traditional ingress controllers?

Varnit Goyal: To be honest, it's very similar to how you expose an application through a traditional ingress. You just have to add some annotations using Tailscale's constructs, and that's it: the Tailscale operator detects those annotations and exposes the service. It's not really any different from exposing a service through a traditional ingress; only the annotation part is different. The way you do it is very much the same.
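To make the two exposure styles Varnit mentions concrete, here is a rough sketch based on the Tailscale Kubernetes operator documentation at the time of writing. The `nginx` names are made up for illustration, and the exact fields and annotations should be checked against the current docs.

```yaml
# Tailnet-only exposure: the operator provisions a Tailscale device
# for this Service, reachable from any machine on your tailnet.
apiVersion: v1
kind: Service
metadata:
  name: nginx            # hypothetical service name
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app: nginx
  ports:
    - port: 80
---
# Public exposure via Tailscale Funnel: an Ingress with the
# tailscale ingress class plus a funnel annotation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  annotations:
    tailscale.com/funnel: "true"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: nginx
      port:
        number: 80
  tls:
    - hosts:
        - nginx          # served under your tailnet's ts.net domain
```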

Bart Farrell: Okay. And this setup seems perfect for development and learning environments. What types of workloads have you successfully run on this ultra, low budget cluster and what are the practical limitations?

Varnit Goyal: As a developer, you run into many dependencies. For example, I often need Kafka or Elasticsearch. These are very heavy dependencies, and if you run them on your local system, they usually end up jamming it. So I run a Kafka cluster, Elasticsearch, Postgres, all kinds of development dependencies in my Kubernetes cluster. I also run some MQTT servers, which help with my home automation stuff. Other than that, I run some dev environments: when you want to compile something heavy, you can just offload it to the Kubernetes cluster and it compiles really fast. So I'm primarily using it to keep my development machine light and offload the heavy things to my cluster. That's mostly what I'm using it for. As for limitations, it's perfect for my workloads; I can't really think of any limitations as of now. Maybe we can talk about that later on.

Bart Farrell: Okay. For teams or individuals that are currently paying hundreds of dollars monthly for Kubernetes, this approach might seem too good to be true. What reliability metrics or uptime have you observed with the setup?

Varnit Goyal: I think it's pretty reliable. For me it's surprisingly too good to be true, too. If you spawn the worker nodes in multiple regions and you have the right bidding strategy, it's very reliable, I'd say. I never really had any problems; I've been running my workloads for the last three to six months and never ran into any reliability issue. That said, I'd definitely recommend a real load balancer rather than Tailscale, and some good metrics. I'm using a Slack webhook, but Datadog or some other bigger metrics platform could be used to monitor it well and make it more reliable. But for me it's very reliable; I've had no major issues with it yet.

Bart Farrell: So this $2 cluster is a remarkable and ingenious project that proves what's possible with creative thinking. However, it's clearly not production ready as is, or is it? If you were forced to use Spot Instances in production to achieve similar cost savings, what would you need to fundamentally change about this architecture?

Varnit Goyal: It's definitely not a production-ready project, at least not in its current form. But it's very easy to make it production-ready. Rackspace also provides on-demand nodes, so you can use a combination of Spot nodes and on-demand nodes. I think the best way is to start with these really cheap instances and, when your application is ready to fly, add some on-demand instances and a real load balancer, and you're good to go. It's really easy to upgrade; you don't need to change the fundamentals of your cluster.

Bart Farrell: The multi-region approach worked beautifully for your hobby cluster, spreading risk across geography. But if you had to implement the same multi-region distribution for a real production workload, what specific challenges would force you to rethink this simple setup?

Varnit Goyal: Well, I think latency plays a really important role in a production setup. You definitely want your nodes to be near your users, right, so that it's fast for them. What I would do is spawn some nodes where my users are and split the workload into two types: one that's user-facing, and one that's not real-time or user-facing. Then I'd use node affinity to schedule the user-facing workloads onto the nodes near my users, and put the rest on Spot workers that might be hosted far away. It doesn't really matter for those, because latency is the only thing stopping you from geographically distributing a lot of nodes.
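The split Varnit describes can be expressed with standard Kubernetes node affinity. A minimal sketch, assuming the nodes near your users carry the well-known `topology.kubernetes.io/region` label and using a hypothetical workload name, image, and region value:

```yaml
# Pin a user-facing Deployment's pods to nodes in the region closest
# to your users; non-latency-sensitive workloads simply omit this
# affinity and can land on the cheap, far-away Spot workers.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend           # hypothetical user-facing workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/region
                    operator: In
                    values:
                      - ap-south   # hypothetical region label value
      containers:
        - name: frontend
          image: nginx     # placeholder image
```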

Bart Farrell: Tailscale was a clever workaround that solved your load balancer cost problem perfectly. But if Tailscale was your only option for production traffic, perhaps due to specific security requirements, what architectural changes would you make to handle thousands of concurrent users?

Varnit Goyal: Well, first of all, the Tailscale free tier has a limit of around 100 devices. So I'd definitely go with a Tailscale paid tier, which supports more users and more reliability. If you have internal applications you want to distribute across your company or enterprise, I think Tailscale is a good option: just switch to their paid tier and use the Tailscale VPN for internal workloads. For public workloads, however, I'd prefer a public load balancer. Rackspace provides a load balancer at $10, and it's really easy to use: all you need to do is change an annotation in your deployment manifest and it spawns a Rackspace load balancer rather than a Tailscale one. So for production workloads I'd use a combination: Tailscale for internal workloads, and the Rackspace load balancer for user-facing workloads.

Bart Farrell: After running this two-dollar cluster experiment, you've learned valuable lessons through extreme constraints. What specific Kubernetes patterns or practices that you discovered would you actually bring to a well-funded production environment?

Varnit Goyal: I think the number one lesson I've learned from running it is that creative thinking can take you very far. Other than that, about Kubernetes: it's really good at rescheduling and at detecting unhealthy pods and nodes. When I was experimenting, my nodes were getting preempted, and Kubernetes was really good at rescheduling the workload somewhere else. It works beautifully. That's one of the things I love most about Kubernetes. The other thing is that you can build a lot of fancy stuff around Kubernetes: fancy networking layers, fancy metrics and all that. But what I've learned is that if you keep it simple, it works. You don't need a lot of fancy things to start with Kubernetes. Ultimately it's just a really good workload scheduler, and if you think of it in those terms, things get easier.

Bart Farrell: All right. And, Varnit, you seem to have put a fair amount of time into this project. What's next for you?

Varnit Goyal: Definitely. I'm very much a tech enthusiast; I keep exploring stuff. I'm currently working mostly in the eBPF domain, exploring the power of eBPF: how to improve tracing in Kubernetes, how to improve tracing for applications without much effort from developers. Currently, many developers have to implement tracing manually, but with eBPF you can trace application calls and improve tracing automatically. So I'm exploring eBPF's capabilities in the observability space. I'm looking forward to building some open source tools that help the community build better stuff. That's what I'm doing.

Bart Farrell: And if people want to get in touch with you, what's the best way to do that?

Varnit Goyal: I'm very active on LinkedIn, so anyone who wants to connect, just reach out to me there. I open LinkedIn multiple times a day, so I'll get back to you whenever I can.

Bart Farrell: Fantastic. Great. Well, Varnit, thank you so much for your time today and for sharing your experience. I look forward to hearing about the new projects you're working on in the future. Take care.

Varnit Goyal: Sure. Bye.

Bart Farrell: (music plays) You're tuned in to KubeFM.