Platform engineering best practices: from developer experience to cost optimization
Dec 11, 2025
Platform engineering teams are navigating the balance between developer autonomy and operational control while optimizing Kubernetes costs and abstractions.
In this interview, Mike Stefaniak, Head of Product for Kubernetes and Registries at AWS, discusses:
How platform teams are moving away from central portals toward building IDE plugins that work directly in developers' environments, supported by LLMs fine-tuned on company practices and shift-left methodologies
Why completely hiding Kubernetes from developers is a mistake, and how the optimal approach provides paved paths and best practices
Distinguishing between node-level optimization and the more challenging application-level optimization involving both horizontal and vertical scaling
Transcription
Bart: Who are you, what do you do, and where do you work?
Mike: Hi, my name is Mike Stefaniak. I lead the product management teams at AWS for our Kubernetes service, Elastic Kubernetes Service (EKS), and Elastic Container Registry.
Bart: ECR. What are three emerging Kubernetes tools or trends that you're keeping an eye on?
Mike: A few trends I've seen here at KubeCon: First, developer portals are shifting left, with end users moving more towards IDE plugins. Developers are spending more time in their integrated development environments to get work done, rather than navigating a central portal.
Second, Kubernetes-native cloud resource management is gaining traction. Traditional infrastructure-as-code tools like CloudFormation and Terraform have been around for a long time, but more people are looking towards Kubernetes-native tools to manage their infrastructure. For example, AWS has a project called ACK (AWS Controllers for Kubernetes) that allows developers to define their application alongside supporting infrastructure like an S3 bucket (there's a sketch of what that looks like after this list).
Third, security remains top of mind. At AWS, we've released an open-source project called Cedar and introduced a preview of a Kubernetes authorizer for it. We've also submitted a KEP (Kubernetes Enhancement Proposal) for upstream enhancements to provide more fine-grained authorization in Kubernetes. Security continues to be a critical focus.
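To make the second trend concrete, here is a minimal sketch of what an ACK-managed resource can look like. The names are hypothetical, and the exact API version may vary by controller release:

```yaml
# Hypothetical sketch: an S3 bucket declared as a Kubernetes resource
# via ACK (AWS Controllers for Kubernetes). Applying this manifest asks
# the ACK S3 controller to create and manage the bucket in AWS.
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-app-assets           # Kubernetes object name (hypothetical)
spec:
  name: my-app-assets-bucket    # S3 bucket name; must be globally unique
```

Because the bucket lives in the cluster's API, it can be reviewed, versioned, and reconciled alongside the application's own Deployment manifests.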
Bart: One of our podcast guests, Ben Poland, thinks that for a platform team, you want to empower anyone to make the changes they need rather than centralizing everything. How do you balance self-service capabilities with governance in platform engineering?
Mike: I think it's somewhat similar to what I mentioned at the beginning about shift left, giving developers more authority. A central portal that they can go to for documentation can still be helpful. However, with an LLM or a model fine-tuned on the company's best practices, infrastructure, and documentation, developers can get most things done in an IDE. Many end users and customers I've talked to are moving to this model where the platform team still exists, but they're building plugins for IDEs instead of creating central platforms. I would tend to agree, and I'm actually seeing this with more advanced platform engineering teams starting to shift left and put more where developers are used to working.
Bart: Another guest, Andrew, believes that the average software developer shouldn't need to understand Kubernetes YAML indentation and specs. Should platform teams completely abstract Kubernetes away from developers, or is some Kubernetes knowledge essential for application teams?
Mike: I've talked to thousands of customers over the years and seen everything from giving every team its own cluster to container-as-a-service platforms that completely abstract everything away. I'd say neither extreme works well. Kubernetes is designed for a model somewhere in the middle.
Should your developers have to understand everything about Kubernetes? Absolutely not. You want to build abstractions, paved paths, and best practices. But I think it's a mistake to completely hide the fact that their application is running on Kubernetes, especially because the Kubernetes ecosystem gives application developers so much to start from. If you want to run a Spark application, you're not starting from scratch; you can look to the community.
So it's a mistake to hide everything, and it's equally a mistake to show them everything. The right approach is somewhere in the middle.
Bart: Our podcast guest Marc thinks that with Kubernetes, it's quite easy to pay more than necessary because you pay for allocated or provisioned infrastructure—machines you start that are often underused. What strategies do you use to optimize Kubernetes costs?
Mike: There are two layers to optimization in Kubernetes: optimizing nodes and optimizing applications. Node optimization, at least at AWS, is a pretty well-solved problem now. We launched and open-sourced the Karpenter project several years ago, and it has become the de facto standard, overtaking Cluster Autoscaler as the most commonly used node autoscaling solution. With Karpenter, you don't have to think about hundreds of different instance types; you just submit your application requirements and let Karpenter pick the best compute for your workloads. If you want a more managed experience on EKS, we have EKS Auto Mode, which is a managed version of Karpenter and is even more hands-off.
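As an illustration of "submit your requirements and let Karpenter pick the compute", a minimal NodePool might look like the sketch below. The values are illustrative assumptions, not recommendations, and it presumes an EC2NodeClass named "default" already exists:

```yaml
# Illustrative Karpenter NodePool: describe workload requirements and
# let Karpenter choose the instance types. Values are assumptions.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Allow Karpenter to pick from both Spot and On-Demand capacity
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumes a matching EC2NodeClass exists
  limits:
    cpu: "1000"                    # cap on total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```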
The application layer is more challenging, and we see customers struggle there because it's a harder problem. Generally, you solve autoscaling both horizontally and vertically. Horizontal scaling is more of a short-term response to spikes; a common mistake is looking only at CPU and memory, which is why you might reach for a project like KEDA to scale horizontally on other metrics. Vertical scaling is more of a long-term approach, where you might look at several weeks of usage to figure out how to right-size your workloads.
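As one example of scaling on a signal other than CPU or memory, here is a minimal KEDA ScaledObject sketch that scales a worker Deployment on SQS queue depth. The deployment name, queue URL, and threshold are hypothetical, and authentication configuration is omitted:

```yaml
# Minimal KEDA ScaledObject sketch: scale a worker Deployment on SQS
# queue depth instead of CPU/memory. Names and URL are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker                 # Deployment to scale (hypothetical)
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs
        queueLength: "100"       # target messages per replica
        awsRegion: us-east-1
```

On the vertical side, the Vertical Pod Autoscaler (VPA) plays the equivalent role, recommending or applying right-sized CPU and memory requests based on observed usage over time.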
Both of these remain largely unsolved problems. There are many tools and resources out there, but it's still a hard challenge. I think application scaling is the bigger challenge right now and something we're thinking about, whereas node scaling we feel is pretty well solved at this point.
Bart: And Mike, what are most people getting wrong about platform engineering?
Mike: Five years ago, our EKS customers wanted a managed Kubernetes control plane but wanted to install every other add-on and do everything else themselves. We're starting to see a shift, both from new Kubernetes customers and from our longtime, more advanced customers, who are saying there isn't a ton of value in managing every single add-on in a cluster.
On EKS, we have features like managed add-ons and Auto Mode that free up time by shifting that heavy lifting onto the managed service so you don't have to worry about it. Teams that are still running 30 add-ons in a cluster themselves are, I would say, getting platform engineering wrong right now.
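As a rough illustration of leaning on managed add-ons instead of self-managing each component, an eksctl cluster config can declare them declaratively. This is a minimal sketch with hypothetical cluster details:

```yaml
# Sketch of an eksctl ClusterConfig using EKS managed add-ons rather
# than self-managed installs. Cluster name and region are hypothetical.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
addons:
  - name: vpc-cni      # pod networking, managed by EKS
  - name: coredns      # cluster DNS
  - name: kube-proxy   # service networking
```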
Bart: What's next for you?
Mike: It's re:Invent season for us at AWS. We have a lot of launches coming up. We announced several this week, but keep your eyes out on AWS news over the next couple of weeks. Some really exciting innovations are coming for both EKS and ECR. That's going to keep me busy over the next month.
Bart: What's the best way for people to get in touch with you?
Mike: LinkedIn is the easiest way. Connect with me there, and we can chat.