Security as Job Zero: Building Bulletproof Kubernetes Platforms
This interview explores practical strategies for building secure, observable Kubernetes platforms and the future intersection of container orchestration with generative AI.
In this interview, Saptarshi Banerjee, Senior Solutions Architect at AWS, discusses:
Three emerging Kubernetes tools worth watching: Kueue for AI/ML job queuing and GPU scheduling, Kyverno for simplified policy enforcement in GitOps workflows, and vCluster for lightweight virtual clusters enabling better multi-tenancy
Making observability and performance testing meaningful by tying observability tools to specific operational outcomes and runbooks
Building secure platforms without deep security expertise, while involving security stakeholders early in the design process
Transcription
Bart: So, first things first, who are you, what's your role, and where do you work?
Saptarshi: I'm Saptarshi Banerjee, a Senior Solutions Architect at Amazon Web Services (AWS), and I'm based in Seattle.
Bart: Fantastic. Now, what are three emerging Kubernetes tools that you are keeping an eye on?
Saptarshi: Here are three Kubernetes tools I'm watching closely. First is Kueue, a job queuing system for Kubernetes designed for AI and ML workloads. It works well with Kubeflow Pipelines v2 and helps manage GPU scheduling and batch jobs in shared clusters, which is essential as GenAI workloads scale.
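To make that concrete, here is a minimal sketch of how Kueue can gate GPU jobs in a shared cluster. This example is illustrative rather than from the interview; the queue names, namespace, image, and quotas are placeholders.

```yaml
# Illustrative Kueue setup: a cluster-wide GPU quota shared by teams.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-queue
spec:
  namespaceSelector: {}                # admit workloads from any namespace
  resourceGroups:
    - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
      flavors:
        - name: default-flavor
          resources:
            - name: "cpu"
              nominalQuota: 64
            - name: "memory"
              nominalQuota: 256Gi
            - name: "nvidia.com/gpu"
              nominalQuota: 8          # hard cap on GPUs admitted at once
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a
  namespace: team-a
spec:
  clusterQueue: gpu-queue
---
# A training Job opts in via the queue-name label; Kueue keeps it
# suspended until quota frees up, then admits and unsuspends it.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/trainer:latest   # placeholder image
          resources:
            requests:
              nvidia.com/gpu: "1"
            limits:
              nvidia.com/gpu: "1"
```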
Second is Kyverno, a Kubernetes-native policy engine that makes writing and enforcing security and governance policies much simpler than traditional tools like Open Policy Agent. It fits perfectly into GitOps workflows and helps platform teams maintain compliance by default.
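As a small illustration of how lightweight Kyverno policies can be (our example, not the interviewee's), a ClusterPolicy that blocks mutable :latest image tags is just a match block and a pattern:

```yaml
# Illustrative Kyverno policy: reject Pods whose containers use the
# mutable :latest tag, so every deployment is pinned and auditable.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce   # block at admission; use Audit to dry-run
  background: true                   # also flag pre-existing resources
  rules:
    - name: require-pinned-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using a mutable image tag (:latest) is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

Because the policy is plain YAML, it versions cleanly in Git alongside the rest of a GitOps repository.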
Third is vCluster, developed by Loft, which lets you create lightweight virtual Kubernetes clusters within a single host cluster. It's great for multi-tenancy, CI/CD isolation, and developer self-service without the overhead of managing multiple full clusters.
Together, these tools are helping teams scale Kubernetes responsibly, securely, and in ways that support modern GenAI and other workloads.
Bart: So, one of our podcast guests, Mac Chaffee, thinks that you can't just stumble your way into building a secure orchestration system. How do you approach security when designing platform solutions without deep security knowledge?
Saptarshi: At AWS, we treat security as job zero. It's our top priority and foundational to every platform decision we make. When I design a solution without being a deep security expert, I start by leaning on AWS's built-in security controls: least-privilege IAM roles, VPC-level isolation, and services like AWS Secrets Manager, KMS for encryption, and CloudTrail for full auditability, all available out of the box.
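On EKS, one concrete expression of least-privilege IAM is IRSA (IAM Roles for Service Accounts), where each workload gets its own narrowly scoped role. Here is a hedged sketch; the account ID, role, namespace, and image are placeholders:

```yaml
# Illustrative IRSA wiring: the ServiceAccount annotation binds Pods to a
# dedicated IAM role (e.g. one allowed only to read a specific secret).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  namespace: orders
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/orders-api-secrets-read
---
apiVersion: v1
kind: Pod
metadata:
  name: orders-api
  namespace: orders
spec:
  serviceAccountName: orders-api     # Pod gets only this role's permissions
  containers:
    - name: app
      image: registry.example.com/orders-api:1.4.2   # placeholder image
```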
For AI and orchestration workflows, I integrate Amazon Q Developer to catch security issues during development and use Bedrock Guardrails when building GenAI workloads to ensure responsible behavior. For access control, Amazon Verified Permissions helps manage fine-grained authorization at scale without writing complex policy engines.
The key is to let AWS handle the heavy lifting when managing these workloads. You do not need to be a security engineer; you just need to use secure, battle-tested building blocks. Always involve your security stakeholders early, not at the end.
Bart: Well said. Our other guest, David, asked why you would introduce observability tools if observing isn't part of someone's job, suggesting that many technologies are redundant without operational feedback. How do you ensure observability tools actually serve a purpose in organizations?
Saptarshi: Introducing observability tools without a feedback loop is like installing security cameras that no one watches. At AWS, when I work with customers on Kubernetes-based platforms, I always tie observability to a specific operational outcome. It's not just about metrics; it's about enabling better decision-making.
For example, if a team runs workloads on Amazon EKS, I recommend integrating Amazon CloudWatch Container Insights, AWS Distro for OpenTelemetry (ADOT), and AWS X-Ray, but only when there's a clear purpose. This could mean improving deployment reliability, reducing mean time to detect, or catching anomalies during scale-up.
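For instance, with the ADOT (or upstream OpenTelemetry) operator installed, a small collector can receive OTLP traces from workloads and forward them to X-Ray. A sketch, with illustrative names and region:

```yaml
# Illustrative ADOT collector: OTLP traces in, AWS X-Ray out.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: adot-xray
  namespace: observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch: {}
    exporters:
      awsxray:
        region: us-west-2        # placeholder region
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [awsxray]
```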
I always ask myself: Who will act on these signals? If no one will, it's just noise. That's why I recommend aligning observability tools with runbooks, service-level objectives, and incident response, and making them part of platform teams' KPIs.
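One way to wire that alignment together (our sketch, with hypothetical metric names, thresholds, and URLs) is a Prometheus alert that encodes a service-level objective and links straight to the runbook a responder will actually follow:

```yaml
# Illustrative SLO alert: page only when a p99 latency objective is
# breached, and point the responder at the matching runbook.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: checkout-slo
  namespace: monitoring
spec:
  groups:
    - name: checkout-latency
      rules:
        - alert: CheckoutLatencySLOBreach
          expr: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{service="checkout"}[5m])) by (le)
            ) > 0.5
          for: 10m
          labels:
            severity: page
          annotations:
            summary: "checkout p99 latency has exceeded 500ms for 10 minutes"
            runbook_url: https://runbooks.example.com/checkout/high-latency
```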
Observability data becomes far more valuable when combined with AI and services like Amazon DevOps Guru or CloudWatch anomaly detection, which help surface issues before they impact production. Tools are only useful when paired with operational intent, and my role is to ensure that the intent is clear before anything gets deployed in the cloud.
Bart: One of our guests, Stefan, thinks that you can't really expect that testing your application in a dev or QA cluster will prepare you for production, because production is always different. What's your approach to performance testing?
Saptarshi: At AWS, when I work with customers on Kubernetes platforms, especially with Amazon EKS (the managed Kubernetes service), my approach to performance testing is grounded in realism and iteration.
First, I help them simulate production-like environments using production-grade configurations: the same auto-scaling policies, resource limits, and ideally the same deployment pipeline. We use tools like Locust, k6, or Apache JMeter, often orchestrated via AWS Fargate (a serverless compute engine) or EKS Jobs, to stress the system under real-world loads.
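As an illustration of the "orchestrated via EKS Jobs" part (the names, script, and load parameters below are hypothetical), a k6 run can be packaged as a plain Kubernetes Job:

```yaml
# Illustrative load-test Job: k6 drives 200 virtual users for 10 minutes
# against the system under test, using a script mounted from a ConfigMap.
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-load-test
  namespace: perf
spec:
  backoffLimit: 0                  # a failed run should fail loudly, not retry
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: k6
          image: grafana/k6:latest
          args: ["run", "--vus", "200", "--duration", "10m", "/scripts/checkout.js"]
          volumeMounts:
            - name: scripts
              mountPath: /scripts
      volumes:
        - name: scripts
          configMap:
            name: k6-scripts       # holds the hypothetical checkout.js script
```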
Second, we leverage observability to catch bottlenecks, combining CloudWatch Container Insights and X-Ray with performance baselines tied to service-level objectives. If you do not measure latency under load, you are just guessing, and that's a mistake.
Finally, I encourage progressive validation: always run tests in staging, then run canary performance tests in production using feature flags or traffic mirroring. The closer you get to truth without hurting real users, the better.
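Traffic mirroring can take several forms; one common sketch (assuming an Istio mesh, with hypothetical service and subset names) duplicates 10% of live traffic to a canary whose responses are discarded, so real users are never affected:

```yaml
# Illustrative Istio mirroring: all traffic goes to the stable subset,
# while 10% is copied to the canary; mirrored responses are dropped.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
    - checkout
  http:
    - route:
        - destination:
            host: checkout
            subset: stable
          weight: 100
      mirror:
        host: checkout
        subset: canary
      mirrorPercentage:
        value: 10.0
```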
The bottom line is that performance testing isn't just a phase; it's a continuous feedback loop baked into how you release, monitor, and tune in real time.
Bart: Kubernetes turned 10 years old last year. What can we expect in the 10 years to come?
Saptarshi: Kubernetes has come a long way in its first 10 years, from container orchestration to becoming the backbone of modern cloud infrastructure. The next 10 years will be less about managing clusters and more about abstracting them away. We will see a shift towards platform engineering and developer experience, where developers no longer care about nodes or YAML. They will just deploy and scale securely using internal platforms: essentially, Kubernetes without Kubernetes.
I also expect a lot of AI-native platforms to emerge. As generative AI, large language models, and agents become central to enterprise applications, Kubernetes will need to evolve with GPU schedulers, dynamic auto-scaling, and ML observability baked in. Security and policy will go mainstream as well, with tools like Kyverno, OPA, and Sigstore helping to ensure and enforce compliance automatically across multi-cloud and edge environments.
Finally, I think Kubernetes will power more autonomous systems at the edge, from self-driving cars to IoT clusters, orchestrating not just code but also intelligence. Kubernetes isn't going away. It will just become invisible with time, hidden behind higher-level abstractions that will empower teams to build faster, safer, and smarter applications.
Bart: And what's next for you?
Saptarshi: What's next for me is GenAI and Kubernetes: helping organizations adopt GenAI securely at scale and in production. At AWS, I'm working with customers who want to move beyond isolated proof of concepts into real-world GenAI platforms. Kubernetes still shines as the foundation for orchestrating LLM workloads, managing GPU scheduling, and integrating data pipelines, inference layers, and AI agents across environments.
I see a future where GenAI is agentic, composable, and autonomous, with Kubernetes becoming the control plane for all of that. Whether you're running fine-tuned models with Amazon Bedrock, deploying APIs with Amazon SageMaker, or using open-source tools like Kueue or Ray, the goal remains the same: make GenAI reliable, observable, and cost-efficient.
What's next for me is building and evangelizing GenAI-native platforms, combining the intelligence of LLMs with the power and maturity of Kubernetes to help customers shape the next wave of intelligent applications at AWS.
Bart: Fantastic. And how can people get in touch with you?
Saptarshi: They can always reach out to me over LinkedIn, and they can also email me. My email ID is [email protected], but LinkedIn would be the preferred way to contact me.
Bart: Thank you so much for your time today. I look forward to talking to you in the future. Take care.
Saptarshi: Thank you. Thanks, Bart.