Faster Container Pulls for AI Workloads with SOCI

Apr 14, 2026

Guest:

  • Phil Estes

Container images are growing larger, and for AI workloads running on Kubernetes, slow image pulls mean slower startup — a real bottleneck at scale.

Phil Estes, Principal Engineer at AWS, shares how his team built SOCI (Seekable OCI) and the fast pull snapshotter to solve this, and why they contributed it all upstream to benefit anyone using containerd.

In this interview:

  • How SOCI and lazy loading reduce container startup times for large AI images on Kubernetes

  • Why open source contribution and upstream-first development matter for the long-term health of the Kubernetes ecosystem

Transcript

Bart Farrell: Who are you, what's your role, and where do you work?

Phil Estes: I'm Phil Estes. I work at AWS as a principal engineer, and I focus a lot on upstream community activity around the containerd project, where I'm a maintainer, and also on how AWS shows up for developers and how we contribute to open source as a company.

Bart Farrell: And what are three emerging Kubernetes tools that you're keeping an eye on?

Phil Estes: I think it's super hard to keep up with all the emerging stuff in Kubernetes. I'm super interested in a lot of the AI work going on, so Ray being used as a project with Kubernetes. K8sGPT, which is a super cool project added to the CNCF Sandbox; I think it's still in the Sandbox. And then yesterday in our keynote, Jesse, one of our product folks on EKS, talked about kro (Kube Resource Orchestrator). The super neat thing is that we have other people getting involved: yesterday he announced that SAP now has a maintainer on kro. Really, that's about trying to bring back simplicity and ease of use, and abstracting away some of the complex bits of using Kubernetes.

Bart Farrell: So we've been hearing a lot about SOCI pull and its impact on container performance. For folks who might not be familiar with the underlying challenge, what problem were you actually trying to solve and why does it matter so much for teams running workloads on Kubernetes?

Phil Estes: This really relates very strongly to the whole shift to AI workloads and the explosion of AI tooling. SOCI and our fast pull snapshotter are attempts to reduce image pull time for a container runtime. Before your workload gets to run as a container, its container image has to be pulled from a registry. As images get larger and larger, we've been looking at technologies that shorten that pull time so your workload starts running sooner. SOCI was initially a lazy loading technology. Lazy loading, where a container starts before its full image has been downloaded and the remaining file contents are fetched on demand, is a concept various people in the runtime community have been working on for a while, and we created an open source project around it. Then, with our work on EKS for ultra clusters, very large clusters with thousands of nodes, we created the fast pull snapshotter with new ways to make image pulling faster and more efficient.

Bart Farrell: So when you set out to solve this image pull bottleneck, what did you actually build? And for teams evaluating where to run their AI workloads, what's the easiest way to take advantage of this technology?

Phil Estes: The cool part is that the technologies we've created are all open source, they're all upstream, and they tie right into the containerd ecosystem. So even if you don't use a managed service like EKS, you can use them in your own cluster by configuring containerd to use these snapshotters. The other cool thing is that a lot of these investigations and improvements are making their way into upstream core projects like containerd, so you don't even have to use a custom snapshotter. There's a lot of information, including blog posts, being shared to make this available to people who use these tools.

Bart Farrell: You didn't just build this for AWS customers. You contributed SOCI and the fast pull snapshotter upstream to the open source community. What impact are you seeing from other organizations adopting this approach?

Phil Estes: As I said, it's kind of neat that there's a whole community of folks researching this area, coming up with new improvements, and creating pull requests to containerd or building custom snapshotters. All this work is contributed upstream for the benefit of anyone using core containerd or custom snapshotters. We plan to continue that, and any further improvements in this area will continue to be available upstream.

Bart Farrell: Kubernetes turned 10 almost two years ago. What do you expect to happen in the next 10 years?

Phil Estes: Great question. It's amazing to see the community around us here at KubeCon; it just seems to keep getting bigger. Every time we think, "This must be the peak," there are still a lot of folks trying to figure out how to adopt Kubernetes for very traditional workloads and legacy environments. Even yesterday, I talked to a young contributor whose company is just now moving to containers. So even though it's been 10 years, there are still so many new users and new adopters, and I think that growth will continue. I also think projects will keep emerging around Kubernetes as a core to make it easier and simpler for people to adopt, use, and operate. And of course, AI is going to integrate into that world as well, with ways to make Kubernetes operationally easier to use and manage.

Bart Farrell: And Phil, what's next for you?

Phil Estes: What's next for me? I have this fun role where I can influence internal teams within AWS, talk to engineers about getting more involved in the community and contributing upstream, and also help AWS as a whole figure out how to approach developers and make the platform easier for them to use. So it's a nice mix of my open source roles, influence, and external activity, so that people can understand how to use AWS.

Bart Farrell: And for people who want to get in touch with you, what's the best way to do that?

Phil Estes: I'm on LinkedIn; that's probably the easiest. You can search for my name and connect with me. I'm also on the CNCF and Kubernetes Slack communities, so you can always reach me there as well.