Super-Scaling Open Policy Agent with Batch Queries
Host:
- Bart Farrell
This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
Dive into the technical challenges of scaling authorization in Kubernetes with this in-depth conversation about Open Policy Agent (OPA).
Nicholaos Mouzourakis, Staff Product Security Engineer at Gusto, explains how his team re-architected Kubernetes native authorization using OPA to support scale, latency guarantees, and audit requirements across services. He shares detailed insights about their journey optimizing OPA performance through batch queries and solving unexpected interactions between Kubernetes resource limits and Go's runtime behavior.
You will learn:
Why traditional authorization approaches (code-driven and data-driven) fall short in microservice architectures, and how OPA provides a more flexible, decoupled solution
How batch authorization can improve performance by up to 18x by reducing network round-trips
The unexpected interaction between Kubernetes CPU limits and Go's thread management (GOMAXPROCS) that can severely impact OPA performance
Practical deployment strategies for OPA in production environments, including considerations for sidecars, daemon sets, and WASM modules
Relevant links
Transcription
Bart: In this episode of KubeFM, I got a chance to speak to Nicholaos, who discusses how his team re-architected Kubernetes native authorization using Open Policy Agent (OPA) to support scale, latency guarantees, and audit requirements across services.
You'll hear us talking about their move to batch evaluation to reduce OPA decision latency under high concurrency, how CPU throttling and Go garbage collection impacted performance under Kubernetes resource limits, and the trade-offs between OPA as a sidecar versus daemon set versus WASM-based integration.
Nicholaos also explains how they worked with Styra to improve audit log indexing and Elasticsearch for better queryability and retention, and how they built out a custom observability layer to track OPA decision timing, cache efficiency, and failure modes.
As Nicholaos pointed out in the episode, the hard part isn't writing policy. It's making sure they don't break when you scale them to 20,000 requests per second. Whether you're building secure service-to-service auth or trying to scale policy enforcement with real observability, this episode digs into what that actually takes in production.
This episode of KubeFM is sponsored by Learnk8s. Since 2017, Learnk8s has helped engineers all over the world level up through in-person and online courses. Courses are instructor-led and can be taught to individuals as well as to groups. Students have access to the course materials for their entire lives, and courses are 60% practical and 40% theoretical. For more information, go check out LearnK8s.io.
Now, let's get into the episode. Welcome to KubeFM. What three emerging Kubernetes tools are you keeping an eye on?
Nicholaos: Currently, I'm keeping an eye on Open Policy Agent, FluxCD, and Traefik.
Bart: Good. Is there a particular reason?
Nicholaos: Open Policy Agent is interesting for probably the obvious reason that we're going to be talking about today. I've been working with it for quite a few years now. It just has a massive amount of flexibility and possibilities into the future. I would love if I could make a firewall out of it, for instance. I like Flux, because it's simple and easy to use. And Traefik, because we're currently using it on a project that was built by someone I respect greatly who is no longer with Gusto, unfortunately.
Bart: Okay. Now, for people who don't know you, what do you do and where do you work?
Nicholaos: I am a staff product security engineer at Gusto. I work primarily on authorization. As an HR payroll benefits company, I focus on improving authorization with Open Policy Agent, which uses logic programming to define authorization policies.
Bart: And how did you get into cloud native in the first place?
Nicholaos: After my time in Montreal, which we can talk about later, I was introduced to cloud at the Lumen Cloud Computing Center (previously called CenturyLink). I moved from Montreal to St. Louis after school and after my games career. I gained both backend and frontend experience at the company, working with the billing team on backend tasks and DDoS mitigation for the frontend.
Bart: And before becoming cloud native, what were you? I know you alluded to working in games, but can you elaborate a bit more on that?
Nicholaos: I am from New Hampshire originally, where I'm coming to you from now. As many tech enthusiasts are, I was very into games growing up and made it a personal goal: I don't care how hard it gets, I'm going to go into games and keep studying hard.
Montreal is a pretty big game city because of tax policy from the Quebec government, which I think they're phasing out now, unfortunately. This is why you have a lot of big studios up there like Ubisoft, EA, and Eidos, where I worked. Throughout school, I studied really hard because everyone wants to go into games—it's basically the gateway drug into tech. I gained a great baseline knowledge of low-level technology, starting from assembly and going into C and C++. Games is one of those industries that still uses C++ because they want to squeeze every ounce of speed they can out of hardware.
My wife moved to St. Louis for medical school, and I went along with her. The places I worked were Eidos, where I worked on Deus Ex: Mankind Divided, and before that, Square Enix Montreal, working on Hitman Sniper, a mobile Hitman game. I was also involved in founding a studio that worked on a game called Last Year: The Nightmare, an indie studio that didn't end the best way—we can get into that later if you want.
Bart: Okay, and the Kubernetes ecosystem moves very quickly. How do you stay updated?
Nicholaos: I'm going to be 100% honest: I'm not a Kubernetes expert by any means. I keep up to date by reading the documentation to find the things I really need to know about. I keep up with recommendations from the company and the team, read up on them when they come up, and occasionally watch a KubeCon talk or two.
Bart: If you could go back in time and give one career tip to your younger self, what would it be?
Nicholaos: So there are a couple of things. For a more general audience, I would reference the blog post about the law of leaky abstractions by Joel Spolsky, who I think was the CEO of Stack Overflow for quite a while. I'm a fan of his work—most things he writes are really insightful when it comes to tech and the business it interacts with. That's where we have the most fun as engineers, in the clash of priorities.
In my opinion, the law of leaky abstractions is the one thing that will separate talented engineers from AI, at least for a while. We'll see—I'm not an expert, and these things are unpredictable.
The law of leaky abstractions says that all abstractions leak, whether it's TCP, a higher-level programming language, or any two pieces of software interacting with each other. When they leak, we realize they save us time working but not time learning. To me, that is a profound insight I wouldn't have gotten in any class. It's motivation to pay attention in class because there's usually a reason you're learning what you're learning.
Personally, I would advise myself to understand and be honest about my strengths. For example, I wanted to create a compelling video game from scratch, with no help. I was a good UI designer and engineer, but you need to be a great artist, modeler, level designer, composer, and low-level engine developer if you don't want to use existing tools.
I always laugh when I see posts about someone creating a game entirely by themselves. Did they code the Unreal Engine from scratch—15 million lines of C++? I'm not taking away from their accomplishment, but as you go deeper into any industry, you realize how many people are necessary to build something from the ground up.
The advice is: know your strengths, recognize where they are, and for your weaknesses, try to find friends with complementary strengths so you can build a team and make your dreams come true.
Bart: As part of our monthly content discovery, we found an article that you wrote titled "Super Scaling Open Policy Agent with Batch Queries". The next questions are designed to explore the topics covered in the article further. Authorization is a critical concern for Kubernetes administrators. But before we get into implementation details, could you explain why traditional authorization approaches often fall short in containerized microservice architectures?
Nicholaos: This is a big topic, touched on in the previous blog post linked at the beginning. In the 2021 OWASP Top 10 (and the upcoming 2025 version will likely show similar trends), the number one vulnerability for web apps was broken access control, which is essentially broken authorization.
The main reason is that authorization is often very fragmented, especially in multi-service architectures, and even more so in multi-language, multi-service environments. Broadly speaking, before OPA, there were two approaches to authorization: code-driven and data-driven.
In the code-driven approach, authorization is coded directly into endpoint code, which works fine for prototypes and small projects. However, as you grow, you realize you can't represent all policies, particularly in complex domains like HR, payroll, and benefits. You encounter problems with more endpoints, models, data, users, access levels, and potentially multiple services and languages.
Additionally, this approach is typically tightly coupled with the rest of the code, which is not ideal. Before OPA and service decoupling, the alternative was data-driven authorization, primarily role-based access control (RBAC).
In RBAC, everything is data-driven: users have one or more roles and can log in with those roles. The authorization system maps roles to accessible resources. While this works for many scenarios, it falls short when complex logic is required. You might need to create special roles or mix authorization styles, combining data-driven and code-driven approaches.
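The data-driven RBAC model Nicholaos describes can be reduced to a small sketch. The role and permission names below are hypothetical, purely for illustration:

```python
# Minimal data-driven RBAC: roles map to sets of permitted actions.
# Role and permission names are hypothetical, for illustration only.
ROLE_PERMISSIONS = {
    "payroll_admin": {"payroll:read", "payroll:write"},
    "employee": {"payroll:read"},
}

def is_authorized(user_roles, permission):
    """Allow if any of the user's roles grants the requested permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)
```

The trouble starts when a rule needs logic the role table cannot express, such as "an employee may read only their own payroll record". At that point teams bolt code-driven checks onto the data-driven ones, mixing the two styles.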
This creates significant audit challenges. A security team trying to understand policies must review all services, hoping the policies don't change during the audit process. It becomes overwhelming and complex.
These limitations in traditional authorization approaches ultimately led to the development of solutions like OPA.
Bart: Now you've advocated for Open Policy Agent (OPA) as a Kubernetes native solution for authorization. How does OPA's architecture integrate with Kubernetes? And what makes it particularly well-suited for container orchestration environments?
Nicholaos: What's best about Open Policy Agent (OPA) is its versatility. It depends on how deep and scalable you want your authorization to be, both for Kubernetes and application authorization. I've implemented only application authorization. One OPA node running by itself can handle a lot of authorization requests and is remarkably performant.
If you're willing to do a time-space trade-off by optimizing policy bundles (which uses more RAM but takes less time to evaluate), you shouldn't need much more. The bottleneck is more likely to be network latency between OPA and your service. If policies take a long time because they're complicated, you can make them faster. For super scaling behavior, you can stand up multiple OPA instances, even set up an auto-scaling group.
You can bake policies into the image or have OPA fetch them periodically from an external source. For low latency, you can co-locate OPA and server pods on the same node or run it as a WASM module directly on your server pod.
Regarding Kubernetes authorization, Rego (the logic programming language for policies) is purpose-built to be efficient in reading, writing, and evaluating data-driven documents like Kubernetes YAML, which serializes trivially to JSON. OPA can act as a Kubernetes admission controller, making arbitrary policies easy to write and deploy. It takes a JSON document for authorization, performs a logic programming policy evaluation, and returns results declaratively.
The system has no side effects and is built for evaluation. You can't have recursion or infinite loops, so you're guaranteed to return something. There's a way to do recursion in a data-driven manner, which I detail in my last blog post.
Bart: Let's talk about deploying Open Policy Agent (OPA) in Kubernetes. What are some common deployment patterns, and what considerations should teams make when choosing between sidecar, daemon set, or other deployment options?
Nicholaos: The main patterns include individual pods, replica sets, auto-scaling groups, daemon sets, sidecars, and WASM modules. With all deployments, the key considerations are:
Latency requirements
Bandwidth
Development overhead
Desired features
For example, WASM modules don't have all standard library functions. If you look at the OPA documentation, not all standard functions are compatible with WASM. Additional factors include cloud cost and policy size.
If you can afford the RAM and need latency savings, optimized policy bundles can be beneficial. It's crucial to have good observability into which parts of the OPA authorization request take the most time.
For deployment, perform napkin math by simulating load to observe CPU and memory usage. On your local machine, start hitting OPA with expected request types, measure the response, and log the frequency of requests per second, minute, and hour.
Consider round-trip latency. In co-located pods (OPA and server pods on the same node), you can expect a couple of milliseconds of latency. We tried using Unix sockets to improve speed, but it doesn't help significantly. If you want better latency than on-node performance of a few milliseconds, you'll likely need to use WASM modules.
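The napkin math described above can be sketched as a small harness that times repeated requests and reports summary statistics. The `query_opa` stub below stands in for a real HTTP call to OPA and is purely illustrative:

```python
import statistics
import time

def query_opa(payload):
    # Stand-in for a real HTTP POST to an OPA data endpoint;
    # the sleep simulates a fraction of a millisecond of work.
    time.sleep(0.0001)
    return {"result": True}

def benchmark(fn, payload, repetitions=30):
    """Time repeated calls and summarize latency in milliseconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "min": samples[0],
        "max": samples[-1],
        "avg": statistics.fmean(samples),
        "p95": samples[int(0.95 * (len(samples) - 1))],
    }
```

Running this locally against the request shapes you expect in production, and scaling the observed per-request cost by your requests per second, gives a first estimate of CPU and latency budgets before any cluster deployment.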
Bart: Now in your article, you mentioned performance challenges with Open Policy Agent and Kubernetes when handling batch authorization requests. Could you explain the specific issues you encountered in your cluster?
Nicholaos: We didn't have much of a performance bottleneck. We had batch policies in OPA, which were quite performant before I started this project. However, it would be easy to imagine a scenario where someone just starting with OPA could run into a situation where they perform one authorization query for every item in a long list, and that network latency would add up, potentially resulting in seconds of delay on a particular request.
In our case, the real problem was that we couldn't search the audit logs for batch policies. OPA has an advantage over traditional authorization methods with built-in audit logs that can be exported to services like S3 or Styra's DAS service.
These logs get indexed, and you can search them with an Elasticsearch-type query. However, with batch logs, only one large audit log gets sent to DAS for hundreds of queries, which can't be efficiently indexed in that system.
For these unsearchable batch policies, we wanted to maintain the parallel performance while separating the decision logs into individual, indexable logs that could be queried in DAS—which is what initiated my project.
Bart: You discovered some pretty interesting interactions between Kubernetes resource limits and Go's runtime behavior. This gets into some advanced territory. Can you explain how CPU limits in Kubernetes affected OPA's performance?
Nicholaos: Right. This is a classic example of false assumptions in software engineering, also known as "the road to hell is paved with good intentions". For some quick background, to solve the Styra DAS issue, Styra graciously added a feature to split batch authorization requests into individual ones, evaluate the requests in parallel, send individual decision logs to DAS, and then stitch them back together for the response.
This effectively took part of the whole operation out of Rego land with our old batch policy and performed it in Golang, the language Open Policy Agent (OPA) is implemented in. This gets into advanced territory. While I don't know the exact details of OPA evaluation in logic programming, my understanding is that it can make assumptions that make evaluation easier than in an imperative language like Go.
It was working great on my local machine, but once we got it into the production cluster, we were seeing considerable slowdown. Even after trivial optimizations like reducing request size, certain request properties were increasing latency by 10, 15, or even 100 times. We compressed repeated data to reduce not just network latency, but also processing latency from serialization and deserialization.
Initially, we thought it was a garbage collector issue due to OPA's timing. We tried tweaking Go's garbage collector values with no luck. Eventually, I discovered it was a mismatch in expectations between Go and Kubernetes.
Go is designed to be highly parallelizable via goroutines that run on threads. The key question for Go's designers is how many threads to spawn for the developer's goroutines. Spawning more threads than there are schedulable cores can lead to what might be called "context switch thrashing", where context switches between threads consume more time than actual execution.
Go's default approach is to spawn threads equal to the CPU cores reported by the operating system. However, this doesn't work the same way in Kubernetes when CPU resource limits are enabled. Since OPA is co-located with server pods, we wanted to be judicious about CPU limits to prevent it from hogging resources.
With our 750 millicores limit, Kubernetes would only allow running on 75% of one core, even if multiple physical cores exist. Go was spawning eight threads on an eight-core machine, but all were fighting over a single core. When I adjusted GOMAXPROCS from eight to two, we saw performance improve to about one-fifth of what we expected from local tests and earlier batch policies.
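The mismatch is that Go sizes its thread pool from the host's core count, not the cgroup CPU quota that Kubernetes limits actually impose. Tools like Uber's automaxprocs fix this by reading the quota; a rough sketch of that computation, using the cgroup v2 `cpu.max` format as an assumed input, looks like:

```python
import math

def effective_cpus(cpu_max: str, host_cores: int) -> int:
    """Derive a GOMAXPROCS-style value from a cgroup v2 cpu.max line.

    cpu.max contains "<quota> <period>" in microseconds, or "max" when
    no limit is set. A 750-millicore Kubernetes limit shows up as
    "75000 100000", i.e. 0.75 of one core.
    """
    quota, _, period = cpu_max.partition(" ")
    if quota == "max":
        return host_cores  # unlimited: fall back to physical cores
    limit = int(quota) / int(period)
    return max(1, math.floor(limit))  # never schedule zero threads
```

With the 750-millicore limit described above, this yields one schedulable core on an eight-core host, which is why eight Go threads ended up fighting over a single core.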
Bart: After discovering this issue, you mentioned benchmarking different configurations. What was your methodology for optimizing Kubernetes? And what final configuration did you settle on?
Nicholaos: Right. Since our variable space wasn't particularly huge, I originally tested setting GOMAXPROCS from one through four on one through four CPUs for a total of 16 tests. I wrote a bash script to go through all those possibilities and ran a sample batch authorization request through 30 repetitions each. Then I collected the metrics that OPA generously reports and aggregated them into averages, and I think I may have done percentiles, or min-max, average—basically the main things you're concerned with.
I put that onto a spreadsheet in Numbers because I'm too cheap to pay for Excel. I'm sure there will be folks who will explain why I'm making the wrong decision and that I need to find the light of regular Excel. It is what it is.
After presenting my findings to the infrastructure team, who is understandably very protective of CPU limits and needed convincing, we landed on three Go Max procs on 3,000 millicores. Even though OPA uses maybe a tenth of that in practice, it unlocked three physical cores for three Go threads to run on. If you look at the graph, that's where you start to get diminishing returns—right around three. It's never guaranteed what the correct number is for a particular number of threads or CPUs. Some metrics like min, max, or average can be off because thread scheduling is a very unpredictable process.
We landed on three cores, three Go threads because we didn't want to give OPA too high of a CPU limit in case it started going rogue, but at the same time, we wanted to maintain similar performance to what we had before.
Bart: Let's talk about scaling Open Policy Agent in a Kubernetes cluster. How did batch authorization improve your overall cluster efficiency, and what metrics demonstrate the impact?
Nicholaos: Batch authorization queries are more efficient because you can pack more of them into a single HTTP request. Even when you don't leave the node, network latency can be significant. For us, this was a real killer. One or two milliseconds can become problematic as authorization requests increase, depending on your use case.
The more authorization requests you can fit into a single round-trip network call, the more efficient it becomes. We were able to push through about 18 times more batch requests than single requests in the same amount of time on average. This assumes that authorization requests can be batched, which is not always the case. However, the cost of adding a request to an existing batch is negligible and essentially free, unless you have ultra-low latency requirements.
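The gain is easy to see with a back-of-the-envelope cost model: each HTTP request pays one network round-trip plus a small per-item evaluation cost, so batching amortizes the round-trip across many items. The numbers below are illustrative, not measurements from the episode:

```python
def total_ms(items, rtt_ms, eval_ms, batch_size):
    """Total latency for `items` authorization checks at a given batch size.

    Each round-trip costs rtt_ms; each item costs eval_ms to evaluate.
    """
    round_trips = -(-items // batch_size)  # ceiling division
    return round_trips * rtt_ms + items * eval_ms

# 100 checks, 2 ms on-node round-trip, 0.05 ms per evaluation:
single = total_ms(100, rtt_ms=2.0, eval_ms=0.05, batch_size=1)    # 100 round-trips
batched = total_ms(100, rtt_ms=2.0, eval_ms=0.05, batch_size=100)  # 1 round-trip
```

With these made-up numbers the batch comes out dramatically faster; the real speedup (18x in their case) depends on actual round-trip, evaluation, and serialization costs.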
It's worth noting that authorization requests can be dependent on one another. If you wouldn't perform one batch request if the previous one fails, you might consider letting them all succeed through the request, batching them together. At the end of the request, you send it to Open Policy Agent (OPA) and then examine all the results. If all results are true, you can return an authorization.
Be cautious with this approach, especially with mutations. If you assume previous authorizations succeeded and perform mutations, you could cause damage before returning the authorization. This method works best for read-only requests. In the worst-case scenario, you might allow all authorizations through, potentially creating an opportunity for a larger DDoS attack.
While attackers would likely need authentication before this authorization chain, you should carefully consider batching based on your specific circumstances.
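The deferred-batch pattern described above can be sketched as follows: accumulate checks over the course of a read-only request, send them to OPA as one batch at the end, and allow only if every decision passes. `evaluate_batch` here is a hypothetical stand-in for the actual batch call to OPA:

```python
class DeferredAuthorizer:
    """Collect authorization checks and evaluate them in one batch.

    Safe only for read-only requests: nothing may be mutated before
    the final all-or-nothing decision.
    """

    def __init__(self, evaluate_batch):
        # evaluate_batch: callable taking a list of checks and
        # returning a list of booleans (e.g. one HTTP call to OPA).
        self._evaluate_batch = evaluate_batch
        self._checks = []

    def defer(self, subject, action, resource):
        """Record a check without evaluating it yet."""
        self._checks.append(
            {"subject": subject, "action": action, "resource": resource})

    def authorize_all(self):
        """Send every deferred check in a single batch; allow only if all pass."""
        if not self._checks:
            return True
        return all(self._evaluate_batch(self._checks))
```

This keeps the request down to one round-trip while preserving the all-or-nothing semantics, at the cost of doing some work that may ultimately be denied.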
Bart: Authorization decisions need to be auditable for compliance and security. How did you handle decision logging in your Kubernetes environment, and what challenges did you face?
Nicholaos: Styra, the creator of OPA, has a convenient service called DAS (Declarative Authorization Service). OPA can export its decision logs with a simple config line. This service ingests decision logs from all OPA instances, indexes them, and makes them searchable in one location. You can search for actions taken by a specific user, users with access to a particular resource, and details about when and how access was granted.
We encountered an issue with legacy batch decision logs that were not easily searchable, but the new batch API from Styra resolved this. The service also allows you to link your GitHub repository containing policies and simulate changes against inputs in your decision log. You can review how a policy change would affect previous decisions. It offers robust authoring capabilities overall.
Bart: For Kubernetes administrators looking to implement Open Policy Agent (OPA) in their clusters, what practical advice would you give based on your experience?
Nicholaos: It depends on the stage of development and the resources you can allocate to Open Policy Agent (OPA). Here's practical advice:
First, get familiar with Rego, which is a logic programming language different from imperative languages. Styra has excellent documentation, and the Styra Academy offers great resources. The Rego Playground is particularly useful for testing policies by copy-pasting JSON inputs and evaluating outputs in real-time. It also includes linting features.
Get buy-in from your team, especially those who will read or write the policies. If you're using OPA for data-driven authorization, explain the benefits and help them understand that while it's an investment, there's a legitimate security payoff. The challenge is significant because authorization is a complex, human-centric problem—it's the number one vulnerability on the OWASP top 10 list.
Assess the scale and latency requirements for your OPA deployment. Use observability tools like Datadog to aggregate data and understand your authorization needs. You can deploy OPA pods via sidecar, daemon set, or standalone deployment depending on your infrastructure.
For configurations and policies, you have multiple options:
Bake configs into Docker images
Use config maps
Pull policies from services like S3 or Styra DAS
When creating policy bundles, be cautious with optimization levels. We've encountered bugs with level two optimization. As a precaution, run tests with different optimization levels to ensure consistent results, using decision logs and sample inputs from Styra DAS.
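A cheap guard against optimizer bugs like the one mentioned above is a differential test: replay the same sample inputs against bundles built at different optimization levels and assert the decisions match. A minimal sketch, with `decide_o0` and `decide_o2` as hypothetical stand-ins for querying two OPA instances loaded with the two bundles:

```python
def diff_decisions(decide_o0, decide_o2, sample_inputs):
    """Return the inputs on which two policy builds disagree.

    An empty list means the optimized bundle behaves identically
    to the unoptimized one on this sample set.
    """
    return [i for i in sample_inputs if decide_o0(i) != decide_o2(i)]
```

Decision logs pulled from Styra DAS make a good source of realistic `sample_inputs`, since they reflect the queries your services actually issue.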
Bart: Earlier in the conversation, you mentioned the blog post The Law of Leaky Abstractions. It seems particularly relevant in Kubernetes environments. Given that there are so many layers from container runtime to orchestration to application frameworks, what lessons can Kubernetes administrators take away from your experience?
Nicholaos: So I'm going to refer back to the words of the blog post I mentioned earlier, because they're extremely profound to me. Every computer science student I advise is told to read the same blog post, because if you can understand it, you'll have the motivation to study what you need to and be far ahead of everyone else.
The words are: "Abstraction saves us time working, but does not save us time learning." Basically, every lower-level technology or abstraction that you don't understand beneath the primary one you're using incurs a learning debt, as opposed to a strict technical debt. You don't pay that debt until the abstraction leaks or breaks in some way, and you have to go down the tech stack to find the source.
The more you know about the layers you're building on, the less time it will take to find that leak. I draw a parallel to the show House MD. I was a huge fan back in the 2000s and have probably watched the full series too many times. It's like a modern medical play on Sherlock Holmes, where Dr. House is presented with weird medical cases that no other doctors will attempt to solve.
He often uses incomplete diagnostically relevant information under tight medically imposed deadlines. This is not unlike Kubernetes, which has many layers. For security reasons, many Docker containers nowadays don't have a shell. You really have to think deeply about the entire stack and the information you have at hand—comparing local environment to Kubernetes environment—to solve an issue without needing a sabbatical to learn about the lower layers.
It's important to know what you're running on. It's never a bad idea to go as far down the stack as you want. Reading Linux kernel code can only help you. Knowing how these layers interact is crucial, especially from a security perspective. Not understanding how layers interact can leave you vulnerable to injection attacks or buffer overflows. If you don't understand how these things happen, you'll be at a real disadvantage in detecting intrusions and vulnerabilities, especially if someone with significant resources and motivation targets your system.
Bart: Let's wrap this up. I noticed from your social media profiles that apart from Kubernetes and authorization systems, you have a diverse range of activities. You previously mentioned game design, and I know you're also interested in climbing and marksmanship. How do these various interests influence your approach to solving complex technical problems like the OPA optimization you've described?
Nicholaos: So I don't know that they're strictly related, but I had this teacher in high school who I greatly respect and still visit occasionally. His name is Mr. Scott Gelzer at Coe-Brown Northwood Academy in [Northwood, New Hampshire](https://en.wikipedia.org/wiki/Northwood,_New_Hampshire). Shout out to Scott.
He was talking about old physicists and said that what separated them—what enabled them to solve really complex problems—was their ability to sit and think about one problem for days, weeks, or even years on end. I think that's a bit extreme. I like to have a dichotomy where it's important to have the ability to do that, but also to recognize when you're not making progress.
This means avoiding getting stuck in a pattern or circle, and instead pushing away from the desk, getting some sleep, going outside, or doing a different activity. For myself, I always find that while not actively thinking about the problem, my mind creatively explores different directions—like what if we try this or that. By the next day, when I sit down again, I usually have another direction I can take.
I think being able to combine both abilities is critical to solving complex problems and will be a more efficient use of time in the end.
As for what's next, my wife is finishing her residency here in New Hampshire. We're going to be moving to Missouri for a year before moving to the West Coast, hopefully for good, a year after that. I'm not looking to make any big career moves for a little bit and will probably be sticking with Gusto for the foreseeable future. This gives me time to hold out for a possible IPO, and I'm sure new fun problems will pop up to provide opportunities for new blog posts.
Bart: So, how can people get in touch with you?
Nicholaos: I don't have a social media presence because if I did, I would inevitably say something non-technical and political. Some people would not like it, and I just don't want to invite that kind of conflict into my life, especially right now. Maybe one day in the future, I will have a social media account. But for now, LinkedIn is the best way. Send me a request with a message. I'll probably read it, as long as it doesn't sound like an advertisement on first glance.
Bart: Thank you very much for joining us today and for sharing your knowledge about Open Policy Agent. We'll talk to you in the future. Take care. Have a good one.
Nicholaos: Thank you for having me. Bye-bye.