How I think about it
A while back, AWS put out a framework for securing generative AI apps, and it caught on. People at OWASP, CoSAI and plenty of security teams started using it. But that framework was built for the old style of AI, where you send a prompt, get an answer, and the session ends. Things have moved on. We now have AI agents that run for long stretches, call tools, make their own decisions and touch real systems. That changes the security picture completely, so a new framework was needed. Here is my plain-language take on it.
Why agents are a different beast
With a normal chatbot, the blast radius of a problem is small. If something goes wrong, it goes wrong in one request and one response, then it’s over. Agents break that comfort in a few ways.
They can act on their own. An agent might kick off actions without anyone asking it to, which opens the door to runaway processes or actions nobody intended.
They remember things. Agents often keep memory across sessions, which is useful but also means someone can poison that memory with false information that quietly corrupts decisions for weeks.
They use tools. Agents plug into databases, APIs and other agents. One compromised agent can drag everything it’s connected to down with it.
They reach outside. Internet access, third-party APIs, enterprise systems. More reach means more ways for data to leak out or attackers to get in.
There’s a useful distinction worth keeping in mind here: agency versus autonomy. Agency is what the agent is allowed to do, the scope of its permissions. Autonomy is how independently it can act without a human signing off. An agent can have lots of agency but low autonomy, meaning it could do a lot but needs approval for every step. Or the reverse. You need to manage both.
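To make the split concrete, here is a minimal sketch of how the two dimensions could be kept apart in code. The names (AgentPolicy, decide, the action strings) are my own invention for illustration and are not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    # Agency: every action the agent is permitted to take at all.
    allowed_actions: frozenset
    # Autonomy: the actions it may take without a human signing off.
    no_approval_needed: frozenset = frozenset()


def decide(policy: AgentPolicy, action: str) -> str:
    """Return 'deny', 'allow' or 'needs_approval' for a requested action."""
    # Agency is checked first: outside the permitted scope, nothing else matters.
    if action not in policy.allowed_actions:
        return "deny"
    # Autonomy is checked second: permitted, but can the agent act alone?
    if action in policy.no_approval_needed:
        return "allow"
    return "needs_approval"


# High agency, low autonomy: it can do a lot, but only reads happen unattended.
policy = AgentPolicy(
    allowed_actions=frozenset({"read_calendar", "send_invite", "cancel_meeting"}),
    no_approval_needed=frozenset({"read_calendar"}),
)
print(decide(policy, "send_invite"))
```

Keeping the two sets separate means you can widen what the agent may do without also widening what it may do unattended.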
The four scopes
The framework sorts agentic systems into four levels based on those two dimensions. I’ll use the same example throughout: an agent that helps book meetings.
Scope 1: No agency. The agent is basically read-only. A human kicks it off, it follows a fixed workflow, and it can’t change anything. In the calendar example, it checks your availability and your colleague’s, suggests the best times, and you book the meeting yourself. Security here is mostly about keeping the workflow intact, validating inputs at each step, and making sure the agent can’t escape its lane.
Scope 2: Prescribed agency. Now the agent can make changes, but only with a human approving each meaningful action. This is the classic human-in-the-loop setup. The agent finds the meeting slot and asks “want me to send the invite?” You say yes, it sends. The security focus shifts to protecting the approval process itself. You don’t want the agent finding a way around the human, so think multi-factor auth for approvers, signed approval decisions and proper audit trails.
Scope 3: Supervised agency. A human still starts the task, but the agent finishes it on its own. It finds the slot and just books it, no permission needed. You can step in mid-flight if you want, but you don’t have to. Here you need real-time monitoring, clear boundaries set up front, kill switches for runaway behaviour, and regular checks that the agent is still doing what you actually asked.
Scope 4: Full agency. The agent starts its own work based on what it observes. Say a meeting summarizer notes that six people agreed to a whiteboard session on Friday. The calendar agent spots that, finds availability for all six and books it. Nobody asked it to. This is the highest risk level, and it demands behavioural monitoring, anomaly detection, tamper-proof human overrides, and failsafes that can shut things down. One critical point: the agent must never be able to disable its own oversight.
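If it helps, the four descriptions above boil down to three yes/no questions. The sketch below is my own simplification, with made-up names (Scope, classify and its parameters). The framework itself is a matter of judgment and does not define a function like this.

```python
from enum import IntEnum


class Scope(IntEnum):
    NO_AGENCY = 1
    PRESCRIBED = 2
    SUPERVISED = 3
    FULL = 4


def classify(can_make_changes: bool,
             approval_per_action: bool,
             self_initiated: bool) -> Scope:
    """Map three properties of an agent onto the four scopes."""
    if not can_make_changes:
        return Scope.NO_AGENCY    # read-only, fixed workflow
    if approval_per_action:
        return Scope.PRESCRIBED   # a human approves each meaningful action
    if not self_initiated:
        return Scope.SUPERVISED   # human starts it, agent finishes alone
    return Scope.FULL             # agent starts its own work


# The calendar agent that books a slot without asking, once a human has started it:
print(classify(can_make_changes=True, approval_per_action=False,
               self_initiated=False).name)
```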
What actually works
A few patterns show up in successful deployments. Start at Scope 1 or 2 and earn your way up as your security maturity grows. Layer your defences so one breach doesn’t sink everything, and pay serious attention to identity and authorization, because agents with more privileges than their users can become a backdoor for privilege escalation. Build continuous checks that compare agent behaviour against expectations.
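On the identity point, one common way to stop an agent becoming a privilege escalation path is to authorize every action against the overlap of the agent's permissions and those of the user it is acting for. This is a minimal sketch of that idea. The function names and permission strings are hypothetical, and a real deployment would lean on its existing identity provider rather than plain sets.

```python
def effective_permissions(agent_perms, user_perms):
    """The agent may only do what both it and the requesting user may do."""
    return set(agent_perms) & set(user_perms)


def authorize(action, agent_perms, user_perms):
    """Allow an action only if it falls inside the shared permission set."""
    return action in effective_permissions(agent_perms, user_perms)


# The agent's service account is broader than the user it is acting for.
agent_perms = {"calendar:read", "calendar:write", "directory:read"}
user_perms = {"calendar:read", "calendar:write"}

print(authorize("calendar:write", agent_perms, user_perms))  # both hold it
print(authorize("directory:read", agent_perms, user_perms))  # only the agent holds it
```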
And here’s the bit people get wrong: human oversight doesn’t shrink as you climb the scopes. It shifts. At lower scopes you’re approving individual actions. At higher scopes you’re doing far more auditing, validation and monitoring. Less hand-holding, more supervision.
One last idea I really like: graceful degradation. If an agent starts misbehaving, automatically dial its autonomy down. Add approval requirements back in, restrict its actions, or pull the plug entirely. Build that in before you need it, not after.
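Here is roughly what that could look like, as a sketch under my own assumptions: three operating modes, a simple anomaly counter, and a step down each time the counter hits a limit. The class, the mode names and the threshold of three are all invented for illustration, and how you detect an anomaly in the first place is left out.

```python
from enum import IntEnum


class Mode(IntEnum):
    HALTED = 0             # plug pulled
    APPROVAL_REQUIRED = 1  # approvals added back in
    AUTONOMOUS = 2         # normal operation


class Degrader:
    """Steps an agent's autonomy down as anomalies accumulate."""

    def __init__(self, anomaly_limit: int = 3):
        self.mode = Mode.AUTONOMOUS
        self.anomaly_limit = anomaly_limit
        self.anomalies = 0

    def report_anomaly(self) -> Mode:
        """Record one anomaly and drop a mode when the limit is reached."""
        self.anomalies += 1
        if self.anomalies >= self.anomaly_limit and self.mode > Mode.HALTED:
            self.mode = Mode(self.mode - 1)
            self.anomalies = 0
        return self.mode


degrader = Degrader(anomaly_limit=3)
for _ in range(3):
    degrader.report_anomaly()
print(degrader.mode.name)
```

Notice there is deliberately no method for stepping back up. In this sketch, restoring autonomy is something a human does outside the agent's reach, in keeping with the rule that the agent must never be able to disable its own oversight.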

