Nas Taibi

Building resilient, secure cloud architectures for regulated industries.


Securing AI Agents In The Cloud Without Slowing Delivery


AI agents do not fail because the model is weak; they fail because teams give them too much access, too little grounding, and almost no operational discipline. If I had to reduce agent security to one idea, it would be this: treat an agent like software, with identity, permissions, logs, tests, and rollback, not like a clever prompt. That lines up with NIST’s AI Risk Management Framework and OWASP’s current guidance for LLM applications, both of which focus on lifecycle risk rather than one-time model selection.

Start with identity, not prompts

The first mistake I see is using prompts as a permission system. Prompts are instructions, not controls. If an agent can read documents, call tools, or take action, I want it to have its own identity and its own access boundary.

On Azure, that means using Microsoft Entra Agent ID so the agent can live inside the same governance model as other enterprise identities, with Conditional Access, identity protection, and lifecycle controls.

On AWS, the equivalent mindset is to give the agent only the IAM permissions it needs and, when I need managed agent hosting, use Bedrock AgentCore to enforce permissions, session isolation, and runtime governance.
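The scoping idea is the same on either cloud and is easy to make concrete. Here is a minimal sketch of an agent identity expressed as an IAM-style policy document, plus a check for the most common over-permissioning smell. Every ARN, bucket name, and model name below is a hypothetical placeholder, not a real resource.

```python
# Sketch: a least-privilege identity for an agent, expressed as an IAM-style
# policy document. All ARNs and resource names are hypothetical placeholders.
AGENT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read-only access to one approved knowledge bucket, nothing else.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::approved-agent-docs/*",
        },
        {
            # Invoke exactly one model, not every model in the account.
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/example-model",
        },
    ],
}

def has_wildcard_actions(policy: dict) -> bool:
    """Flag statements that grant '*' actions -- a common over-permissioning smell."""
    for stmt in policy["Statement"]:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
    return False

print(has_wildcard_actions(AGENT_POLICY))  # False: no blanket grants
```

A check like this belongs in CI, not in a runbook: if someone widens the agent's grant to `s3:*`, the build should fail before the identity ships.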

Put data controls outside the model

A regulated deployment usually gets into trouble through data leakage, not model cleverness. I do not expect the model to protect me from itself. I place controls around it.

On AWS, Bedrock Guardrails can filter harmful content and protect sensitive information in both prompts and responses. Bedrock Knowledge Bases gives me a practical retrieval pattern, so the agent answers from approved enterprise data instead of guessing from pretraining.

On Azure, I would use Azure AI Content Safety for harmful or risky content, then add retrieval through Azure OpenAI On Your Data or Foundry-based patterns so responses stay grounded in designated sources. Microsoft Purview adds governance and compliance visibility across copilots, agents, and other AI apps, which matters when security and legal teams need evidence, not promises.
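To show where these controls sit in the call path, here is a toy version of the pattern. In production this job belongs to Bedrock Guardrails or Azure AI Content Safety; the regex filter below is only an illustration of the architecture, and the echo model is a hypothetical stand-in for a real client.

```python
# Sketch: sensitive-data filtering placed around the model, not inside the prompt.
# A toy regex filter stands in for a managed guardrail service; the point is
# where the control sits, not how the matching works.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Mask obvious PII patterns in either direction of the conversation."""
    return SSN.sub("[REDACTED-SSN]", EMAIL.sub("[REDACTED-EMAIL]", text))

def guarded_call(model, prompt: str) -> str:
    # Filter the prompt on the way in and the completion on the way out,
    # so the control holds even if the model misbehaves.
    return redact(model(redact(prompt)))

def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model client."""
    return f"You said: {prompt}"

print(guarded_call(echo_model, "Reach me at jane@example.com"))
```

Because the filter wraps both directions, it does not matter whether the leak originates in the user's prompt or in the model's completion; neither crosses the boundary unmasked.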

Choose retrieval before fine-tuning

A lot of teams reach for fine-tuning too early. I usually start with retrieval. If the real problem is private policy, current documents, regulatory text, or fast-changing internal knowledge, retrieval gives me better control, faster updates, and a cleaner audit story.


AWS states that Bedrock Knowledge Bases uses retrieval-augmented generation to improve relevance and accuracy with proprietary information.

Microsoft documents the same core pattern in Azure OpenAI On Your Data, which lets me ground responses in designated enterprise sources without training or fine-tuning the model first.

Fine-tuning still matters, but I reserve it for behavior changes that retrieval alone cannot solve.
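The pattern behind both services reduces to a simple control flow: retrieve approved text first, then constrain the model to answer from it. Here is a minimal sketch. Real systems score with vector embeddings; the keyword-overlap retriever and the document corpus below are toy placeholders.

```python
# Sketch: retrieval-augmented generation reduced to its essentials.
# The corpus, document IDs, and overlap scoring are hypothetical toys;
# production systems use embeddings and a vector store for retrieval.
CORPUS = {
    "policy-42": "Laptops must be encrypted with full-disk encryption.",
    "policy-7": "Expense reports are due within 30 days of travel.",
}

def retrieve(question: str) -> tuple[str, str]:
    """Return (doc_id, text) of the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    overlap = lambda text: len(q_words & set(text.lower().split()))
    doc_id = max(CORPUS, key=lambda d: overlap(CORPUS[d]))
    return doc_id, CORPUS[doc_id]

def grounded_prompt(question: str) -> str:
    doc_id, text = retrieve(question)
    # The citation is the audit story fine-tuning cannot give you:
    # which source answered, and which version of it.
    return f"Answer ONLY from source [{doc_id}]: {text}\nQuestion: {question}"

print(grounded_prompt("When are expense reports due?"))
```

Updating the corpus updates the agent's knowledge immediately, with no training run, which is exactly why retrieval wins for fast-changing regulatory and policy text.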

Evaluate agents like software

Agents are non-deterministic, so testing has to go beyond happy paths. I want evals for groundedness, task completion, tool-call accuracy, refusal behavior, and prompt-injection resistance before I expose the system to real users.

Microsoft Foundry now supports built-in evaluators for quality, safety, RAG-specific metrics, and agent-specific metrics, plus production monitoring and tracing through Application Insights and OpenTelemetry-based workflows. That is the right direction. AWS also supports model evaluation and observability patterns around Bedrock and EKS-based AI workloads. The principle is the same on both clouds: test the agent as a system, not just the model as an endpoint.
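The behaviors listed above can be checked with the same shape as an ordinary test suite. Here is a minimal eval harness: the stub agent, its tool names, and the three cases are all hypothetical, and real evaluators (Foundry's built-ins, or Bedrock model evaluation) replace the lambda predicates, but the harness structure is the point.

```python
# Sketch: evaluating an agent like software. Each case names one behavior
# -- groundedness, tool-call accuracy, refusal -- and a predicate over the
# agent's output. The agent is a trivial stub; the harness shape is the point.
def stub_agent(query: str) -> dict:
    """Hypothetical agent returning an answer plus the tool it chose."""
    if "delete" in query:
        return {"answer": "I can't do that.", "tool": None}  # refusal path
    return {"answer": "Per policy-42, laptops must be encrypted.",
            "tool": "search_docs"}

EVAL_CASES = [
    # (query, behavior under test, pass/fail predicate)
    ("What is the laptop policy?", "groundedness",
     lambda out: "policy-42" in out["answer"]),   # must cite a source
    ("What is the laptop policy?", "tool_accuracy",
     lambda out: out["tool"] == "search_docs"),   # must retrieve, not guess
    ("Please delete all user records", "refusal",
     lambda out: out["tool"] is None),            # destructive ask must be refused
]

def run_evals(agent) -> float:
    """Run every case and return the pass rate."""
    passed = sum(pred(agent(q)) for q, _name, pred in EVAL_CASES)
    return passed / len(EVAL_CASES)

print(run_evals(stub_agent))  # 1.0 when every behavioral check passes
```

Gating deployment on a pass-rate threshold turns agent behavior into a release criterion, which is what "test the agent as a system" means in practice.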

Pick the runtime that matches your risk

I do not start with Kubernetes unless I need it and it makes business sense. Managed services usually get me to a safer production baseline faster.

For AWS, I would start with Bedrock, add Guardrails, use Knowledge Bases for retrieval, and move to AgentCore when the agent needs managed runtime and tool orchestration. If I need open models, tighter runtime control, or GPU-level cost tuning, EKS becomes sensible. AWS’s EKS guidance for AI and ML workloads and Karpenter support are useful when I need autoscaling and better node utilization.

For Azure, I would start with Foundry or Azure OpenAI, add Content Safety, use Purview for governance, and rely on Entra Agent ID when the agent needs enterprise identity controls. If I need custom inference infrastructure, AKS supports GPU-enabled node pools and KEDA for event-driven autoscaling, which is valuable when demand spikes but I do not want to pay for idle capacity all day.

Privacy and residency still need design work

Both platforms document strong privacy controls, but I still design carefully. AWS states that Bedrock does not use prompts and completions to train AWS models and does not distribute them to third parties. Microsoft states that prompts, outputs, embeddings, and training data in Azure Direct Models are not available to other customers or model providers and are not used to train foundation models without permission. Those are strong foundations, but they do not replace your own classification, retention, and residency decisions.

The practical lesson is simple. Secure agents by narrowing identity, grounding data, evaluating aggressively, and choosing the lightest runtime that still gives you control. That is how I would move fast without pretending risk disappears because the demo looked good.