No. 13 · Azure AI · Sep 27, 2025 · 12 min read

The Real Security Checklist for Enterprise RAG on Azure

Most enterprise RAG security writing is one of two things: a marketing-shaped overview ("Azure has Entra ID and Private Endpoints") or a step-by-step that gets the easy stuff right and ignores the hard stuff. This is the checklist we use, ordered by how much pain it saved us when we hit each item.

It's not exhaustive. It's the items that have actually mattered in our environment, with the gotchas the docs underplay.

1. Private endpoints on every Azure OpenAI deployment, no exceptions

az network private-endpoint create \
  --name pe-aoai-prod \
  --resource-group rg-prod-eus-platform \
  --vnet-name vnet-prod-eus-spoke \
  --subnet snet-aoai-prod \
  --private-connection-resource-id $AOAI_RESOURCE_ID \
  --connection-name aoai-conn \
  --group-ids account

The non-obvious part: Azure OpenAI's public endpoint stays available even when you have a private endpoint. You have to explicitly disable public network access on the resource:

az cognitiveservices account update \
  --name aoai-prod-eus \
  --resource-group rg-prod-eus-platform \
  --custom-domain aoai-prod-eus \
  --public-network-access Disabled

We discovered this the hard way during a security review. Public endpoint enabled by default; private endpoint added as an additional path; the public path keeps working unless you explicitly close it. This is in the docs but easy to miss.

Verification: From a non-VNet location, curl https://aoai-prod-eus.openai.azure.com/openai/deployments should fail with a network error, not 401. If it returns 401, the endpoint is reachable; you've only blocked unauthenticated traffic, not the public network path.
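That check is easy to script. A sketch in boring Python, run from a machine outside the VNet (the endpoint name and the pass/fail classification are ours):

```python
import socket
import urllib.error
import urllib.request


def probe_public_access(endpoint: str, timeout: float = 5.0) -> str:
    """Classify public reachability: 'blocked' is the pass state."""
    try:
        urllib.request.urlopen(endpoint, timeout=timeout)
        return "reachable-other"
    except urllib.error.HTTPError as exc:
        # The TCP/TLS path worked and only auth failed: the public
        # network path is still open.
        return "reachable-401" if exc.code == 401 else "reachable-other"
    except (urllib.error.URLError, socket.timeout, OSError):
        # DNS failure, connection refused/reset, or timeout: the
        # network path is closed, which is what we want.
        return "blocked"
```

Anything other than `"blocked"` for a fully privatized resource should page someone — `"reachable-401"` in particular means you've only blocked unauthenticated traffic.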

2. Managed identity for every service-to-service call

No API keys in production. Ever. Even with Key Vault rotation, an API key in environment variables is one process dump away from being copied.

For an Azure Container App calling Azure OpenAI:

# Enable system-assigned managed identity on the container app
az containerapp identity assign --system-assigned -n my-app -g rg-prod

# Grant the identity Cognitive Services OpenAI User role
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee $APP_PRINCIPAL_ID \
  --scope $AOAI_RESOURCE_ID

In code:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# get_bearer_token_provider wraps the credential so the SDK can fetch and
# refresh tokens for the Cognitive Services scope on its own.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=AOAI_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version="2024-08-01-preview",
)

No API key. No secret rotation. No leaked-credential incident class. Token expiry is handled by the SDK transparently.

3. Network isolation of the vector store too

People remember to lock down Azure OpenAI and forget to lock down Azure AI Search. Anything that holds your indexed enterprise content is as sensitive as the model that queries it.

az search service update \
  --name search-prod-eus \
  --resource-group rg-prod-eus-platform \
  --public-network-access disabled

Plus a private endpoint, plus the same VNet integration as Azure OpenAI. The model and the search service should communicate over private network only.

Gotcha: The Azure portal's "Index management" UI requires a network path to the search service. If you've fully privatized it, your platform engineers can no longer manage indexes from the portal unless they're on the VNet. We added a jump-box pattern with Bastion access. Annoying, necessary.

4. Conditional Access policies on the developer-side identities

Engineers and platform users access AOAI for development, debugging, and direct query. Their human identities need protection too.

In Entra Conditional Access:

  • Require MFA for any access to Cognitive Services resources.
  • Block access from countries you don't operate in.
  • Require compliant device for any role with */write permissions on AI resources.

This catches the "stolen laptop" and "credential phishing" attack classes. The Azure OpenAI resource-level RBAC is necessary but not sufficient.
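These policies can also be created through the Microsoft Graph `conditionalAccess` API rather than clicked together in the portal. A minimal sketch of the MFA policy as a Graph payload — the application ID is a placeholder (look up the Cognitive Services first-party app ID in your tenant), and you'd POST this to `/identity/conditionalAccess/policies` with the appropriate Graph permissions:

```python
# Placeholder -- resolve the real first-party application ID in your tenant
# rather than hardcoding it.
COGNITIVE_SERVICES_APP_ID = "<cognitive-services-app-id>"

# Sketch of a Conditional Access policy payload: require MFA for all users
# accessing Cognitive Services resources.
mfa_policy = {
    "displayName": "Require MFA for Cognitive Services",
    "state": "enabled",
    "conditions": {
        "applications": {"includeApplications": [COGNITIVE_SERVICES_APP_ID]},
        "users": {"includeUsers": ["All"]},
    },
    "grantControls": {
        "operator": "OR",
        "builtInControls": ["mfa"],
    },
}
```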

5. Network security between layers

Even inside the private VNet, traffic between the application tier and AOAI should be controlled. NSGs on the AOAI subnet that only allow inbound from the application tier subnet:

az network nsg rule create \
  --resource-group rg-prod-eus-platform \
  --nsg-name nsg-aoai-prod \
  --name AllowAppTierToAOAI \
  --priority 100 \
  --source-address-prefixes 10.20.0.0/24 \
  --destination-port-ranges 443 \
  --access Allow \
  --protocol Tcp \
  --direction Inbound

# Plus a default-deny rule
az network nsg rule create \
  --resource-group rg-prod-eus-platform \
  --nsg-name nsg-aoai-prod \
  --name DenyAllElse \
  --priority 4096 \
  --source-address-prefixes "*" \
  --destination-port-ranges "*" \
  --access Deny \
  --protocol "*" \
  --direction Inbound

This prevents lateral movement: if a different workload in the VNet is compromised, it can't talk to AOAI without explicit allowlisting.
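This is also one of the easier controls to audit mechanically. A sketch of the check, written against a simplified shape of what `az network nsg rule list -o json` returns (field names here are ours; map them from the real export):

```python
def audit_aoai_nsg(rules: list[dict], app_tier_prefix: str = "10.20.0.0/24") -> list[str]:
    """Return a list of failures; empty means the NSG looks right."""
    allows = [r for r in rules if r["access"] == "Allow" and r["direction"] == "Inbound"]
    denies = [r for r in rules if r["access"] == "Deny" and r["direction"] == "Inbound"]
    failures = []

    # An inbound allow scoped to the app tier on 443 must exist.
    if not any(
        r["sourceAddressPrefix"] == app_tier_prefix and r["destinationPortRange"] == "443"
        for r in allows
    ):
        failures.append("missing allow rule from app tier on 443")

    # A catch-all deny must sit below (higher priority number than) every allow.
    max_allow = max((r["priority"] for r in allows), default=0)
    if not any(r["priority"] > max_allow for r in denies):
        failures.append("missing default-deny below the allow rules")

    return failures
```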

6. Content filter configuration, not just defaults

Azure OpenAI ships with content filters. The defaults are reasonable for general chat. They are not specific to your use case.

If you're doing RAG over internal docs, the default content filter may flag legitimate content as "violence" or "self-harm" because the underlying corpus mentions, e.g., security incidents or workplace safety topics. We tuned ours per-deployment:

  • Customer-facing chatbot: stricter filters, refusal on edge cases
  • Internal employee assistant: looser filters, explicit policy that the model should respond to questions about workplace incidents

The content filters are configured at the Azure OpenAI resource level via the Content Filter API. Worth spending a day matching them to your domain.

7. Audit logging that you actually parse

Diagnostic settings on the Azure OpenAI resource → Log Analytics workspace. Enable AuditEvent and RequestResponse categories.

The non-obvious part: AOAI logs include the full prompt and response. That's:

  • Useful for debugging
  • Potentially a PII concern (depending on what users send)
  • Subject to retention policies that may differ from your application's

We mask sensitive fields before logging where possible (in the application layer, before the request reaches AOAI), and configure the diagnostic logs to a Log Analytics workspace with a 90-day retention. Anything older goes to immutable storage with stricter access control.
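The application-layer masking is nothing fancy. A sketch — the patterns and replacement tokens are ours and cover only emails and US-style SSNs; extend for whatever your corpus actually leaks:

```python
import re

# Pre-logging redaction, applied to the prompt before the request (and its
# log record) leaves the application tier. Illustrative, not exhaustive.
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<ssn>"),
]


def mask_sensitive(text: str) -> str:
    for pattern, token in _PATTERNS:
        text = pattern.sub(token, text)
    return text
```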

8. Per-tenant request budgets

For multi-tenant applications, a single tenant can monopolize your AOAI quota by accident or malice. We enforce per-tenant token budgets at the API Management layer:

<inbound>
  <rate-limit-by-key calls="1000"
                     renewal-period="3600"
                     counter-key="@(context.Request.Headers.GetValueOrDefault("X-Tenant-Id"))" />
  <azure-openai-token-limit tokens-per-minute="50000"
                             counter-key="@(context.Request.Headers.GetValueOrDefault("X-Tenant-Id"))" />
</inbound>

The azure-openai-token-limit policy is a recent addition that's specifically token-aware (counts both prompt and completion tokens). Worth enabling regardless of multi-tenancy concerns — it's a useful guardrail against runaway agents.

9. Data residency on the deployment, not just the resource

Where Azure OpenAI processes prompt and completion data is controlled by the deployment type, not just the resource's region: a Standard deployment processes data in the resource's region, Global Standard deployments can route requests to capacity anywhere, and Data Zone deployments keep processing inside the EU or US data zone. For regulated workloads we pin it explicitly at deployment time:

az cognitiveservices account deployment create \
  --name aoai-prod-eu \
  --resource-group rg-prod-eu \
  --deployment-name gpt-4o-eu \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-name Standard \
  --capacity 100

EU deployments stay in EU. US deployments stay in US. For our regulated clients, this is a procurement-time question and we have to answer "yes."

10. The thing the docs almost never mention

Test all of the above periodically with an automated security audit.

We have a CronJob (boring Python, ~150 lines) that runs weekly and asserts:

  • All AOAI resources have public network access disabled
  • All AOAI resources have private endpoints
  • All AOAI deployments have content filters configured
  • All applications using AOAI authenticate via managed identity (no API keys in environment)
  • All audit logging is enabled
  • All NSGs have the expected rules

Anything that fails the audit files an issue in our security backlog. We've caught two regressions where someone added a new AOAI resource without all the controls. The audit caught both within a week.
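The assertions themselves are just predicates over exported resource config. A sketch of the per-resource check — the field names mirror a simplified shape of `az cognitiveservices account show` output and are ours; adapt them to the real JSON:

```python
# One predicate per control, run over every exported AOAI resource.
# Resource dicts are simplified; map them from your real export.
def audit_aoai_resource(resource: dict) -> list[str]:
    failures = []
    if resource.get("publicNetworkAccess") != "Disabled":
        failures.append("public network access is not disabled")
    if not resource.get("privateEndpointConnections"):
        failures.append("no private endpoint attached")
    if not resource.get("diagnosticSettings"):
        failures.append("audit logging not enabled")
    return [f"{resource.get('name', '?')}: {msg}" for msg in failures]
```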

What I'd add next

A synthetic adversarial test suite that runs prompt-injection and jailbreak attempts against the production system on a schedule. We've talked about this for a year and not done it. The argument against is "it's a moving target." The argument for is "yes, that's why you have to keep testing."
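One starting point that sidesteps the moving-target objection: plant a random canary in the system prompt and assert it never appears in responses, regardless of which injection technique is in fashion. A sketch (the canary scheme and the check are ours):

```python
import secrets


def make_canary() -> str:
    """A random marker injected into the system prompt before each test run."""
    return f"CANARY-{secrets.token_hex(8)}"


def leaked(response: str, canary: str) -> bool:
    """Any response containing the canary means an injection extracted
    system-prompt content, whatever the technique used."""
    return canary.lower() in response.lower()
```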

I would NOT skip the network isolation steps for "internal-only" applications. Internal-only is a deployment state, not a security state. Treat AOAI as a sensitive external dependency regardless of who's using it.

Tags: RAG, Security, Enterprise
