No. 23 · Azure · Dec 5, 2025 · 10 min read

Air-Gapped Azure OpenAI With Private Endpoints: A Terraform Module That Actually Works

"Air-gapped" is a strong word for something running in a public cloud, but it's the right word for what regulated customers want: an Azure OpenAI deployment whose only network path is through their own VNet, with public access fully disabled, audit logging on, and authentication via their own Entra tenant.

This is the Terraform module we use to provision exactly that, and the design decisions that aren't obvious from the Azure docs.

What "air-gapped" means in practice

Three properties, all of which must hold:

  1. No public network access. The AOAI resource cannot be reached from outside the customer's VNet, full stop. Not "default deny but allow specific IPs." Just disabled.
  2. All traffic over private endpoint. The customer's workloads talk to AOAI via a Private Endpoint that lives in their VNet. The DNS resolution points at the private IP.
  3. Auditable. Every call generates an entry in an audit log that the customer can read and retain.

If any of these is missing, you're not air-gapped — you have a normal AOAI resource with extra steps.

The Terraform module

# main.tf

resource "azurerm_cognitive_account" "aoai" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name
  kind                = "OpenAI"
  sku_name            = var.sku_name

  custom_subdomain_name         = var.name
  public_network_access_enabled = false
  local_auth_enabled            = false

  identity {
    type = "SystemAssigned"
  }

  network_acls {
    default_action = "Deny"
  }

  tags = var.tags
}

resource "azurerm_private_endpoint" "aoai_pe" {
  name                = "pe-${var.name}"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-${var.name}"
    private_connection_resource_id = azurerm_cognitive_account.aoai.id
    is_manual_connection           = false
    subresource_names              = ["account"]
  }

  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [var.private_dns_zone_id]
  }
}

resource "azurerm_cognitive_deployment" "deployments" {
  for_each = var.deployments

  name                 = each.key
  cognitive_account_id = azurerm_cognitive_account.aoai.id

  model {
    format  = "OpenAI"
    name    = each.value.model_name
    version = each.value.model_version
  }

  sku {
    name     = each.value.sku_name
    capacity = each.value.capacity
  }
}

resource "azurerm_monitor_diagnostic_setting" "aoai_diag" {
  name                       = "diag-${var.name}"
  target_resource_id         = azurerm_cognitive_account.aoai.id
  log_analytics_workspace_id = var.log_analytics_workspace_id

  enabled_log {
    category_group = "audit"
  }

  enabled_log {
    category_group = "allLogs"
  }

  metric {
    category = "AllMetrics"
  }
}

Three things to note:

public_network_access_enabled = false is the core of the air-gap. Without this, even with a private endpoint, the public endpoint is still reachable. The Azure portal lets you set this; Terraform supports it directly.

local_auth_enabled = false disables API key authentication entirely. The resource only accepts Entra ID-authenticated callers. No keys to leak.

network_acls { default_action = "Deny" } is belt-and-braces with public_network_access_enabled = false. Both should be set. Deny default + private endpoint = the only way in is via the PE.
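For reference, here is what a call to the module might look like from the caller's side. This is an illustrative sketch: the module source path, resource names, model, and capacity are placeholders, and the `deployments` input is assumed to be a map of objects matching the `for_each` in the module above.

```hcl
module "aoai_airgapped" {
  source = "./modules/aoai-airgapped" # illustrative path

  name                       = "aoai-contoso-prod"
  location                   = "swedencentral"
  resource_group_name        = azurerm_resource_group.ai.name
  sku_name                   = "S0"
  private_endpoint_subnet_id = azurerm_subnet.pe.id
  private_dns_zone_id        = azurerm_private_dns_zone.aoai.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.audit.id

  # Each key becomes a deployment name; values feed the model and sku blocks.
  deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-08-06"
      sku_name      = "Standard"
      capacity      = 30
    }
  }

  tags = { env = "prod" }
}
```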

The DNS trap

Private endpoints don't work without proper DNS resolution. The private DNS zone for Azure OpenAI is privatelink.openai.azure.com. The PE creates an A record in that zone pointing at its private IP.

The trap: every consuming workload's network must be able to resolve <your-aoai-name>.openai.azure.com to the private IP. By default, this resolves to the public IP, which is now firewalled and unreachable.

Three ways to make DNS work:

Option A: VNet-link the private DNS zone to every consumer VNet. Simple if you have one or two VNets. Painful for hub-and-spoke topologies with many spokes.

Option B: Centralize DNS via Azure Private DNS Resolver in your hub. Spokes resolve via the hub. We use this; cleaner at scale.

Option C: Hard-code DNS via local hosts files or per-app DNS overrides. Don't do this.

Our module takes the private_dns_zone_id as input and assumes the caller has wired up the resolution layer. We don't try to manage it inside the module because the right answer depends on the customer's network topology.
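For completeness, Option A is a single resource per consumer VNet. A hedged sketch, with the link name and VNet reference as placeholders:

```hcl
resource "azurerm_private_dns_zone_virtual_network_link" "spoke" {
  name                  = "link-spoke-1" # placeholder
  resource_group_name   = var.dns_resource_group_name
  private_dns_zone_name = "privatelink.openai.azure.com"
  virtual_network_id    = var.spoke_vnet_id

  # Consumers only resolve records here; they don't register their own.
  registration_enabled = false
}
```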

What the module deliberately doesn't do

Network policy on the PE subnet. The subnet that holds the PE itself needs private_endpoint_network_policies_enabled set appropriately on the subnet, but that's the responsibility of the team owning the VNet, not the AOAI module. We assume the input subnet is correctly configured.

Customer-managed encryption keys. We support this in a different module variant for customers who require it. Most don't. We left it out of the base module to keep the surface clean.

Multi-region failover. Some customers want a primary in one region and a failover in another. We provide that as a wrapper module that calls this one twice with a Front Door / Traffic Manager in front. The base module is single-region.
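Schematically, the wrapper is just two calls to the base module with region-specific inputs, plus the routing layer in front. Paths and regions here are illustrative:

```hcl
module "aoai_primary" {
  source   = "./modules/aoai-airgapped" # illustrative path
  location = "swedencentral"
  # ... remaining inputs as in the base module
}

module "aoai_failover" {
  source   = "./modules/aoai-airgapped"
  location = "francecentral"
  # ... same inputs, pointed at the failover region's subnet and DNS zone
}
```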

The deployment-side concerns

Provisioning the resource is half the battle. Consuming it correctly is the other half.

Workloads must auth via Entra. With local_auth_enabled = false, the only path is managed identity or service-principal auth. We covered this in detail in the previous article in this series. Apps using API keys will fail immediately.

SDK must be configured to use the Entra path. The Azure OpenAI SDK has a token-provider parameter for this:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Wrap the credential in a token provider scoped to Cognitive Services.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAI(
    # Public-DNS-form endpoint; resolves to the private IP via your DNS setup
    azure_endpoint="https://your-aoai.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-08-01-preview",
)

The endpoint URL is the public-facing form. The DNS layer translates it to the private IP. The SDK doesn't know or care about the private endpoint — it just sends HTTPS to the resolved IP.

Workload identity must have the right role. Cognitive Services OpenAI User for inference, Cognitive Services Contributor for management. We grant the first to runtime workloads and the second to platform engineers (with PIM-gated access).
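Granting the inference role to a workload's managed identity is a single resource. A sketch, where the principal ID variable is illustrative:

```hcl
resource "azurerm_role_assignment" "workload_inference" {
  scope                = azurerm_cognitive_account.aoai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = var.workload_principal_id # managed identity object ID
}
```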

What I'd add next

A drift-detection module. Things drift. A platform engineer in a hurry could re-enable public network access via the portal. We'd want a Terraform-managed periodic check that asserts the resource still has all the air-gap properties set. Currently we use Azure Policy for this; could be made more explicit.

Per-deployment audit. AOAI's diagnostic logs are at the resource level. For an air-gapped resource with multiple deployments serving different teams, fine-grained per-deployment audit would be useful for cost and compliance attribution. Not directly possible today; a feature request to Microsoft.

I would NOT skip the DNS verification step in tests. After provisioning, our module's test harness runs an nslookup from inside the customer's VNet to confirm the name resolves to the private IP. That check caught three configuration mistakes in our first month of using the module.
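The check itself reduces to "does the hostname resolve to a private address?" A minimal Python sketch of that assertion, assuming it runs from inside the VNet (the hostname is a placeholder; our actual harness shells out to nslookup, but the logic is the same):

```python
import ipaddress
import socket


def ip_is_private(ip: str) -> bool:
    """Classify a single resolved IP: True for RFC 1918 / private ranges."""
    return ipaddress.ip_address(ip).is_private


def resolves_to_private_ip(hostname: str) -> bool:
    """Resolve hostname and return True only if every A record is private.

    Run from inside the consumer VNet. From outside, the name resolves to
    the firewalled public IP and this returns False.
    """
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET)
    ips = {info[4][0] for info in infos}
    return all(ip_is_private(ip) for ip in ips)


# Example (placeholder hostname):
# assert resolves_to_private_ip("your-aoai.openai.azure.com")
```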

The portable lesson

The air-gapped AOAI configuration is not a single feature flag — it's a combination of resource settings, network design, DNS configuration, and identity configuration that all have to line up. The Terraform module above bundles the resource-side concerns. The network and identity sides are responsibilities you need to plan for separately.

If you're building this for a regulated customer, plan for a 2-3 week timeline: a few days to provision, a week to wire DNS and identity, and a week to validate that nothing is silently using the public endpoint or API keys.

The good news: once it's done, it stays done. The air-gapped configuration isn't fragile in the same way that, say, custom JWT validation is fragile. The resource's properties are durable; what you have to watch for is humans clicking things in the portal that re-open access.

Azure Policy is the friend that catches that.

Terraform · Private Endpoints · Azure OpenAI
