damionas
No. 32 · Azure · Oct 28, 2025 · 7 min read

Bicep Deployment Stacks: The Cleanup Story I Should Have Shipped Years Ago

I've been deploying Bicep against resource groups for four years. I have, on three separate occasions, deployed a fresh template and then forgotten about the resources the previous template created, because Azure's default deployment mode is incremental and never deletes anything. Two of those times resulted in a $200/day mystery cost that took weeks to track down.

Deployment stacks fix that. They turn a Bicep deploy into something closer to terraform apply: a managed lifecycle, deletion semantics, and deny-assignment protection. We migrated our top three landing-zone modules in a sprint. This is the migration and the four things to know.

What changes

A normal Bicep deploy creates a Microsoft.Resources/deployments record. It tells Azure which template ran, with which parameters, but nothing about ownership. Resources outlive the template that made them.

A deployment stack creates a Microsoft.Resources/deploymentStacks record that explicitly owns the resources it deploys. Re-deploying the stack with a smaller template removes the resources that disappeared from the template. Optionally, the stack also creates a deny-assignment that prevents anyone from modifying its resources outside of the stack.
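A rough way to picture the ownership rule is as a set difference between what the stack managed last time and what the new template declares. The sketch below only illustrates that bookkeeping and is not how Azure implements it; the function name and the short resource names in the usage comment are made up for the example.

```shell
# Illustration only: which resources become "unmanaged" when a stack is
# re-deployed with a smaller template. All names here are hypothetical.

# Print lines found in file $1 (previously managed) but not in file $2
# (declared by the new template).
unmanaged_resources() {
  prev_sorted=$(mktemp)
  next_sorted=$(mktemp)
  sort -u "$1" > "$prev_sorted"
  sort -u "$2" > "$next_sorted"
  # comm -23 keeps only the lines unique to the first file.
  comm -23 "$prev_sorted" "$next_sorted"
  rm -f "$prev_sorted" "$next_sorted"
}

# Usage:
#   printf 'vnet-hub\nnsg-app\npip-old\n' > prev.txt
#   printf 'vnet-hub\nnsg-app\n'          > next.txt
#   unmanaged_resources prev.txt next.txt
```

With a plain incremental deploy, whatever this prints simply stays behind. With a stack, it is what gets deleted or detached, depending on the stack's settings.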

The CLI

# Create or update a stack
az stack group create \
  --name landing-zone-network \
  --resource-group rg-platform-prod \
  --template-file infra/network.bicep \
  --action-on-unmanage deleteResources \
  --deny-settings-mode denyDelete \
  --yes

Three flags do the work:

  • --action-on-unmanage: what happens to resources removed from the template. deleteResources deletes them (deleteAll goes further and also deletes resource groups that become unmanaged); detachAll keeps them but stops tracking them. I default to deleteResources for greenfield, detachAll for migrations.
  • --deny-settings-mode: none, denyDelete, or denyWriteAndDelete. The deny-assignment applies to everyone except the principal that created the stack. Use denyDelete everywhere; denyWriteAndDelete only for landing-zone foundations you actively don't want anyone to touch.
  • --yes: confirm changes non-interactively (for pipelines).
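One way to keep those defaults out of people's heads is a small helper that maps a scenario to its flags. This is a sketch: the function name and the "greenfield" and "migration" labels are invented here, and pairing detachAll with a deny mode of none is my assumption for a cautious first migration deploy.

```shell
# Hypothetical helper: map a rollout scenario to the stack flags described
# above. The scenario labels are made up for this sketch.
stack_flags() {
  case "$1" in
    greenfield)
      echo "--action-on-unmanage deleteResources --deny-settings-mode denyDelete" ;;
    migration)
      echo "--action-on-unmanage detachAll --deny-settings-mode none" ;;
    *)
      echo "unknown scenario: $1" >&2
      return 1 ;;
  esac
}

# Usage (word splitting of the helper's output is intentional):
#   az stack group create --name landing-zone-network \
#     --resource-group rg-platform-prod \
#     --template-file infra/network.bicep \
#     $(stack_flags greenfield) --yes
```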

The pipeline

# Azure DevOps pipeline excerpt
- task: AzureCLI@2
  displayName: Deploy network stack
  inputs:
    azureSubscription: $(svc)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az stack group create \
        --name landing-zone-network \
        --resource-group $(rg) \
        --template-file infra/network.bicep \
        --parameters infra/network.bicepparam \
        --action-on-unmanage deleteResources \
        --deny-settings-mode denyDelete \
        --deny-settings-excluded-actions \
            "Microsoft.Network/virtualNetworks/subnets/join/action" \
            "Microsoft.Network/networkSecurityGroups/securityRules/*" \
        --yes

--deny-settings-excluded-actions is the practical knob: it lets specific actions through despite the deny-assignment. Subnet-join is the classic example: a workload deployed by a different stack still needs to attach to a subnet from the network stack.
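To sanity-check an exclusion list before it reaches a pipeline, a local approximation of the wildcard matching can help. This is only a sketch using shell glob matching; Azure's own matching may differ (case handling, for example), and the function and variable names are hypothetical.

```shell
# Rough local approximation of how an excluded-actions list lets an action
# through. Azure's real matching may differ; treat this as a sanity check.
EXCLUDED_ACTIONS='Microsoft.Network/virtualNetworks/subnets/join/action
Microsoft.Network/networkSecurityGroups/securityRules/*'

# Return 0 if action $1 matches any pattern in EXCLUDED_ACTIONS, else 1.
is_excluded() {
  # One pattern per line; an unquoted $pattern is treated as a glob by case.
  while IFS= read -r pattern; do
    case "$1" in
      $pattern) return 0 ;;
    esac
  done <<EOF
$EXCLUDED_ACTIONS
EOF
  return 1
}

# Usage:
#   is_excluded "Microsoft.Network/networkSecurityGroups/securityRules/write" \
#     && echo "allowed through the deny-assignment"
```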

Migrating an existing resource group

The two-phase migration that doesn't lose state:

# Phase 1: create the stack with the existing template, action-on-unmanage=detachAll
az stack group create --name lz-network \
  --resource-group rg-platform-prod \
  --template-file infra/network.bicep \
  --action-on-unmanage detachAll \
  --deny-settings-mode none \
  --yes

# Verify with az stack group show: the stack's resources should be listed under "resources"
az stack group show --name lz-network --resource-group rg-platform-prod \
  --query "resources[].id" -o tsv

# Phase 2: tighten settings on the next deploy
az stack group create --name lz-network \
  --resource-group rg-platform-prod \
  --template-file infra/network.bicep \
  --action-on-unmanage deleteResources \
  --deny-settings-mode denyDelete \
  --yes

The phase-1 detachAll is non-destructive: if the template is wrong, no resources are deleted. Use it to verify the resource list, then graduate to deleteResources. Six landing zones migrated this way, zero resource loss.
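The verification step between the two phases can be turned into a gate. The sketch below assumes a hand-maintained file of expected resource IDs (expected-ids.txt is hypothetical) and assumes the stack's resource list is exposed as "resources" in the az stack group show JSON; check the property name against your CLI version.

```shell
# Sketch of a gate between phase 1 and phase 2.
# Assumption: `az stack group show` exposes the stack's resources under
# "resources"; the expected-IDs file is a hypothetical, hand-maintained list.

# Print every expected ID (one per line in file $3) that the stack does not list.
missing_from_stack() {
  stack_name=$1
  resource_group=$2
  expected_file=$3
  managed=$(az stack group show --name "$stack_name" \
    --resource-group "$resource_group" \
    --query "resources[].id" -o tsv) || return 2
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    # Whole-line, fixed-string, case-insensitive comparison of resource IDs.
    printf '%s\n' "$managed" | grep -qxiF -- "$id" || printf '%s\n' "$id"
  done < "$expected_file"
}

# Usage: block phase 2 if anything is missing.
#   missing=$(missing_from_stack lz-network rg-platform-prod expected-ids.txt)
#   [ -z "$missing" ] || { echo "not managed yet:"; echo "$missing"; exit 1; }
```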

What broke first

Cross-stack references aren't a thing. If network.bicep produces a vnet ID that aks.bicep consumes, you can't reference the network stack's outputs from the AKS stack the way Terraform's data blocks let you. You either pass IDs in via parameters (and risk drift) or use a single stack that deploys both. We picked single-stack-per-landing-zone, three stacks total: network, identity, observability. Workloads use a different mechanism (Azure Verified Modules) entirely.
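For completeness, the pass-IDs-via-parameters option looks roughly like this. It is a sketch under several assumptions: that network.bicep declares an output named vnetId, that aks.bicep accepts a vnetId parameter, that the consuming stack is called lz-aks, and that stack outputs are readable under "outputs" in the az stack group show JSON. The value is a snapshot taken at deploy time, which is exactly where the drift risk comes from.

```shell
# Sketch: read one output from a stack and hand it to another deploy as a
# parameter. Output, parameter, and stack names are hypothetical.

# Print the value of output $3 from stack $1 in resource group $2.
stack_output() {
  az stack group show --name "$1" --resource-group "$2" \
    --query "outputs.$3.value" -o tsv
}

# Usage:
#   vnet_id=$(stack_output landing-zone-network rg-platform-prod vnetId)
#   az stack group create --name lz-aks --resource-group rg-platform-prod \
#     --template-file infra/aks.bicep \
#     --parameters vnetId="$vnet_id" \
#     --action-on-unmanage deleteResources \
#     --deny-settings-mode denyDelete --yes
```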

detachAll after a typo'd template quietly un-tracked half a resource group. The stack accepted a template that had been refactored without one resource and detached everything missing. There was no destructive action, but the deny-assignment was now gone from those resources. Lesson: phase 1's detachAll belongs in a separate first deploy, not in routine ones. Once a stack is established, deleteResources is safer because it makes the diff visible in CI logs.
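A cheap way to enforce that lesson in a pipeline is a guard that refuses detachAll unless someone opts in on purpose. A minimal sketch; the function name and the ALLOW_DETACH variable are hypothetical.

```shell
# Hypothetical pipeline guard: refuse detachAll unless ALLOW_DETACH=1 is set
# explicitly, so routine deploys cannot silently un-track resources.
guard_unmanage_action() {
  if [ "$1" = "detachAll" ] && [ "${ALLOW_DETACH:-0}" != "1" ]; then
    echo "detachAll is reserved for first-time migrations; set ALLOW_DETACH=1 to override" >&2
    return 1
  fi
}

# Usage, before calling az stack group create:
#   guard_unmanage_action "$ACTION_ON_UNMANAGE" || exit 1
```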

Deny-assignment fights with subscription-level role assignments. A deny assignment takes precedence over role assignments, so if yours denies Microsoft.Network/*/write, neither a Contributor on the resource group nor a subscription Owner can write unless their principal is excluded from the stack's deny settings. The order is intuitive once you know it (deny > role) but produces "I'm Owner of this resource group, why can't I edit this NSG?" support tickets. Document it in the stack's README; link from the deny-assignment description.

Stack name collisions across resource groups. Stack names are scoped to the resource group, so two resource groups can have a stack called network. Mostly fine, but az stack group list works one resource group at a time, so enumerating stacks across many resource groups is slow at scale. We adopted a single naming convention: <lz-name>-<role> (e.g. prod-network, prod-identity).
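The convention is easy to enforce in a script. A sketch, assuming the three roles used in this post and my own restriction that the landing-zone name is lowercase letters, digits, and hyphens; the function name is hypothetical.

```shell
# Build a stack name following <lz-name>-<role>, rejecting anything off-pattern.
# The allowed roles and the lz-name character set are assumptions of this sketch.
stack_name() {
  case "$2" in
    network|identity|observability) ;;
    *) echo "unknown role: $2" >&2; return 1 ;;
  esac
  case "$1" in
    ''|*[!a-z0-9-]*)
      echo "lz-name must be lowercase letters, digits, or hyphens: $1" >&2
      return 1 ;;
  esac
  printf '%s-%s\n' "$1" "$2"
}

# Usage:
#   az stack group create --name "$(stack_name prod network)" ...
```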

Numbers, after the migration

  • 3 stacks live (network, identity, observability)
  • Zero orphaned resources detected by Cost Management's "untracked resources" report after one quarter, vs ~12 in the equivalent quarter the year before
  • One "wait, the deploy deleted that?" moment, caught at PR review from a diff of az stack group show output

What I'd do differently

I'd deploy stacks at subscription scope, not resource-group scope, for the network landing zone. RG-scoped stacks can't manage VNet peerings to other subscriptions cleanly; sub-scoped stacks can. The downside is the deny-assignment is broader, so plan the --deny-settings-excluded-actions list more carefully.
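A subscription-scoped version of the network deploy might look like the sketch below. It assumes a template written for subscription scope (infra/network-sub.bicep is a hypothetical file with targetScope = 'subscription') and uses westeurope purely as a placeholder location; the wrapper function name is also made up.

```shell
# Sketch of the subscription-scoped equivalent of the network stack deploy.
# Assumptions: infra/network-sub.bicep is a hypothetical subscription-scope
# template, and westeurope is a placeholder for the stack's location.
deploy_network_stack_sub() {
  az stack sub create \
    --name landing-zone-network \
    --location "${1:-westeurope}" \
    --template-file infra/network-sub.bicep \
    --action-on-unmanage deleteResources \
    --deny-settings-mode denyDelete \
    --deny-settings-excluded-actions \
        "Microsoft.Network/virtualNetworks/subnets/join/action" \
    --yes
}

# Usage:
#   deploy_network_stack_sub            # placeholder location
#   deploy_network_stack_sub northeurope
```

Because the deny-assignment now covers more ground, the excluded-actions list deserves the same review as the template itself.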

I would NOT use deployment stacks for application workloads. They're the right tool for landing-zone-grade infrastructure where ownership and lifecycle matter. They're overkill for a Functions app that gets re-deployed twice a day: that workload is owned by the pipeline, not the stack, and the deny-assignment friction isn't worth the cleanup story.

Bicep · Deployment Stacks · Landing Zones
