I have, over the past four years of Azure work, deployed a Bicep template against a resource group three separate times and forgotten about resources the previous template had created. Twice that produced a $200/day mystery cost that took weeks to track down. Once it was a security finding: the orphaned resource was a network interface whose VM had been deleted, and the NIC kept its public IP for nine months.
This is the failure mode that deployment stacks were built to fix. A normal Bicep deploy creates a Microsoft.Resources/deployments record that says "this template ran with these parameters". It says nothing about ownership. Resources outlive the template that made them. A stack creates a Microsoft.Resources/deploymentStacks record that explicitly owns the resources it deploys, deletes them when they leave the template, and optionally creates a deny-assignment that prevents anyone from modifying them outside the stack.
The migration is delicate the first time because production state is real and you can't afford to delete the wrong thing. This post walks through the two-phase pattern that has worked for me on six landing zones with zero resource loss. By the end you'll have an existing resource group migrated into a deployment stack, a deny-assignment hardening it against accidental modifications, a pipeline that uses the stack going forward, and a drift-detection workflow that fires when something changes outside the repo.
What you'll have at the end
~/landing-zone-migrate/
├── infra/
│ ├── main.bicep # the existing template, unchanged
│ └── main.bicepparam
├── migrations/
│ ├── phase-1-detach.sh # non-destructive
│ ├── phase-2-deletions.sh # the destructive flip
│ └── verify-managed.sh
├── pipelines/
│ ├── azure-pipelines.yml
│ └── drift-detection.yml
└── README.md
Why two phases, and not one
A brief explanation, because the two-phase pattern is what makes this safe and most tutorials skip it.
The naive migration is one step: az stack group create --action-on-unmanage deleteResources. If your template matches reality exactly, this works fine. If it doesn't, this deletes resources you didn't mean to delete.
Templates rarely match reality exactly. Someone added a tag in the portal. Someone created a diagnostic settings resource by hand. Someone resized a VM with a CLI command. Each of these creates drift between the template and the live state. A single-phase migration with deleteResources will delete the live resources that don't match.
The two-phase pattern catches this. Phase 1 is detachAll, which is non-destructive: the stack starts owning the resources but never deletes them. You verify the stack's view of the world matches the resource group's. Phase 2 is deleteResources, which is destructive but only after you've proven the inventory is right.
It's a small extra step that catches the only class of bug that actually matters in this migration. Skip it once and you'll learn why the pattern exists.
Prerequisites
Before you start, you need:
- An existing resource group with resources you want to put into a stack
- The Bicep template that originally produced those resources (or one that produces them now)
- Owner on the resource group. Deny-assignments require Owner; Contributor isn't enough.
az --version # 2.65+
az bicep version # 0.30+
az login
RG="rg-platform-prod" # the existing RG you're migrating
TEMPLATE="infra/main.bicep"
PARAMS="infra/main.bicepparam"
SUB=$(az account show --query id -o tsv)
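The version floors in the comments above can be enforced rather than eyeballed. A minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils); `version_ge` is a hypothetical helper name, not part of the az CLI, and the `az version` query in the usage comment assumes the CLI reports its own version under the `azure-cli` key.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when dotted version A is >= dotted version B.
# Relies on `sort -V`; hypothetical helper, not an az command.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Usage sketch (needs an installed az CLI, so it is left commented out):
#   cli=$(az version --query '"azure-cli"' -o tsv)
#   version_ge "$cli" "2.65.0" || { echo "az CLI $cli is older than 2.65" >&2; exit 1; }
```

Comparing with `sort -V` avoids the classic mistake of comparing 2.9 and 2.65 as strings or decimals.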
A note on the Owner requirement: the deny-assignment that the stack creates is the security-critical part of the migration. Only Owner can create deny-assignments. If your platform-pipeline service principal has Contributor, you'll need to either upgrade it (carefully, with documentation) or run the migration manually as Owner one time, then let the pipeline manage the stack going forward with Contributor (which is enough for updates, just not for deny-assignment creation).
Step 1: Inventory what's in the resource group
az resource list -g $RG --query '[].{type:type, name:name, location:location}' -o table > inventory-before.txt
wc -l inventory-before.txt
Save this file. It's your reference for everything that follows. If anything's missing post-migration, this is the proof you started with it.
The inventory file is the single most-overlooked artefact of this migration. Most teams do the migration, see "stack created successfully", and move on. Three months later something's missing and there's no record of what was there. Save the inventory; commit it to the repo as migrations/inventory-2026-05-04.txt or similar; never delete it.
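Table output is fine for reading but awkward to diff later. A minimal sketch of a diff-friendly companion, assuming you also save the resource IDs as plain text, one per line; `missing_since` is a hypothetical helper and the file names in the usage comment are illustrative.

```shell
#!/usr/bin/env bash
# missing_since BEFORE AFTER: print resource IDs present in BEFORE but not in AFTER.
# Both files hold one resource ID per line. IDs are lower-cased first because
# Azure resource IDs are case-insensitive, then sorted in the C locale for comm.
missing_since() {
  local before after
  before=$(mktemp)
  after=$(mktemp)
  tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C sort -u > "$before"
  tr '[:upper:]' '[:lower:]' < "$2" | LC_ALL=C sort -u > "$after"
  LC_ALL=C comm -23 "$before" "$after"
  rm -f "$before" "$after"
}

# Usage sketch (requires az; file names are illustrative):
#   az resource list -g "$RG" --query '[].id' -o tsv > inventory-before.ids
#   ...run the migration...
#   az resource list -g "$RG" --query '[].id' -o tsv > inventory-after.ids
#   missing_since inventory-before.ids inventory-after.ids
```

An empty result means nothing from the original inventory has disappeared.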
Step 2: Run what-if to confirm the template is current
az deployment group what-if \
  --resource-group $RG \
  --template-file $TEMPLATE \
  --parameters $PARAMS
The output should be all NoChange, plus Ignore for any resource in the RG that the template doesn't define. If you see Modify, Create, or Delete entries, your template has drifted from production state. Reconcile before you start the stack migration. Stacks aren't a fix for template drift; they're a tool that assumes your template matches reality.
If there's drift, you have two options:
- Fix the template. Look at what what-if says is changing, update the Bicep to match the live state, re-run, repeat until clean. This is the right answer 90% of the time.
- Run a regular az deployment group create to bring production back to the template. This is the right answer only if you're sure the live state is the wrong one.
Don't proceed to phase 1 until what-if is clean. The stack migration is going to do exactly what your template says; if your template is wrong, the stack will faithfully execute the wrong thing.
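The "until clean" check can be scripted so nobody has to judge the what-if output by eye. A minimal sketch, assuming the change types arrive on stdin one per line; `whatif_is_clean` is a hypothetical helper, and treating Ignore as clean is an assumption about what "clean" should mean here. The usage comment also assumes that `--query` is applied to the raw result when `--no-pretty-print` is set.

```shell
#!/usr/bin/env bash
# whatif_is_clean: reads what-if change types from stdin, one per line.
# Succeeds only if every non-blank line is NoChange or Ignore; otherwise
# prints a count per offending change type to stderr and fails.
whatif_is_clean() {
  local dirty
  dirty=$(grep -v -x -E 'NoChange|Ignore|[[:space:]]*' | sort | uniq -c || true)
  if [ -n "$dirty" ]; then
    echo "what-if is not clean:" >&2
    echo "$dirty" >&2
    return 1
  fi
}

# Usage sketch (requires az, so it is left commented out):
#   az deployment group what-if -g "$RG" --template-file "$TEMPLATE" --parameters "$PARAMS" \
#     --no-pretty-print --query 'changes[].changeType' -o tsv | whatif_is_clean
```

Putting this in front of phase 1 turns "don't proceed" from a convention into an exit code.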
Step 3: Phase 1: Create the stack with detachAll
migrations/phase-1-detach.sh:
#!/usr/bin/env bash
set -euo pipefail
RG="${RG:?missing RG}"
STACK_NAME="${STACK_NAME:-${RG}-stack}"
TEMPLATE="${TEMPLATE:?missing TEMPLATE}"
PARAMS="${PARAMS:?missing PARAMS}"
echo "Phase 1: creating stack '$STACK_NAME' on '$RG' with detachAll"
echo "This is non-destructive, no resources will be deleted."
echo
az stack group create \
  --name "$STACK_NAME" \
  --resource-group "$RG" \
  --template-file "$TEMPLATE" \
  --parameters "$PARAMS" \
  --action-on-unmanage detachAll \
  --deny-settings-mode none \
  --yes
echo
echo "Phase 1 complete. Verify with:"
echo " bash migrations/verify-managed.sh"
RG=rg-platform-prod TEMPLATE=infra/main.bicep PARAMS=infra/main.bicepparam \
bash migrations/phase-1-detach.sh
This creates a Microsoft.Resources/deploymentStacks record that owns the resources the template produces, but with detachAll set, removing a resource from the template later only un-tracks it, never deletes it. There's no destructive action available in phase 1.
The --deny-settings-mode none is intentional. We don't want the deny-assignment yet because we haven't verified the stack is right. Adding the deny-assignment first means an audit trail of "the stack thought it owned X, deny-assigned X, then we found out X was actually two resources" which is more confusing than it sounds.
Step 4: Verify the stack manages everything in the RG
migrations/verify-managed.sh:
#!/usr/bin/env bash
set -euo pipefail
RG="${RG:?missing RG}"
STACK_NAME="${STACK_NAME:-${RG}-stack}"
echo "=== Stack-managed resources ==="
az stack group show \
  --name "$STACK_NAME" \
  --resource-group "$RG" \
  --query "resources[].id" -o tsv | sort > /tmp/stack-managed.txt
wc -l /tmp/stack-managed.txt
echo
echo "=== Resources in RG ==="
az resource list -g "$RG" --query '[].id' -o tsv | sort > /tmp/rg-resources.txt
wc -l /tmp/rg-resources.txt
echo
echo "=== In RG but NOT managed by stack ==="
comm -23 /tmp/rg-resources.txt /tmp/stack-managed.txt
echo
echo "=== Managed by stack but NOT in RG ==="
comm -13 /tmp/rg-resources.txt /tmp/stack-managed.txt
RG=rg-platform-prod bash migrations/verify-managed.sh
You want both diff sections to be empty.
"In RG but NOT managed by stack" lists resources that exist in the RG but the template doesn't produce. These are unmanaged resources. You'll need to either add them to the template, move them out of the RG, or accept that they won't be tracked by the stack. Don't proceed to phase 2 until you've decided what to do about each one.
"Managed by stack but NOT in RG" lists resources the template claims exist but actually don't. This should be empty if the template was applied; if it isn't, your template references resources that were never created (a logic bug worth investigating).
If either list is non-empty, do not proceed to phase 2. Fix the discrepancy first. Phase 2 will faithfully delete from RG anything not in the template; "I forgot we manually deleted that VM in March" is the kind of thing that produces real damage if you skip the verify step.
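As written, verify-managed.sh prints both diffs but still exits 0, so a wrapper script can't act on the result. A minimal sketch of a gate that fails when either list is non-empty, assuming two files of resource IDs like the ones the script already writes; `inventories_match` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# inventories_match RG_IDS STACK_IDS: compare two files of resource IDs
# (one per line). Prints any one-sided entries and returns non-zero if
# there are any, so a caller can refuse to continue to phase 2.
inventories_match() {
  local a b only_rg only_stack
  a=$(mktemp)
  b=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  only_rg=$(LC_ALL=C comm -23 "$a" "$b")
  only_stack=$(LC_ALL=C comm -13 "$a" "$b")
  rm -f "$a" "$b"
  [ -z "$only_rg" ] || printf 'In RG but NOT managed by stack:\n%s\n' "$only_rg"
  [ -z "$only_stack" ] || printf 'Managed by stack but NOT in RG:\n%s\n' "$only_stack"
  [ -z "$only_rg$only_stack" ]
}

# Usage sketch, at the end of verify-managed.sh:
#   inventories_match /tmp/rg-resources.txt /tmp/stack-managed.txt || exit 1
```

With this in place, a loop over several resource groups can stop automatically instead of relying on someone reading the output.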
Step 5: Phase 2: Promote to deleteResources and deny-settings
Once phase 1 is verified, plan the deny settings. A deny-assignment takes precedence over role assignments, so it blocks everyone it applies to, including Owners and Contributors on the resource group, unless an action or principal is explicitly excluded.
Decide what to exclude from the deny-assignment based on real workload needs. Common exclusions:
- Microsoft.Network/virtualNetworks/subnets/join/action if other workloads attach to subnets in this RG
- Microsoft.Network/networkSecurityGroups/securityRules/* if app teams own their NSG rules
- Microsoft.Insights/diagnosticSettings/* if a separate identity provisions diagnostics
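One way to keep the reason next to each exclusion is a small text file with comments, read at deploy time. A minimal sketch, assuming a hypothetical migrations/deny-exclusions.txt with one action per line and # comments; `read_exclusions` is an illustrative helper, not an az feature.

```shell
#!/usr/bin/env bash
# read_exclusions FILE: print one excluded action per line, dropping
# `#` comments (whole-line or trailing), surrounding whitespace and blank lines.
read_exclusions() {
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v '^$' || true
}

# Example file (hypothetical path migrations/deny-exclusions.txt):
#   # other workloads attach to subnets in this RG
#   Microsoft.Network/virtualNetworks/subnets/join/action
#   Microsoft.Insights/diagnosticSettings/*   # separate identity provisions diagnostics
#
# Usage sketch (the unquoted expansion is deliberate so each action becomes
# its own argument; set -f stops the shell from globbing the `*`):
#   set -f
#   az stack group create ... --deny-settings-excluded-actions $(read_exclusions migrations/deny-exclusions.txt)
#   set +f
```

The file then doubles as the documentation for why each exclusion exists, and reviews of it happen in pull requests rather than in someone's memory.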
migrations/phase-2-deletions.sh:
#!/usr/bin/env bash
set -euo pipefail
RG="${RG:?missing RG}"
STACK_NAME="${STACK_NAME:-${RG}-stack}"
TEMPLATE="${TEMPLATE:?missing TEMPLATE}"
PARAMS="${PARAMS:?missing PARAMS}"
DENY_MODE="${DENY_MODE:-denyDelete}"
echo "Phase 2: tightening stack '$STACK_NAME' to deleteResources + $DENY_MODE"
echo
read -rp "Have you run phase 1 + verify-managed? (y/N) " ack
[ "$ack" = "y" ] || { echo "aborted"; exit 1; }
az stack group create \
  --name "$STACK_NAME" \
  --resource-group "$RG" \
  --template-file "$TEMPLATE" \
  --parameters "$PARAMS" \
  --action-on-unmanage deleteResources \
  --deny-settings-mode "$DENY_MODE" \
  --deny-settings-excluded-actions \
    "Microsoft.Network/virtualNetworks/subnets/join/action" \
    "Microsoft.Network/networkSecurityGroups/securityRules/*" \
    "Microsoft.Insights/diagnosticSettings/*" \
  --yes \
  --bypass-stack-out-of-sync-error
The interactive read prompt is deliberate. Phase 1 is non-destructive; phase 2 is not. The two-pass review eliminates "I forgot to verify" as a failure mode. If you find yourself wanting to remove the prompt for "convenience", reconsider whether you should be running this without supervision.
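If the worry is that the script ends up in a pipeline anyway, with the answer piped in, the prompt can be backed by a check that stdin is a real terminal. A minimal sketch; `require_interactive` is a hypothetical helper, and where to call it is a suggestion rather than part of the original script.

```shell
#!/usr/bin/env bash
# require_interactive: fail unless stdin is attached to a terminal, so the
# confirmation prompt cannot be satisfied by `echo y |` or a CI job.
require_interactive() {
  if [ ! -t 0 ]; then
    echo "refusing to run phase 2 without a terminal on stdin" >&2
    return 1
  fi
}

# Usage sketch, placed just before the read prompt in phase-2-deletions.sh:
#   require_interactive || exit 1
```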
RG=rg-platform-prod TEMPLATE=infra/main.bicep PARAMS=infra/main.bicepparam \
bash migrations/phase-2-deletions.sh
After completion:
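az stack group show -g $RG -n "${RG}-stack" \
  --query "{state:provisioningState, resources:actionOnUnmanage.resources, deny:denySettings.mode}" -o table
You should see a succeeded state, resources set to delete, and deny set to denyDelete. The stack stores actionOnUnmanage as an object, so the CLI's deleteResources shows up as actionOnUnmanage.resources = delete.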
Step 6: Confirm the deny-assignment is in place
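# The core az CLI has no dedicated deny-assignment command group as far as I know,
# so this calls the ARM REST endpoint directly; adjust the api-version if needed.
az rest --method get \
  --url "/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.Authorization/denyAssignments?api-version=2022-04-01" \
  --query 'value[].{name:properties.denyAssignmentName, scope:properties.scope, principals:properties.principals[].id}' \
  -o json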
You'll see a deny-assignment with a name like denyAssignment-${stackId}, with everyone in the principals list and the stack's identity in the excludePrincipals list. That's how it works: deny applies to everyone, the stack itself is exempt.
If you (a real human admin) try to delete a stack-managed resource:
az resource delete --ids "<some resource id from the stack>" 2>&1 | head -3
You'll get a 403 RequestDisallowedByAzure with the deny-assignment's display name in the error. That's success, the safety net works. Document the error so your team knows what it means; the first time someone hits this in production, "RequestDisallowedByAzure" is opaque enough that they'll file a support ticket against Azure rather than recognise the deny-assignment.
Step 7: Update the deploy pipeline to use the stack
Before the migration, your deploy step probably looked like:
- task: AzureCLI@2
  displayName: Deploy
  inputs:
    azureSubscription: $(svc)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az deployment group create \
        --resource-group $(rg) \
        --template-file infra/main.bicep \
        --parameters infra/main.bicepparam
After the migration, swap it for the stack equivalent:
- task: AzureCLI@2
  displayName: Deploy stack
  inputs:
    azureSubscription: $(svc)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az stack group create \
        --name "$(rg)-stack" \
        --resource-group $(rg) \
        --template-file infra/main.bicep \
        --parameters infra/main.bicepparam \
        --action-on-unmanage deleteResources \
        --deny-settings-mode denyDelete \
        --deny-settings-excluded-actions \
          "Microsoft.Network/virtualNetworks/subnets/join/action" \
        --yes
The stack name is fixed (<rg>-stack is a clean convention). The --yes is required for non-interactive runs. Everything else is the same Bicep you already have.
A subtle point: every pipeline run after the migration is using the stack. If someone runs az deployment group create against the same RG (bypassing the stack), the stack's view goes out of sync. The fix is to lock down Microsoft.Resources/deployments/write at the RG scope, but in practice the easier discipline is just "the stack is the source of truth, all changes go through it". Document this prominently and the team adopts it.
Step 8: Drift-detection workflow
pipelines/drift-detection.yml:
trigger: none
schedules:
  - cron: '0 7 * * *'
    branches: { include: [ main ] }
    always: true
variables:
  - name: stackName
    value: 'rg-platform-prod-stack'
stages:
  - stage: Drift
    jobs:
      - job: Check
        steps:
          - task: AzureCLI@2
            displayName: Check stack vs template
            inputs:
              azureSubscription: $(svc)
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                set -e
                az deployment group what-if \
                  --resource-group rg-platform-prod \
                  --template-file infra/main.bicep \
                  --parameters infra/main.bicepparam \
                  --no-pretty-print --result-format ResourceIdOnly \
                  > whatif.json
                jq '.changes | group_by(.changeType) | map({type: .[0].changeType, n: length})' whatif.json
                MODIFIED=$(jq '[.changes[] | select(.changeType == "Modify")] | length' whatif.json)
                DELETED=$(jq '[.changes[] | select(.changeType == "Delete")] | length' whatif.json)
                if [ "$MODIFIED" -gt 0 ] || [ "$DELETED" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]Drift detected: $MODIFIED modified, $DELETED deleted"
                  echo "##vso[task.complete result=Failed;]"
                  exit 1
                fi
This runs daily at 07:00 UTC and fails if the live state has drifted from the template. Drift on a stack-managed RG is unusual because the deny-assignment blocks most accidental changes, but it can still happen: denyDelete blocks deletions rather than writes, and anything covered by an excluded action or principal bypasses the deny-assignment. The drift detector catches it the next morning.
Step 9: Onboarding additional resource groups
Once the first migration works, the same scripts handle the rest:
for RG in rg-platform-identity rg-platform-monitor rg-platform-network; do
  echo "=== Migrating $RG ==="
  TEMPLATE_PATH="infra/$RG.bicep"
  PARAMS_PATH="infra/$RG.bicepparam"
  RG="$RG" TEMPLATE="$TEMPLATE_PATH" PARAMS="$PARAMS_PATH" bash migrations/phase-1-detach.sh
  RG="$RG" bash migrations/verify-managed.sh
  read -rp "Verify clean for $RG? Continue to phase 2? (y/N) " ack
  [ "$ack" = "y" ] || continue
  RG="$RG" TEMPLATE="$TEMPLATE_PATH" PARAMS="$PARAMS_PATH" bash migrations/phase-2-deletions.sh
done
The interactive prompt between phase 1 and phase 2 stays in. By the fifth or sixth migration it has more than paid for itself.
Step 10: Removing or detaching a stack
If you ever need to detach a stack (keep resources, drop the stack ownership):
az stack group delete \
  --name "${RG}-stack" \
  --resource-group $RG \
  --action-on-unmanage detachAll \
  --yes
Or to fully delete the stack and all managed resources:
az stack group delete \
  --name "${RG}-stack" \
  --resource-group $RG \
  --action-on-unmanage deleteAll \
  --yes
deleteAll is destructive. Never run this from a pipeline. It's a manual operation only, and only after a tagged confirmation. Lock it behind RBAC if you can; the audit trail you want is "this exact human, on this exact date, intentionally invoked deleteAll".
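One possible shape for that confirmation is to make the operator type the stack name rather than a bare "y". A minimal sketch; `confirm_by_name` is a hypothetical helper, and wrapping the delete this way is a suggestion, not something the az CLI provides.

```shell
#!/usr/bin/env bash
# confirm_by_name EXPECTED: read one line from stdin and succeed only if it
# equals EXPECTED exactly. A bare "y" or an empty answer is rejected.
confirm_by_name() {
  local typed=""
  printf 'Type "%s" to confirm deleteAll: ' "$1" >&2
  IFS= read -r typed || true
  [ "$typed" = "$1" ]
}

# Usage sketch (requires az, so it is left commented out):
#   confirm_by_name "${RG}-stack" && az stack group delete \
#     --name "${RG}-stack" --resource-group "$RG" \
#     --action-on-unmanage deleteAll --yes
```

Typing the name forces a moment of attention on which stack is about to go, which is the point of keeping this operation manual.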
Production checklist
- Phase 1 first, always. Non-destructive, costs nothing, reveals every difference between template and reality.
- Document the deny-settings exclusions. Every entry in deny-settings-excluded-actions should have a comment explaining why; otherwise people copy-paste them forever.
- Watch the Bypassed-stack-out-of-sync warnings. They indicate someone made an out-of-band change between phase 1 and phase 2; investigate before bypassing.
- Stack names are scoped to the RG. <rg>-stack is unique enough; cross-RG name collisions are allowed but make az stack list paginate.
- Pipeline service principal needs Microsoft.Resources/deploymentStacks/*. Contributor covers routine stack updates, but a narrower custom role without the stacks actions won't work. Creating the deny-assignment is the part that needs Owner on the RG (see Prerequisites), or a custom role that includes the stacks actions.
Troubleshooting
"AuthorizationFailed: The client does not have authorization to perform action 'Microsoft.Authorization/denyAssignments/write'": Pipeline SP is Contributor, not Owner. Deny-assignments require Owner. Either upgrade the SP or use --deny-settings-mode none and accept that the deny-assignment doesn't get created.
"Stack out of sync": Someone created a resource in the RG that the stack doesn't know about, or the live state of a tracked resource was modified outside the stack. Run az stack group show and compare to the RG resource list. If intentional, use --bypass-stack-out-of-sync-error to override; otherwise fix the drift.
"Phase 1 deployed but verify-managed.sh shows resources NOT managed by the stack": Your template is not producing all the resources actually in the RG. Add them to the template, run phase 1 again, and re-verify.
"Operation failed because of policy": A subscription-level Azure Policy is blocking a property the stack is trying to set. Check the deployment failure details (az deployment operation group list) for the specific policy name; either fix the template or get an exemption.
"Insufficient privileges to delete resource" from a non-stack identity: The deny-assignment is doing its job. Either route the operation through the stack, or add the action to --deny-settings-excluded-actions.
What this gives you, and the part that's hard to measure
The obvious win is the inventory hygiene. Resources can no longer drift in or out of the stack without you noticing. The verify-managed.sh script catches the case where a resource exists but isn't tracked; the deny-assignment catches the case where someone tries to modify a tracked resource outside the stack; the drift workflow catches both at the next sunrise.
The less-obvious win is what changes about the relationship between Bicep and production. Before stacks, "the Bicep template" and "what's actually in production" were two different things, related but not identical. After stacks, they're the same thing, with a deny-assignment enforcing the relationship. You can read the Bicep and know what's in the resource group, instead of having to verify with a separate az resource list.
This sounds small. It compounds. Every conversation about a resource group used to require someone to confirm "is this still what's in there?". Now the answer is "yes, by definition". The deny-assignment makes the Bicep authoritative; the stack makes "delete what's missing" a normal operation; the drift workflow makes any deviation visible the next day.
For a team that ships infrastructure changes monthly, the cumulative time saved on "is the template still accurate" conversations is genuinely valuable. For a team that's recently been bitten by an orphaned-resource cost incident, it's the difference between "we'll be more careful" and "the system is more careful so we don't have to be".
The migration is one day of work for the first stack. Each subsequent stack is one or two hours, and the pattern is reproducible. Six months in, our org had migrated thirty-something landing-zone resource groups. The orphaned-resource cost reports went from a quarterly thing FinOps complained about to a monthly null. That's the deliverable that justifies the work, more than any individual stack's deny-assignment.
