No. 42 · Azure · Nov 4, 2025 · 26 min read

Migrate a Resource Group Into a Bicep Deployment Stack: Two-Phase, Zero-Downtime


By Damilola Onadeinde

Senior DevOps Engineer


I have, over the past four years of Azure work, deployed a Bicep template against a resource group three separate times and forgotten about resources the previous template had created. Twice that produced a $200/day mystery cost that took weeks to track down. Once it was a security finding: the orphaned resource was a network interface left behind by a deleted VM, and it kept its public IP for nine months.

This is the failure mode that deployment stacks were built to fix. A normal Bicep deploy creates a Microsoft.Resources/deployments record that says "this template ran with these parameters". It says nothing about ownership. Resources outlive the template that made them. A stack creates a Microsoft.Resources/deploymentStacks record that explicitly owns the resources it deploys, can delete them when they leave the template (depending on its action-on-unmanage setting), and optionally creates a deny-assignment that prevents anyone from modifying them outside the stack.

The migration is delicate the first time because production state is real and you can't afford to delete the wrong thing. This post is the two-phase pattern that has worked for me on six landing zones with zero resource loss. By the end you have an existing resource group migrated into a deployment stack, a deny-assignment hardening it against accidental modifications, a pipeline that uses the stack going forward, and a drift-detection workflow that fires when something changes outside the repo.

What you'll have at the end

~/landing-zone-migrate/
├── infra/
│   ├── main.bicep                    # the existing template, unchanged
│   └── main.bicepparam
├── migrations/
│   ├── phase-1-detach.sh             # non-destructive
│   ├── phase-2-deletions.sh          # the destructive flip
│   └── verify-managed.sh
├── pipelines/
│   ├── azure-pipelines.yml
│   └── drift-detection.yml
└── README.md

Why two phases, and not one

A brief explanation first, because the two-phase pattern is what makes this safe and most tutorials skip it.

The naive migration is one step: az stack group create --action-on-unmanage deleteResources. If your template matches reality exactly, this works fine. If it doesn't, this deletes resources you didn't mean to delete.

Templates rarely match reality exactly. Someone added a tag in the portal. Someone created a diagnostic settings resource by hand. Someone resized a VM with a CLI command. Each of these creates drift between the template and the live state. A single-phase migration with deleteResources will delete the live resources that don't match.

The two-phase pattern catches this. Phase 1 is detachAll, which is non-destructive: the stack starts owning the resources but never deletes them. You verify the stack's view of the world matches the resource group's. Phase 2 is deleteResources, which is destructive but only after you've proven the inventory is right.

It's a small extra step that catches the only class of bug that actually matters in this migration. Skip it once and you'll learn why the pattern exists.

Prerequisites

Before you start, you need:

An existing resource group with resources you want to put into a stack
The Bicep template that originally produced those resources (or one that produces them now)
Owner on the resource group. Deny-assignments require Owner; Contributor isn't enough.

az --version            # 2.65+
az bicep version        # 0.30+

az login
RG="rg-platform-prod"   # the existing RG you're migrating
TEMPLATE="infra/main.bicep"
PARAMS="infra/main.bicepparam"
SUB=$(az account show --query id -o tsv)
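
The version floors in the comments above can be checked by a script rather than by eye. A minimal sketch, assuming plain dotted numeric version strings; version_at_least is a hypothetical helper, not something the Azure CLI ships:

```shell
# Hypothetical helper: succeed when <actual> is at least <minimum>.
# Compares dotted versions field by field as numbers, so 2.100 > 2.65.
version_at_least() {
  awk -v a="$1" -v m="$2" 'BEGIN {
    n = split(a, av, "."); k = split(m, mv, ".")
    max = (n > k) ? n : k
    for (i = 1; i <= max; i++) {
      x = (i <= n) ? av[i] + 0 : 0   # missing fields count as 0
      y = (i <= k) ? mv[i] + 0 : 0
      if (x > y) exit 0
      if (x < y) exit 1
    }
    exit 0
  }'
}
```

The actual version string would come from the CLI itself (for example from the output of az version), which is left out here so the helper stays runnable without an Azure login.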

A note on the Owner requirement: the deny-assignment that the stack creates is the security-critical part of the migration. Only Owner can create deny-assignments. If your platform-pipeline service principal has Contributor, you'll need to either upgrade it (carefully, with documentation) or run the migration manually as Owner one time, then let the pipeline manage the stack going forward with Contributor (which is enough for updates, just not for deny-assignment creation).

Step 1: Inventory what's in the resource group

az resource list -g "$RG" --query '[].{type:type, name:name, location:location}' -o table > inventory-before.txt
wc -l inventory-before.txt   # count includes the two header lines of the table output

Save this file. It's your reference for everything that follows. If anything's missing post-migration, this is the proof you started with it.

The inventory file is the single most-overlooked artefact of this migration. Most teams do the migration, see "stack created successfully", and move on. Three months later something's missing and there's no record of what was there. Save the inventory; commit it to the repo as migrations/inventory-2026-05-04.txt or similar; never delete it.
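
To put the saved inventory to work later, a comparison along these lines lists anything that was present before the migration and is missing afterwards. It is a sketch only: inventory_missing is a hypothetical name, and it assumes both files were produced by the same az resource list query so that their lines are directly comparable.

```shell
# Hypothetical helper: print every line of the "before" inventory that no
# longer appears in the "after" inventory. Header lines are identical in
# both files, so they drop out of the result on their own.
inventory_missing() (
  before_sorted=$(mktemp) || exit 2
  after_sorted=$(mktemp) || exit 2
  # comm requires sorted input; a fixed locale keeps sort and comm in agreement.
  LC_ALL=C sort -u "$1" > "$before_sorted"
  LC_ALL=C sort -u "$2" > "$after_sorted"
  # -23 suppresses lines unique to "after" and lines common to both.
  LC_ALL=C comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
)
```

Called as inventory_missing inventory-before.txt inventory-after.txt, empty output means nothing from the original inventory has disappeared.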

Step 2: Run what-if to confirm the template is current

az deployment group what-if \
  --resource-group $RG \
  --template-file $TEMPLATE \
  --parameters $PARAMS

The output should be all NoChange, apart from Ignore entries for resources that exist in the RG but aren't declared in the template. If you see Create, Modify, or Delete entries, your template has drifted from production state. You need to reconcile before you start the stack migration. Stacks aren't a fix for template drift; they're a tool that assumes your template matches reality.

If there's drift, you have two options:

Fix the template. Look at what what-if says is changing, update the Bicep to match the live state, re-run, repeat until clean. This is the right answer 90% of the time.
Run a regular az deployment group create to bring production back to the template. This is the right answer only if you're sure the live state is the wrong one.

Don't proceed to phase 1 until what-if is clean. The stack migration is going to do exactly what your template says; if your template is wrong, the stack will faithfully execute the wrong thing.
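
"Clean" can be turned into an exit code instead of a visual check. The sketch below assumes jq is installed and that the what-if result was saved as JSON (for example with --no-pretty-print redirected to a file); whatif_is_clean is a hypothetical helper name, and treating Ignore as acceptable is an assumption you may want to tighten.

```shell
# Hypothetical helper: succeed only when every entry in a saved what-if
# result is NoChange or Ignore. Expects the JSON shape with a top-level
# "changes" array whose items carry a "changeType" field.
whatif_is_clean() {
  pending=$(jq '[.changes[]
                 | select(.changeType != "NoChange" and .changeType != "Ignore")]
                | length' "$1") || return 2
  [ "$pending" -eq 0 ]
}
```

A migration script could then stop early with something like whatif_is_clean whatif.json || exit 1 before phase 1 is ever attempted.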

Step 3: Phase 1: Create the stack with `detachAll`

migrations/phase-1-detach.sh:

#!/usr/bin/env bash
set -euo pipefail

RG="${RG:?missing RG}"
STACK_NAME="${STACK_NAME:-${RG}-stack}"
TEMPLATE="${TEMPLATE:?missing TEMPLATE}"
PARAMS="${PARAMS:?missing PARAMS}"

echo "Phase 1: creating stack '$STACK_NAME' on '$RG' with detachAll"
echo "This is non-destructive, no resources will be deleted."
echo

az stack group create \
  --name "$STACK_NAME" \
  --resource-group "$RG" \
  --template-file "$TEMPLATE" \
  --parameters "$PARAMS" \
  --action-on-unmanage detachAll \
  --deny-settings-mode none \
  --yes

echo
echo "Phase 1 complete. Verify with:"
echo "  bash migrations/verify-managed.sh"

RG=rg-platform-prod TEMPLATE=infra/main.bicep PARAMS=infra/main.bicepparam \
  bash migrations/phase-1-detach.sh

This creates a Microsoft.Resources/deploymentStacks record that owns the resources the template produces, but with detachAll set, removing a resource from the template later only un-tracks it, never deletes it. There's no destructive action available in phase 1.

The --deny-settings-mode none is intentional. We don't want the deny-assignment yet because we haven't verified the stack is right. Adding the deny-assignment first means an audit trail of "the stack thought it owned X, deny-assigned X, then we found out X was actually two resources" which is more confusing than it sounds.

Step 4: Verify the stack manages everything in the RG

migrations/verify-managed.sh:

#!/usr/bin/env bash
set -euo pipefail

RG="${RG:?missing RG}"
STACK_NAME="${STACK_NAME:-${RG}-stack}"

echo "=== Stack-managed resources ==="
az stack group show \
  --name "$STACK_NAME" \
  --resource-group "$RG" \
  --query "resources[].id" -o tsv | sort > /tmp/stack-managed.txt
wc -l /tmp/stack-managed.txt

echo
echo "=== Resources in RG ==="
az resource list -g "$RG" --query '[].id' -o tsv | sort > /tmp/rg-resources.txt
wc -l /tmp/rg-resources.txt

echo
echo "=== In RG but NOT managed by stack ==="
comm -23 /tmp/rg-resources.txt /tmp/stack-managed.txt

echo
echo "=== Managed by stack but NOT in RG ==="
comm -13 /tmp/rg-resources.txt /tmp/stack-managed.txt

RG=rg-platform-prod bash migrations/verify-managed.sh

You want both diff sections to be empty.

"In RG but NOT managed by stack" lists resources that exist in the RG but the template doesn't produce. These are unmanaged resources. You'll need to either add them to the template, move them out of the RG, or accept that they won't be tracked by the stack. Don't proceed to phase 2 until you've decided what to do about each one.

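"Managed by stack but NOT in RG" lists resources the template claims exist but actually don't. This should be empty if the template was applied; if it isn't, your template references resources that were never created (a logic bug worth investigating). One caveat: child and extension resources the stack tracks (subnets, for example) can show up here because az resource list returns only top-level resources, so confirm those individually before treating them as missing.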

If either list is non-empty, do not proceed to phase 2. Fix the discrepancy first. Phase 2 will faithfully delete from RG anything not in the template; "I forgot we manually deleted that VM in March" is the kind of thing that produces real damage if you skip the verify step.
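
The "both lists empty" rule can be enforced by a gate rather than by reading the output. A minimal sketch, assuming the two ID lists have already been written to files as verify-managed.sh does; inventories_match is a hypothetical name. It lowercases both sides first because Azure resource IDs are case-insensitive and the two commands do not always agree on casing.

```shell
# Hypothetical gate: exit 0 only when both files list the same resource IDs.
inventories_match() (
  rg_ids=$(mktemp) || exit 2
  stack_ids=$(mktemp) || exit 2
  # Normalise case, then sort so comm can compare the two lists.
  tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C sort -u > "$rg_ids"
  tr '[:upper:]' '[:lower:]' < "$2" | LC_ALL=C sort -u > "$stack_ids"
  # comm -3 prints lines unique to either file; empty output means a match.
  diff_lines=$(LC_ALL=C comm -3 "$rg_ids" "$stack_ids")
  rm -f "$rg_ids" "$stack_ids"
  if [ -n "$diff_lines" ]; then
    printf 'Inventory mismatch, do not run phase 2:\n%s\n' "$diff_lines" >&2
    exit 1
  fi
)
```

Run as inventories_match /tmp/rg-resources.txt /tmp/stack-managed.txt, a non-zero exit is the signal to stop and reconcile before phase 2.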

Step 5: Phase 2: Promote to `deleteResources` and deny-settings

Once phase 1 is verified, the deny-settings story can be planned. Deny-assignments take precedence over role assignments: holding Contributor or Owner does not by itself get a principal past the deny-assignment; only an explicit exclusion of the principal or the action does.

Decide what to exclude from the deny-assignment based on real workload needs. Common exclusions:

Microsoft.Network/virtualNetworks/subnets/join/action if other workloads attach to subnets in this RG
Microsoft.Network/networkSecurityGroups/securityRules/* if app teams own their NSG rules
Microsoft.Insights/diagnosticSettings/* if a separate identity provisions diagnostics
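
One way to keep exclusions like these documented is to hold them in a single commented list that every script reads. This is a sketch under that assumption; excluded_actions is a hypothetical function, and the reasons in the comments simply restate the list above.

```shell
# Hypothetical single source of truth for deny-settings exclusions.
# Comment lines carry the reason for each entry and are stripped on output.
excluded_actions() {
  grep -v -e '^#' -e '^$' <<'EOF'
# Other workloads attach to subnets in this RG.
Microsoft.Network/virtualNetworks/subnets/join/action
# App teams own their NSG rules.
Microsoft.Network/networkSecurityGroups/securityRules/*
# A separate identity provisions diagnostics.
Microsoft.Insights/diagnosticSettings/*
EOF
}
```

In a bash script the output can be read into an array (mapfile -t actions < <(excluded_actions)) and passed as "${actions[@]}", which keeps the entries ending in * from being glob-expanded by the shell.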

migrations/phase-2-deletions.sh:

#!/usr/bin/env bash
set -euo pipefail

RG="${RG:?missing RG}"
STACK_NAME="${STACK_NAME:-${RG}-stack}"
TEMPLATE="${TEMPLATE:?missing TEMPLATE}"
PARAMS="${PARAMS:?missing PARAMS}"
DENY_MODE="${DENY_MODE:-denyDelete}"

echo "Phase 2: tightening stack '$STACK_NAME' to deleteResources + $DENY_MODE"
echo
read -rp "Have you run phase 1 + verify-managed? (y/N) " ack
[ "$ack" = "y" ] || { echo "aborted"; exit 1; }

az stack group create \
  --name "$STACK_NAME" \
  --resource-group "$RG" \
  --template-file "$TEMPLATE" \
  --parameters "$PARAMS" \
  --action-on-unmanage deleteResources \
  --deny-settings-mode "$DENY_MODE" \
  --deny-settings-excluded-actions \
      "Microsoft.Network/virtualNetworks/subnets/join/action" \
      "Microsoft.Network/networkSecurityGroups/securityRules/*" \
      "Microsoft.Insights/diagnosticSettings/*" \
  --yes \
  --bypass-stack-out-of-sync-error

The interactive read prompt is deliberate. Phase 1 is non-destructive; phase 2 is not. The two-pass review eliminates "I forgot to verify" as a failure mode. If you find yourself wanting to remove the prompt for "convenience", reconsider whether you should be running this without supervision.
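
If a single "y" ever feels too easy to type on autopilot, a stricter variant is to make the operator type the resource group name. This is an optional sketch, not the prompt used in the script above; confirm_resource_group is a hypothetical name.

```shell
# Hypothetical stricter confirmation: succeeds only when the line read from
# stdin is exactly the expected resource group name.
confirm_resource_group() (
  expected="$1"
  printf 'Type the resource group name to continue: ' >&2
  IFS= read -r answer || exit 1   # EOF or closed stdin counts as a refusal
  [ "$answer" = "$expected" ]
)
```

In phase-2-deletions.sh it would stand in for the read line, for example confirm_resource_group "$RG" || { echo "aborted"; exit 1; }.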

RG=rg-platform-prod TEMPLATE=infra/main.bicep PARAMS=infra/main.bicepparam \
  bash migrations/phase-2-deletions.sh

After completion:

az stack group show -g $RG -n "${RG}-stack" \
  --query "{state:provisioningState, action:actionOnUnmanage, deny:denySettings.mode}" -o table

You should see provisioningState: Succeeded, actionOnUnmanage: deleteResources, deny: denyDelete.

Step 6: Confirm the deny-assignment is in place

az deny-assignment list \
  --scope "/subscriptions/$SUB/resourceGroups/$RG" \
  --query '[?denyAssignmentName!=`null`].{name:denyAssignmentName, scope:scope, principals:principals[].displayName}' \
  -o table

You'll see a deny-assignment with a name like denyAssignment-${stackId}, with everyone in the principals list and the stack's identity in the excludePrincipals list. That's how it works: deny applies to everyone, the stack itself is exempt.

If you (a real human admin) try to delete a stack-managed resource:

az resource delete --ids "<some resource id from the stack>" 2>&1 | head -3

You'll get a 403 RequestDisallowedByAzure with the deny-assignment's display name in the error. That's success: the safety net works. Document the error so your team knows what it means; the first time someone hits this in production, "RequestDisallowedByAzure" is opaque enough that they'll file a support ticket against Azure rather than recognise the deny-assignment.

Step 7: Update the deploy pipeline to use the stack

Before the migration, your deploy step probably looked like:

- task: AzureCLI@2
  displayName: Deploy
  inputs:
    azureSubscription: $(svc)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az deployment group create \
        --resource-group $(rg) \
        --template-file infra/main.bicep \
        --parameters infra/main.bicepparam

After the migration, swap it for the stack equivalent:

- task: AzureCLI@2
  displayName: Deploy stack
  inputs:
    azureSubscription: $(svc)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az stack group create \
        --name "$(rg)-stack" \
        --resource-group $(rg) \
        --template-file infra/main.bicep \
        --parameters infra/main.bicepparam \
        --action-on-unmanage deleteResources \
        --deny-settings-mode denyDelete \
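        --deny-settings-excluded-actions \
            "Microsoft.Network/virtualNetworks/subnets/join/action" \
            "Microsoft.Network/networkSecurityGroups/securityRules/*" \
            "Microsoft.Insights/diagnosticSettings/*" \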
        --yes

The stack name is fixed (<rg>-stack is a clean convention). The --yes is required for non-interactive runs. Everything else is the same Bicep you already have.

A subtle point: every pipeline run after the migration is using the stack. If someone runs az deployment group create against the same RG (bypassing the stack), the stack's view goes out of sync. The fix is to lock down Microsoft.Resources/deployments/write at the RG scope, but in practice the easier discipline is just "the stack is the source of truth, all changes go through it". Document this prominently and the team adopts it.
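
The "all changes go through the stack" discipline can be backed by a cheap lint in CI. A minimal sketch, assuming pipeline definitions live under a pipelines/ directory as in the layout at the top of this post; no_direct_deployments is a hypothetical name, and it only catches the command when it is written on one line.

```shell
# Hypothetical lint: fail when any file under the given directory still
# calls "az deployment group create" instead of going through the stack.
no_direct_deployments() (
  dir="${1:-pipelines}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; exit 2; }
  # -r recurse, -n show line numbers, -E allow flexible whitespace matching.
  if grep -rnE 'az[[:space:]]+deployment[[:space:]]+group[[:space:]]+create' "$dir"; then
    echo "direct deployment found; route it through az stack group create" >&2
    exit 1
  fi
)
```

The what-if call used for drift detection is not flagged, because the pattern matches only the create subcommand.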

Step 8: Drift-detection workflow

pipelines/drift-detection.yml:


trigger: none

schedules:
  - cron: '0 7 * * *'
    branches: { include: [ main ] }
    always: true

variables:
  - name: stackName
    value: 'rg-platform-prod-stack'

stages:
  - stage: Drift
    jobs:
      - job: Check
        steps:
          - task: AzureCLI@2
            displayName: Check stack vs template
            inputs:
              azureSubscription: $(svc)
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                set -e
                # FullResourcePayloads is needed for property-level Modify results;
                # ResourceIdOnly reports existing resources as Deploy instead.
                az deployment group what-if \
                  --resource-group rg-platform-prod \
                  --template-file infra/main.bicep \
                  --parameters infra/main.bicepparam \
                  --no-pretty-print --result-format FullResourcePayloads \
                  > whatif.json

                jq '.changes | group_by(.changeType) | map({type: .[0].changeType, n: length})' whatif.json

                MODIFIED=$(jq '[.changes[] | select(.changeType == "Modify")] | length' whatif.json)
                DELETED=$(jq '[.changes[] | select(.changeType == "Delete")] | length' whatif.json)

                if [ "$MODIFIED" -gt 0 ] || [ "$DELETED" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]Drift detected: $MODIFIED modified, $DELETED deleted"
                  echo "##vso[task.complete result=Failed;]"
                  exit 1
                fi

This runs daily at 07:00 UTC and fails if the live state has drifted from the template. Drift on a stack-managed RG is unusual because the deny-assignment blocks accidental deletions, but it can still happen: denyDelete blocks deletions, not writes, so anyone with write access can still modify a resource directly. The drift detector catches it the next morning.

Step 9: Onboarding additional resource groups

Once the first migration works, the same scripts handle the rest:

for RG in rg-platform-identity rg-platform-monitor rg-platform-network; do
  echo "=== Migrating $RG ==="
  TEMPLATE_PATH="infra/$RG.bicep"
  PARAMS_PATH="infra/$RG.bicepparam"

  RG=$RG TEMPLATE=$TEMPLATE_PATH PARAMS=$PARAMS_PATH bash migrations/phase-1-detach.sh
  RG=$RG bash migrations/verify-managed.sh
  read -rp "Verify clean for $RG? Continue to phase 2? (y/N) " ack
  [ "$ack" = "y" ] || continue
  RG=$RG TEMPLATE=$TEMPLATE_PATH PARAMS=$PARAMS_PATH bash migrations/phase-2-deletions.sh
done
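
Before a batch run like the loop above, it can help to confirm that every resource group actually has its template and parameter file, so the loop does not stop halfway through. A sketch assuming the infra/<rg>.bicep and infra/<rg>.bicepparam naming used in the loop; missing_templates is a hypothetical helper.

```shell
# Hypothetical preflight: print each missing template or parameter file and
# return non-zero if any are missing.
# usage: missing_templates <infra-dir> <rg> [<rg> ...]
missing_templates() (
  infra_dir="$1"; shift
  status=0
  for rg in "$@"; do
    for f in "$infra_dir/$rg.bicep" "$infra_dir/$rg.bicepparam"; do
      if [ ! -f "$f" ]; then
        echo "missing: $f"
        status=1
      fi
    done
  done
  exit "$status"
)
```

For the three resource groups above that would be missing_templates infra rg-platform-identity rg-platform-monitor rg-platform-network || exit 1, placed ahead of the loop.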

The interactive prompt between phase 1 and phase 2 stays in, even on the fifth or sixth migration; by that point it has already paid for itself.

Step 10: Removing or detaching a stack

If you ever need to detach a stack (keep resources, drop the stack ownership):

az stack group delete \
  --name "${RG}-stack" \
  --resource-group $RG \
  --action-on-unmanage detachAll \
  --yes

Or to fully delete the stack and all managed resources:

az stack group delete \
  --name "${RG}-stack" \
  --resource-group $RG \
  --action-on-unmanage deleteAll \
  --yes
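
Because the deleteAll form removes every managed resource, a teardown script can refuse to run at all when it detects a CI environment. A minimal sketch: refuse_in_ci is a hypothetical name, TF_BUILD is the variable Azure Pipelines sets on its agents, and CI is a convention on many other CI systems that is assumed here rather than guaranteed.

```shell
# Hypothetical guard for a manual-only teardown script: fail when the
# environment looks like a pipeline agent.
refuse_in_ci() {
  if [ -n "${TF_BUILD:-}" ] || [ -n "${CI:-}" ]; then
    echo "refusing to run deleteAll from a pipeline" >&2
    return 1
  fi
}
```

Placed at the top of the script as refuse_in_ci || exit 1, it stops the run before any az command executes.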

deleteAll is destructive. Never run this from a pipeline. It's a manual operation only, and only after a tagged confirmation. Lock it behind RBAC if you can; the audit trail you want is "this exact human, on this exact date, intentionally invoked deleteAll".

Production checklist

Phase 1 first, always. Non-destructive, costs nothing, reveals every difference between template and reality.
Document the deny-settings exclusions. Every entry in deny-settings-excluded-actions should have a comment explaining why; otherwise people copy-paste them forever.
Watch for stack-out-of-sync errors. They indicate someone made an out-of-band change between phase 1 and phase 2; investigate before reaching for --bypass-stack-out-of-sync-error.
Stack names are scoped to the RG. <rg>-stack is unique enough; cross-RG name collisions are allowed, but identical names are harder to tell apart when you list stacks across resource groups.
Pipeline service principal needs Microsoft.Resources/deploymentStacks/*. Contributor covers routine stack updates, but creating or changing the deny-assignment needs more. Assign Owner on the RG, or a custom role with the stacks actions.

Troubleshooting

AuthorizationFailed: The client does not have authorization to perform action 'Microsoft.Authorization/denyAssignments/write'. The pipeline SP is Contributor, not Owner. Deny-assignments require Owner. Either upgrade the SP or use --deny-settings-mode none and accept that the deny-assignment doesn't get created.

Stack out of sync. Someone created a resource in the RG that the stack doesn't know about, or the live state of a tracked resource was modified outside the stack. Run az stack group show and compare to the RG resource list. If intentional, use --bypass-stack-out-of-sync-error to override; otherwise fix the drift.

Phase 1 deployed but verify-managed.sh shows resources NOT managed by the stack. Your template is not producing all the resources actually in the RG. Add them to the template, run phase 1 again, and re-verify.

Operation failed because of policy. A subscription-level Azure Policy is blocking a property the stack is trying to set. Check the deployment failure details (az deployment operation group list) for the specific policy name; either fix the template or get an exemption.

Insufficient privileges to delete resource from a non-stack identity. The deny-assignment is doing its job. Either route the operation through the stack, or add the action to --deny-settings-excluded-actions.

What this gives you, and the part that's hard to measure

The obvious win is the inventory hygiene. Resources can no longer drift in or out of the stack without you noticing. The verify-managed.sh script catches the case where a resource exists but isn't tracked; the deny-assignment catches the case where someone tries to modify a tracked resource outside the stack; the drift workflow catches both at the next sunrise.

The less-obvious win is what changes about the relationship between Bicep and production. Before stacks, "the Bicep template" and "what's actually in production" were two different things, related but not identical. After stacks, they're the same thing, with a deny-assignment enforcing the relationship. You can read the Bicep and know what's in the resource group, instead of having to verify with a separate az resource list.

This sounds small. It compounds. Every conversation about a resource group used to require someone to confirm "is this still what's in there?". Now the answer is "yes, by definition". The deny-assignment makes the Bicep authoritative; the stack makes "delete what's missing" a normal operation; the drift workflow makes any deviation visible the next day.

For a team that ships infrastructure changes monthly, the cumulative time saved on "is the template still accurate" conversations is genuinely valuable. For a team that's recently been bitten by an orphaned-resource cost incident, it's the difference between "we'll be more careful" and "the system is more careful so we don't have to be".

The migration is one day of work for the first stack. Each subsequent stack is one or two hours, and the pattern is reproducible. Six months in, our org had migrated thirty-something landing-zone resource groups. The orphaned-resource cost reports went from a quarterly thing FinOps complained about to a monthly null. That's the deliverable that justifies the work, more than any individual stack's deny-assignment.
