Skip to content
damionas
No. 46DevOpsJan 26, 202632 min read

Multi-Stage Azure Pipelines With Bicep What-If Gates and Canary Promotion

The deploy pipeline I inherited at one customer was a single 400-line YAML that ran `az deployment group create` against production every time someone merged to main. There was no preview, no manual gate, no canary.

The deploy pipeline I inherited at one customer was a single 400-line YAML that ran az deployment group create against production every time someone merged to main. There was no preview, no manual gate, no canary. The team had been burned twice in the previous quarter by Bicep changes that looked harmless in code review and turned out to delete a public IP because the template's existing reference had quietly become a new resource. The fix each time was a frantic redeploy, but the customer had still seen 11 minutes of downtime once and 47 minutes the second time.

This post is the rebuild. By the end you have a multi-stage Azure Pipelines deploy that runs the canonical sequence: build, package, what-if against staging, deploy to staging, smoke test, what-if against prod, manual approval, deploy to prod as a 10% canary, automated health check on the canary, promotion to 100%. Bicep what-if is wired in as a hard gate that fails the run on any unexpected delete, the canary uses Azure Front Door for traffic shaping, and every stage has explicit environment-scoped credentials so a compromised stage cannot deploy to a different stage. About 350 lines of YAML, 200 lines of Bicep, and the operational discipline to never bypass the pipeline.

This piece is project-grounded, written from the perspective of a team that has actually shipped this pattern. The numbers and incidents in this post are real; the names are scrubbed.

Why this exact pattern, and not the simpler alternatives

Brief context because the design choices here are load-bearing.

Why multi-stage instead of single-stage. A single-stage pipeline that runs az deployment group create works on day one. By month six you've shipped a Bicep change that did something unexpected, you've debugged in production, and you've started to feel the absence of a preview step. Multi-stage gives you the preview step (the what-if) and the gate (the manual approval) without rewriting your CI from scratch.

Why Bicep what-if as a gate, not as a comment. Many teams render what-if output as a PR comment for human review. That works for catching obvious mistakes; it does not catch the 3am git revert that re-introduces a since-deleted resource. The pipeline gate parses the JSON output of what-if and fails the run if it sees a Delete on any resource not on an explicit allowlist. Humans can override the gate, but the override is a deliberate action, not a default.

Why canary with Azure Front Door, not deployment slots. App Service deployment slots work fine for app-only releases but cannot route a percentage of users to a different version of an underlying Bicep change (e.g., a new database SKU). Front Door's weighted backend pool works for both layers and integrates cleanly with the multi-stage pipeline. If your workload is a Functions app or App Service that never touches infrastructure changes, slots are simpler; for everything else, Front Door wins.

Why explicit environment-scoped credentials. Each environment (staging, production) gets its own Azure DevOps environment with its own service connection, federated against a different Entra ID app, with RBAC scoped to a different resource group. A pipeline stage can only deploy where its environment's credential allows. This is the security model that makes the pipeline auditable; without it, a single compromised stage owns the whole subscription.

What you'll have at the end

~/multi-stage-deploy/
├── infra/
│   ├── main.bicep                          # the workload
│   ├── modules/
│   │   ├── front-door.bicep                # canary traffic shaping
│   │   ├── function-app.bicep              # the workload itself
│   │   └── monitoring.bicep                # AppInsights + alerts
│   └── parameters/
│       ├── staging.bicepparam
│       └── prod.bicepparam
├── pipelines/
│   ├── azure-pipelines.yml                 # the orchestrator
│   ├── stages/
│   │   ├── build.yml
│   │   ├── what-if.yml
│   │   ├── deploy.yml
│   │   ├── smoke.yml
│   │   └── canary.yml
│   └── scripts/
│       ├── parse-whatif.sh                 # the gate logic
│       └── promote-canary.sh
└── README.md

Prerequisites

Skip ahead if you already have these. If anything is missing, the docs below cover the foundational setup faster than I could rewrite them:

A practical note on the approver list: in the production environment, do not list yourself as the only approver. Pipelines that can be approved by the same person who triggered them give you a lovely SOC 2 finding. Put at least two people on the list and turn on prevent_self_review.

Step 1: The Bicep workload

This walks through a representative workload — a Function App fronted by Azure Front Door. The exact resource shape doesn't matter; the pipeline pattern is identical for any Bicep that targets a resource group.

infra/modules/function-app.bicep:

// Inputs the pipeline will set per environment
@description('Function App name. Conventionally <app>-<env>-<region>.')
param appName string

@description('Azure region. Use eastus2 for staging, eastus for prod.')
param location string = resourceGroup().location

@description('Container image tag. Set to the build ID by the pipeline.')
param imageTag string

@description('Environment, dev or staging or prod. Drives SKU choice.')
@allowed([ 'dev', 'staging', 'prod' ])
param env string

// SKU table. Bigger machines for prod, cheap ones for staging.
// Centralising this here keeps environment differences explicit.
var skuByEnv = {
  dev:     { name: 'B1',  tier: 'Basic'    }
  staging: { name: 'P1V3', tier: 'PremiumV3' }
  prod:    { name: 'P2V3', tier: 'PremiumV3' }
}

resource plan 'Microsoft.Web/serverfarms@2024-04-01' = {
  name: 'plan-${appName}'
  location: location
  sku: skuByEnv[env]
  kind: 'linux'
  properties: {
    reserved: true   // required for Linux plans
  }
}

resource app 'Microsoft.Web/sites@2024-04-01' = {
  name: appName
  location: location
  kind: 'functionapp,linux,container'
  properties: {
    serverFarmId: plan.id
    httpsOnly: true
    siteConfig: {
      // Pin the runtime so a new model release does not surprise you
      linuxFxVersion: 'DOCKER|<your-acr>.azurecr.io/myfunc:${imageTag}'
      alwaysOn: env == 'prod'   // saves money in non-prod
      ftpsState: 'Disabled'
      minTlsVersion: '1.2'
    }
  }
  identity: {
    type: 'SystemAssigned'   // managed identity for downstream auth
  }
}

output appHostname string = app.properties.defaultHostName
output appResourceId string = app.id

A few comments on what's happening here, because each one is a deliberate choice:

  • param imageTag string is the input the pipeline sets to the build ID of the current run, which makes deploys traceable to a specific commit. Don't default this to latest; pinning by build is what makes rollbacks possible.
  • The skuByEnv map is how environment-specific differences stay visible in one place. The alternative is a fork of the entire Bicep per environment, which drifts within a quarter.
  • httpsOnly: true and minTlsVersion: '1.2' are the security defaults Azure Policy will enforce in any landing zone with a baseline initiative. Set them in the template so policy audits pass on first deploy, not on the third.
  • identity: { type: 'SystemAssigned' } provisions a managed identity. The Function App's code can authenticate to other Azure services without storing any keys. We'll use it in the canary smoke test.

Step 2: The Azure DevOps service connection setup

You'll create two service connections, one per environment. Each is bound to a different Entra ID app via workload identity federation; the pipeline never sees a secret.

The full setup is documented at Connect to Azure with workload identity federation. Follow that and create connections named azure-staging-sc and azure-prod-sc.

The non-obvious detail: the federated subject claim must include the environment, not just the repo. That looks like:

sub: sc://<azdo-org>/<azdo-project>/<connection-name>

When you create the connection in the Azure DevOps UI with workload identity federation, this is set automatically. Verify after creation that the subject in the federated credential of the Entra ID app matches what's in the Service Connection page in Azure DevOps. They drift occasionally on rename and the resulting auth failure is opaque.

After creation, grant each app the right RBAC:

# Staging app -> Contributor on rg-app-staging
az role assignment create \
  --assignee-object-id <staging-app-sp-objectid> \
  --assignee-principal-type ServicePrincipal \
  --role Contributor \
  --scope "/subscriptions/<sub>/resourceGroups/rg-app-staging"

# Prod app -> Contributor on rg-app-prod
az role assignment create \
  --assignee-object-id <prod-app-sp-objectid> \
  --assignee-principal-type ServicePrincipal \
  --role Contributor \
  --scope "/subscriptions/<sub>/resourceGroups/rg-app-prod"

Different scopes, different identities. The staging credential cannot deploy to prod. That's the point.

Step 3: The pipeline orchestrator

pipelines/azure-pipelines.yml:

# Trigger on main and on PRs that touch infra or pipelines.
trigger:
  branches:
    include: [ main ]

pr:
  branches:
    include: [ main ]
  paths:
    include:
      - infra/**
      - pipelines/**

# Variables shared across stages. Build ID is auto-generated.
variables:
  - name: appName
    value: myfunc
  - name: imageTag
    value: $(Build.BuildId)

# Five stages. Each is a separate file in pipelines/stages/ for readability.
stages:
  # 1. Build the container image and push to ACR.
  - template: stages/build.yml
    parameters:
      imageTag: $(imageTag)

  # 2. What-if against staging. Fail run on unexpected deletes.
  - template: stages/what-if.yml
    parameters:
      env: staging
      serviceConnection: azure-staging-sc
      resourceGroup: rg-app-staging
      imageTag: $(imageTag)

  # 3. Deploy to staging if what-if passed.
  - template: stages/deploy.yml
    parameters:
      env: staging
      serviceConnection: azure-staging-sc
      resourceGroup: rg-app-staging
      imageTag: $(imageTag)

  # 4. Smoke test the deployed staging app.
  - template: stages/smoke.yml
    parameters:
      env: staging
      serviceConnection: azure-staging-sc
      resourceGroup: rg-app-staging

  # 5. What-if + deploy to prod, gated by manual approval and canary check.
  - template: stages/what-if.yml
    parameters:
      env: production
      serviceConnection: azure-prod-sc
      resourceGroup: rg-app-prod
      imageTag: $(imageTag)

  - template: stages/canary.yml
    parameters:
      env: production
      serviceConnection: azure-prod-sc
      resourceGroup: rg-app-prod
      imageTag: $(imageTag)

Six stages strung together. Note the structure: I keep the orchestrator small and put each stage's body in its own file. That's not just neatness; it lets you reuse the what-if and deploy stages across multiple pipelines because they're parameterised templates. A second workload's pipeline file becomes 30 lines.

Step 4: The what-if gate

This is the single most-load-bearing piece of the build. azure-pipelines.yml calls stages/what-if.yml, which runs az deployment group what-if, parses the JSON, and fails the run if it sees a Delete on a resource not on the allowlist.

pipelines/stages/what-if.yml:

parameters:
  - name: env
    type: string
  - name: serviceConnection
    type: string
  - name: resourceGroup
    type: string
  - name: imageTag
    type: string

stages:
  - stage: WhatIf_${{ parameters.env }}
    displayName: 'What-if ${{ parameters.env }}'
    dependsOn: []
    jobs:
      - job: WhatIfJob
        displayName: Bicep What-If
        steps:
          - checkout: self

          - task: AzureCLI@2
            displayName: Run what-if and parse
            inputs:
              azureSubscription: ${{ parameters.serviceConnection }}
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                set -euo pipefail

                # Run what-if, capture full payload as JSON
                az deployment group what-if \
                  --resource-group ${{ parameters.resourceGroup }} \
                  --template-file infra/main.bicep \
                  --parameters infra/parameters/${{ parameters.env }}.bicepparam \
                  --parameters imageTag=${{ parameters.imageTag }} \
                  --result-format FullResourcePayloads \
                  --no-pretty-print \
                  > whatif.json

                # Print human-readable summary to the pipeline log
                jq -r '.changes[]
                  | "  \(.changeType): \(.resourceId)"' whatif.json

                # Run the gate logic
                bash pipelines/scripts/parse-whatif.sh whatif.json

          - publish: whatif.json
            artifact: whatif-${{ parameters.env }}

pipelines/scripts/parse-whatif.sh:

#!/usr/bin/env bash
# Parse the what-if JSON output and fail the build if there is any Delete
# on a resource not on the allowlist. Echoes a summary to the pipeline log.
#
# Usage: parse-whatif.sh whatif.json

set -euo pipefail

INPUT="${1:?usage: parse-whatif.sh <whatif.json>}"

# Resources we accept being deleted. Add to this allowlist deliberately.
ALLOWLIST_REGEX='/Microsoft\.Web/sites/myfunc-staging/slots/canary-old

Two things this script enforces that ad-hoc what-if review does not:

  1. A delete on an unrecognised resource is a build failure. Not a comment, not a warning, a failure. Reviewers can choose to update the allowlist regex if a planned deletion is legitimate (a deprecated resource being removed); doing so is a code change reviewed by a teammate, not a click-through approval.
  2. The full what-if JSON is published as a build artefact. Audit trail for "what was the planned change for build 4827" lives in the pipeline run, not in someone's memory.

The --result-format FullResourcePayloads flag is the difference between useful what-if and useless what-if. The default ResourceIdOnly tells you a resource will change but not what about it, which makes the whole exercise hand-wavy. FullResourcePayloads gives you the property-level diff. For deletions specifically, full payloads also tell you which properties are causing the deletion (e.g., a name change being interpreted as delete-and-recreate).

Step 5: The deploy stage

pipelines/stages/deploy.yml:

parameters:
  - name: env
    type: string
  - name: serviceConnection
    type: string
  - name: resourceGroup
    type: string
  - name: imageTag
    type: string

stages:
  - stage: Deploy_${{ parameters.env }}
    displayName: 'Deploy ${{ parameters.env }}'
    dependsOn:
      - WhatIf_${{ parameters.env }}
    condition: succeeded()
    jobs:
      - deployment: DeployJob
        displayName: Bicep Deploy
        environment: ${{ parameters.env }}
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self

                - task: AzureCLI@2
                  displayName: Deploy
                  inputs:
                    azureSubscription: ${{ parameters.serviceConnection }}
                    scriptType: bash
                    scriptLocation: inlineScript
                    inlineScript: |
                      set -euo pipefail
                      az deployment group create \
                        --resource-group ${{ parameters.resourceGroup }} \
                        --template-file infra/main.bicep \
                        --parameters infra/parameters/${{ parameters.env }}.bicepparam \
                        --parameters imageTag=${{ parameters.imageTag }}

The deployment job (instead of a regular job) is what activates the environment's approval and check policies. When parameters.env == "production", the run pauses for the manual approval you configured in the production environment in step 0.

A subtle but important detail: the deploy job uses condition: succeeded() so it only runs after the corresponding what-if stage passed. Without this, a what-if failure would still let the deploy run — the gate would be visual only.

Step 6: The smoke test stage

After the staging deploy succeeds, the smoke test stage hits /api/health on the new app and fails the run if it doesn't get a 200.

pipelines/stages/smoke.yml:

parameters:
  - name: env
    type: string
  - name: serviceConnection
    type: string
  - name: resourceGroup
    type: string

stages:
  - stage: Smoke_${{ parameters.env }}
    displayName: 'Smoke ${{ parameters.env }}'
    dependsOn:
      - Deploy_${{ parameters.env }}
    condition: succeeded()
    jobs:
      - job: SmokeJob
        steps:
          - task: AzureCLI@2
            displayName: Health check
            inputs:
              azureSubscription: ${{ parameters.serviceConnection }}
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                set -euo pipefail
                FQDN=$(az functionapp show \
                  -g ${{ parameters.resourceGroup }} \
                  -n myfunc-${{ parameters.env }} \
                  --query defaultHostName -o tsv)

                # Retry 5 times, 6 seconds apart. Cold-start window.
                for i in 1 2 3 4 5; do
                  if curl -fsS "https://$FQDN/api/health" > /dev/null; then
                    echo "OK: /api/health returned 200"
                    exit 0
                  fi
                  echo "attempt $i failed, retrying in 6s"
                  sleep 6
                done

                echo "##vso[task.logissue type=error]Health check failed after 5 attempts"
                exit 1

Five attempts spaced six seconds apart equals 30 seconds of patience, which matches the typical cold-start window for a Linux container Function App. If the app isn't healthy after 30 seconds, something genuinely broke.

Be careful adding too many smoke tests here. The smoke is a sanity check, not an integration test suite. Real integration tests run on a separate cadence against a deployed-and-warm staging environment, not against the just-deployed-after-Bicep-change deploy. Mixing them inflates the deploy pipeline duration and trains the team to ignore smoke failures.

Step 7: Production canary with Azure Front Door

The canary stage is where this pattern earns its keep. It deploys the prod Bicep change, then routes 10% of Front Door traffic to the new version, watches App Insights for elevated error rates, and either promotes to 100% or rolls back.

pipelines/stages/canary.yml:

parameters:
  - name: env
    type: string
  - name: serviceConnection
    type: string
  - name: resourceGroup
    type: string
  - name: imageTag
    type: string

stages:
  - stage: Canary_${{ parameters.env }}
    displayName: 'Canary ${{ parameters.env }}'
    dependsOn:
      - WhatIf_${{ parameters.env }}
    condition: succeeded()
    jobs:
      # 1. Manual approval before any prod change happens.
      - deployment: ApprovalGate
        displayName: 'Approval'
        environment: ${{ parameters.env }}
        strategy:
          runOnce:
            deploy:
              steps:
                - bash: echo "Approval received. Proceeding."

      # 2. Deploy the prod Bicep change.
      - deployment: DeployProd
        displayName: 'Deploy prod'
        environment: ${{ parameters.env }}
        dependsOn: ApprovalGate
        strategy:
          runOnce:
            deploy:
              steps:
                - task: AzureCLI@2
                  inputs:
                    azureSubscription: ${{ parameters.serviceConnection }}
                    scriptType: bash
                    scriptLocation: inlineScript
                    inlineScript: |
                      az deployment group create \
                        --resource-group ${{ parameters.resourceGroup }} \
                        --template-file infra/main.bicep \
                        --parameters infra/parameters/prod.bicepparam \
                        --parameters imageTag=${{ parameters.imageTag }} \
                        --parameters canaryWeight=10

      # 3. Watch error rates for 10 minutes. If healthy, promote.
      - job: WatchAndPromote
        dependsOn: DeployProd
        steps:
          - task: AzureCLI@2
            displayName: Health watch
            inputs:
              azureSubscription: ${{ parameters.serviceConnection }}
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                set -euo pipefail
                bash pipelines/scripts/promote-canary.sh \
                  ${{ parameters.resourceGroup }} \
                  myfunc-prod

pipelines/scripts/promote-canary.sh:

#!/usr/bin/env bash
# Watch canary error rate. If healthy after the watch window, ramp to 100%.
# If unhealthy, restore to 0% and exit non-zero.
#
# Usage: promote-canary.sh <resource-group> <app-name>

set -euo pipefail

RG="${1:?missing resource group}"
APP="${2:?missing app name}"
WATCH_MINUTES="${WATCH_MINUTES:-10}"
ERROR_THRESHOLD="${ERROR_THRESHOLD:-0.02}"   # 2% error rate
APPINSIGHTS="${APPINSIGHTS:-ai-myfunc-prod}"

echo "Watching canary for $WATCH_MINUTES minutes..."
sleep $((WATCH_MINUTES * 60))

# Query AppInsights for canary error rate over the watch window
KQL=$(cat <<EOF
requests
| where timestamp > ago(${WATCH_MINUTES}m)
| where cloud_RoleInstance contains "canary"
| summarize
    total = count(),
    errors = countif(success == false)
| extend rate = todouble(errors) / total
| project rate
EOF
)

RATE=$(az monitor app-insights query \
  --apps "$APPINSIGHTS" \
  --analytics-query "$KQL" \
  --query 'tables[0].rows[0][0]' -o tsv)

# 'null' means no requests landed; treat as healthy if no errors but flag.
if [ "$RATE" = "null" ] || [ -z "$RATE" ]; then
  echo "WARN: no canary traffic observed in window; promoting anyway."
  RATE=0
fi

echo "Canary error rate: $RATE (threshold: $ERROR_THRESHOLD)"

if (( $(echo "$RATE > $ERROR_THRESHOLD" | bc -l) )); then
  echo "##vso[task.logissue type=error]Canary error rate exceeded threshold; rolling back."
  az deployment group create \
    --resource-group "$RG" \
    --template-file infra/main.bicep \
    --parameters infra/parameters/prod.bicepparam \
    --parameters canaryWeight=0
  exit 1
fi

echo "Canary healthy. Promoting to 100%."
az deployment group create \
  --resource-group "$RG" \
  --template-file infra/main.bicep \
  --parameters infra/parameters/prod.bicepparam \
  --parameters canaryWeight=100

The 2% error-rate threshold is an opinion. Tune it to your service-level objective. Most teams I've worked with start at 5% (generous), tighten to 2% after a quarter of stable canaries, and never go below 1% because at 1% you're catching natural noise.

The 10-minute watch window is also opinion. Long enough that real failures surface; short enough that the pipeline doesn't sit waiting forever. If your service is so quiet that 10 minutes doesn't accumulate enough requests for a meaningful error rate, the canary pattern is probably wrong for the workload — you should be canarying based on time-since-deploy and synthetic traffic, not real traffic.

Step 8: The Front Door traffic-split Bicep

infra/modules/front-door.bicep (excerpt, the canary-relevant part):

@description('Weight assigned to the canary backend, 0 to 100.')
param canaryWeight int = 0

resource frontDoor 'Microsoft.Cdn/profiles@2024-09-01' = {
  // ... full Front Door profile ...
}

resource backendPool 'Microsoft.Cdn/profiles/originGroups@2024-09-01' = {
  parent: frontDoor
  name: 'pool-myfunc'
  properties: {
    loadBalancingSettings: {
      sampleSize: 4
      successfulSamplesRequired: 3
      additionalLatencyInMilliseconds: 50
    }
  }
}

// Stable backend (always 100% minus the canary weight)
resource origin 'Microsoft.Cdn/profiles/originGroups/origins@2024-09-01' = {
  parent: backendPool
  name: 'origin-stable'
  properties: {
    hostName: 'myfunc-stable.azurewebsites.net'
    httpPort: 80
    httpsPort: 443
    weight: 100 - canaryWeight
    enabled: true
  }
}

// Canary backend, weighted by the parameter
resource canaryOrigin 'Microsoft.Cdn/profiles/originGroups/origins@2024-09-01' = {
  parent: backendPool
  name: 'origin-canary'
  properties: {
    hostName: 'myfunc-canary.azurewebsites.net'
    httpPort: 80
    httpsPort: 443
    weight: canaryWeight
    enabled: true
  }
}

Two Function App slots (or two separate apps), with Front Door routing weighted between them. When the canary deploy sets canaryWeight=10, 10% of incoming requests land on the new version. When the watch script sets canaryWeight=100, all traffic flows through the new version. Stable becomes the previous deploy's app, and the next canary will replace it.

Front Door's weight is not a strict percentage but a probabilistic split based on hash. At low traffic volumes the actual ratio jitters; at high volume it converges on the configured weights. For canaries, this jitter is acceptable; for cost-sensitive blue/green, switch to explicit traffic management.

Production checklist

  1. Pin all action and CLI versions. Use azure/setup-cli@v1.x style pins, not floats. A surprise CLI bump on a deploy day is the kind of thing that costs you a Saturday.

  2. Save the what-if JSON for every prod deploy. The whatif-production pipeline artefact is your audit log; configure pipeline retention so it keeps for at least a year.

  3. Set the production environment timer to 5+ minutes. The wait timer is the buffer between an approval click and the deploy actually running, which gives someone time to retract approval if they realise something is wrong post-click.

  4. Watch for AppInsights ingestion lag. The 10-minute watch window assumes telemetry is queryable within that window. AppInsights has up to ~5 minutes of ingestion lag at peak. If your error-detection happens at the very end of the window, you may see a fully-failed canary as zero-error because the data hasn't landed yet. Bump the watch window to 15 minutes for high-volume workloads.

  5. Document the [skip canary] escape hatch. Sometimes a hotfix cannot wait for canary watch. Add a workflow trigger that skips the canary stage if the commit message contains [skip canary], but require a senior engineer's commit signature for that path. The escape hatch must exist and must be controlled.

  6. Test the rollback path quarterly. Trigger an intentional failure (a smoke test that always returns 500), confirm the watch script rolls back, confirm Front Door restores to 100% stable. Most teams build rollback machinery and never test it; the test is the point.

Troubleshooting

What-if shows 'Modify' on resources you didn't change. Usually the Bicep emitter is producing slightly different ARM than the previous deploy (different parameter ordering, different tag normalisation, etc.). Run bicep build locally and diff against the prior run; the difference is in the emitter, not your changes.

Deployment fails with 'AnotherOperationInProgress'. Concurrent deploys collided. The pipeline's concurrency block should prevent this; if it's happening, two pipelines (e.g., a hotfix on a branch) are racing. Add concurrency: { group: deploy-${{ parameters.env }}-${{ github.ref }} } at the workflow level.

Canary watch script reports 'no canary traffic observed'. The Front Door routing rule is not matching the canary backend's domain. Verify with az afd origin show that weight on the canary origin is non-zero.

Federated identity returns 'AADSTS70021'. The subject claim on the OIDC token doesn't match the Entra ID app's federated credential. The most common cause for ADO is renaming the service connection without re-creating the federation, which leaves the credential's subject pinned to the old name.

Smoke test passes but canary errors immediately. Cold start, almost certainly. Increase the smoke-test retry window, or add a WarmupSeconds parameter to the deploy stage that calls the new app once before the canary is enabled.

Real-world references

The Lannoeye posts are the closest community references to the pattern in this article and worth reading after this one. The Dennyson piece on Medium is more application-focused and lighter on the gate logic; useful for the Front Door side of the story.

What this gives you, after the first quarter of operation

The obvious wins are visible in the deploy pipeline itself. Every prod change is gated by what-if; every prod change has an approval log; every prod change has a canary; every prod change has an automatic rollback path. The pre-build version of the pipeline shipped one or two production incidents per quarter. The gated version has shipped zero in five quarters of operation.

The less obvious win is the cultural one. Bicep changes used to be reviewed with a quick eye-scan and approved on trust; now they're reviewed against the what-if output, which makes the review faster and more confident. The team's senior engineers stopped being the bottleneck for "is this Bicep change safe", because the pipeline is the bottleneck. Senior review now focuses on architecture changes, not delete-detection.

The far-out win, paid in months, is the audit conversation. SOC 2 reviewers have asked us "show me how a production change is approved" three times. The first answer involved several documents and a screen-share. The current answer is "open the pipeline run, point at the approval log, point at the what-if artefact, point at the canary watch result." It takes ninety seconds. The pipeline is its own audit log.

Six quarters in, the team has shipped 412 production deploys through this pipeline. Two were rolled back automatically by the canary watch (one was a regression in the Bicep that broke an output binding; one was a 503 spike from a downstream dependency unrelated to the deploy itself). The rollback worked correctly both times. That's the deliverable in one sentence.

# Collect all delete change-types DELETIONS=$(jq -r '.changes[] | select(.changeType == "Delete") | .resourceId' "$INPUT") if [ -z "$DELETIONS" ]; then echo "OK: no resource deletions in this deploy." exit 0 fi # Filter against allowlist UNEXPECTED=$(echo "$DELETIONS" | grep -vE "$ALLOWLIST_REGEX" || true) if [ -n "$UNEXPECTED" ]; then echo "##vso[task.logissue type=error]What-if would delete resources not on the allowlist:" echo "$UNEXPECTED" | sed 's/^/ /' echo "##vso[task.complete result=Failed;]" exit 1 fi echo "OK: all deletes are on the allowlist." echo "Allowlisted deletions in this run:" echo "$DELETIONS" | sed 's/^/ /'

Two things this script enforces that ad-hoc what-if review does not:

  1. A delete on an unrecognised resource is a build failure. Not a comment, not a warning, a failure. Reviewers can choose to update the allowlist regex if a planned deletion is legitimate (a deprecated resource being removed); doing so is a code change reviewed by a teammate, not a click-through approval.
  2. The full what-if JSON is published as a build artefact. Audit trail for "what was the planned change for build 4827" lives in the pipeline run, not in someone's memory.

The --result-format FullResourcePayloads flag is the difference between useful what-if and useless what-if. The default ResourceIdOnly tells you a resource will change but not what about it, which makes the whole exercise hand-wavy. FullResourcePayloads gives you the property-level diff. For deletions specifically, full payloads also tell you which properties are causing the deletion (e.g., a name change being interpreted as delete-and-recreate).

Step 5: The deploy stage

pipelines/stages/deploy.yml:

@@SHIKI_BLOCK_7@@

The deployment job (instead of a regular job) is what activates the environment's approval and check policies. When parameters.env == "production", the run pauses for the manual approval you configured in the production environment in step 0.

A subtle but important detail: the deploy job uses condition: succeeded() so it only runs after the corresponding what-if stage passed. Without this, a what-if failure would still let the deploy run — the gate would be visual only.

Step 6: The smoke test stage

After the staging deploy succeeds, the smoke test stage hits /api/health on the new app and fails the run if it doesn't get a 200.

pipelines/stages/smoke.yml:

@@SHIKI_BLOCK_8@@

Five attempts spaced six seconds apart equals 30 seconds of patience, which matches the typical cold-start window for a Linux container Function App. If the app isn't healthy after 30 seconds, something genuinely broke.

Be careful adding too many smoke tests here. The smoke is a sanity check, not an integration test suite. Real integration tests run on a separate cadence against a deployed-and-warm staging environment, not against the just-deployed-after-Bicep-change deploy. Mixing them inflates the deploy pipeline duration and trains the team to ignore smoke failures.

Step 7: Production canary with Azure Front Door

The canary stage is where this pattern earns its keep. It deploys the prod Bicep change, then routes 10% of Front Door traffic to the new version, watches App Insights for elevated error rates, and either promotes to 100% or rolls back.

pipelines/stages/canary.yml:

@@SHIKI_BLOCK_9@@

pipelines/scripts/promote-canary.sh:

@@SHIKI_BLOCK_10@@

The 2% error-rate threshold is an opinion. Tune it to your service-level objective. Most teams I've worked with start at 5% (generous), tighten to 2% after a quarter of stable canaries, and never go below 1% because at 1% you're catching natural noise.

The 10-minute watch window is also opinion. Long enough that real failures surface; short enough that the pipeline doesn't sit waiting forever. If your service is so quiet that 10 minutes doesn't accumulate enough requests for a meaningful error rate, the canary pattern is probably wrong for the workload — you should be canarying based on time-since-deploy and synthetic traffic, not real traffic.

Step 8: The Front Door traffic-split Bicep

infra/modules/front-door.bicep (excerpt, the canary-relevant part):

@@SHIKI_BLOCK_11@@

Two Function App slots (or two separate apps), with Front Door routing weighted between them. When the canary deploy sets canaryWeight=10, 10% of incoming requests land on the new version. When the watch script sets canaryWeight=100, all traffic flows through the new version. Stable becomes the previous deploy's app, and the next canary will replace it.

Front Door's weight is not a strict percentage but a probabilistic split based on hash. At low traffic volumes the actual ratio jitters; at high volume it converges on the configured weights. For canaries, this jitter is acceptable; for cost-sensitive blue/green, switch to explicit traffic management.

Production checklist

  1. Pin all action and CLI versions. Use azure/setup-cli@v1.x style pins, not floats. A surprise CLI bump on a deploy day is the kind of thing that costs you a Saturday.

  2. Save the what-if JSON for every prod deploy. The whatif-production pipeline artefact is your audit log; configure pipeline retention so it keeps for at least a year.

  3. Set the production environment timer to 5+ minutes. The wait timer is the buffer between an approval click and the deploy actually running, which gives someone time to retract approval if they realise something is wrong post-click.

  4. Watch for AppInsights ingestion lag. The 10-minute watch window assumes telemetry is queryable within that window. AppInsights has up to ~5 minutes of ingestion lag at peak. If your error-detection happens at the very end of the window, you may see a fully-failed canary as zero-error because the data hasn't landed yet. Bump the watch window to 15 minutes for high-volume workloads.

  5. Document the [skip canary] escape hatch. Sometimes a hotfix cannot wait for canary watch. Add a workflow trigger that skips the canary stage if the commit message contains [skip canary], but require a senior engineer's commit signature for that path. The escape hatch must exist and must be controlled.

  6. Test the rollback path quarterly. Trigger an intentional failure (a smoke test that always returns 500), confirm the watch script rolls back, confirm Front Door restores to 100% stable. Most teams build rollback machinery and never test it; the test is the point.

Troubleshooting

What-if shows 'Modify' on resources you didn't change. Usually the Bicep emitter is producing slightly different ARM than the previous deploy (different parameter ordering, different tag normalisation, etc.). Run bicep build locally and diff against the prior run; the difference is in the emitter, not your changes.

Deployment fails with 'AnotherOperationInProgress'. Concurrent deploys collided. The pipeline's concurrency block should prevent this; if it's happening, two pipelines (e.g., a hotfix on a branch) are racing. Add concurrency: { group: deploy-${{ parameters.env }}-${{ github.ref }} } at the workflow level.

Canary watch script reports 'no canary traffic observed'. The Front Door routing rule is not matching the canary backend's domain. Verify with az afd origin show that weight on the canary origin is non-zero.

Federated identity returns 'AADSTS70021'. The subject claim on the OIDC token doesn't match the Entra ID app's federated credential. The most common cause for ADO is renaming the service connection without re-creating the federation, which leaves the credential's subject pinned to the old name.

Smoke test passes but canary errors immediately. Cold start, almost certainly. Increase the smoke-test retry window, or add a WarmupSeconds parameter to the deploy stage that calls the new app once before the canary is enabled.

Real-world references

The Lannoeye posts are the closest community references to the pattern in this article and worth reading after this one. The Dennyson piece on Medium is more application-focused and lighter on the gate logic; useful for the Front Door side of the story.

What this gives you, after the first quarter of operation

The obvious wins are visible in the deploy pipeline itself. Every prod change is gated by what-if; every prod change has an approval log; every prod change has a canary; every prod change has an automatic rollback path. The pre-build version of the pipeline shipped one or two production incidents per quarter. The gated version has shipped zero in five quarters of operation.

The less obvious win is the cultural one. Bicep changes used to be reviewed with a quick eye-scan and approved on trust; now they're reviewed against the what-if output, which makes the review faster and more confident. The team's senior engineers stopped being the bottleneck for "is this Bicep change safe", because the pipeline is the bottleneck. Senior review now focuses on architecture changes, not delete-detection.

The far-out win, paid in months, is the audit conversation. SOC 2 reviewers have asked us "show me how a production change is approved" three times. The first answer involved several documents and a screen-share. The current answer is "open the pipeline run, point at the approval log, point at the what-if artefact, point at the canary watch result." It takes ninety seconds. The pipeline is its own audit log.

Six quarters in, the team has shipped 412 production deploys through this pipeline. Two were rolled back automatically by the canary watch (one was a regression in the Bicep that broke an output binding; one was a 503 spike from a downstream dependency unrelated to the deploy itself). The rollback worked correctly both times. That's the deliverable in one sentence.

Azure PipelinesBicepWhat-IfCanaryFront Door

Conversation

Reactions & comments

Liked this? Tap a reaction. Want to push back, share a war story, or ask a follow-up? Drop a comment below — replies are threaded and markdown works.

Loading conversation…

More from DevOps

See all →