damionas
No. 40 · DevOps · Oct 8, 2025 · 24 min read

Replace Every Service Principal Secret With OIDC Federation: A Multi-Environment Walkthrough

I once got paged at 4am because a service principal secret expired in the middle of a release. The deploy succeeded for staging, then the production stage tried to authenticate, the SP credential had hit its 30-day TTL three minutes earlier, the deploy halted, the on-call rotation was me, and I spent the next 90 minutes rotating a credential I shouldn't have been managing in the first place.

That credential, and the fourteen others like it scattered across our org's repos, were the visible artefact of an authentication model that had been wrong for years. Every quarter, someone rotated. Every quarter, someone forgot a repo or pasted a secret into a Slack DM. Every quarter, the security team filed a ticket with us about the same thing.

The fix is OIDC federation. Workflow-scoped, no shared secrets, zero rotation calendar, audit-friendly. Once you do it, you don't go back. This post is the entire migration: one Entra ID app per repository, three federated credentials (PR plan, staging, production), three role assignments at the right scopes, a workflow with environment gates, and an audit-evidence script that produces a printable report in 60 seconds.

Total identity sprawl: 1 app, 3 federated creds, 3 role assignments. Total long-lived secrets: 0. Total time the on-call rotation spends thinking about credential rotation per year: 0.

What you'll have at the end

~/oidc-deploy-template/
├── infra/
│   ├── identity.bicep
│   ├── rbac.bicep
│   └── identity.bicepparam
├── .github/
│   └── workflows/
│       ├── pr-plan.yml
│       └── deploy.yml
├── scripts/
│   ├── show-claims.sh
│   └── audit-evidence.sh
└── README.md

Why this isn't optional anymore

Quick aside because there's still a defensible-sounding case for keeping SP secrets, and I want to address it.

The case: "OIDC requires GitHub. We're not married to GitHub. If we ever switch to GitLab or self-hosted, OIDC won't work the same way."

Two problems with this. First, GitLab and Azure DevOps both support OIDC federation now; the issuer URL changes, but the pattern is identical. Second, the cost of switching CI platforms is dominated by everything except the credential pattern. If you're switching CIs, refactoring your federated credentials is the easy part of the migration.

The actual reason teams keep SP secrets is that they were set up before OIDC was viable, nobody's gone back to migrate, and rotation is "fine, we have a process for it". That process is the cost. It's just been amortised across enough quarters that you stopped counting.

Prerequisites

az --version                # 2.65+
gh --version                # 2.50+
az bicep version            # 0.30+
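
If you'd rather have the version check fail loudly than eyeball it, a small comparison helper does the job. This is a sketch, not part of the template: version_ge is a hypothetical helper name, and it assumes a sort that supports -V (version sort), which GNU coreutils provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example with literal versions; in practice, pass the number you read
# from `az --version`, `gh --version`, or `az bicep version`.
if version_ge "2.67.0" "2.65.0"; then
  echo "new enough"
else
  echo "too old"
fi
```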

You'll need:

  • A GitHub repository to install on (we'll use dammyboss/oidc-template)
  • An Azure subscription where you can create app registrations and assign roles at both subscription and resource-group scope
  • Two resource groups already created, rg-staging and rg-prod, or willingness to create them in this tutorial

az login
SUB=$(az account show --query id -o tsv)
TENANT=$(az account show --query tenantId -o tsv)
RG_STAGING=rg-staging
RG_PROD=rg-prod

az group create -n $RG_STAGING -l eastus
az group create -n $RG_PROD    -l eastus

The two-RG split is the smallest setup that demonstrates the multi-environment pattern. Real orgs have three or four (dev, staging, UAT, prod). The Bicep generalises easily; once two work, adding a third is a copy-paste plus a new federated credential.

Step 1: Provision the app + federated creds in Bicep

infra/identity.bicep:

extension microsoftGraphV1
targetScope = 'subscription'

param appDisplayName string
param repoSubject string
param resourceGroupStaging string
param resourceGroupProd string

resource app 'Microsoft.Graph/applications@v1.0' = {
  uniqueName: appDisplayName
  displayName: appDisplayName
}

resource sp 'Microsoft.Graph/servicePrincipals@v1.0' = {
  appId: app.appId
}

// Federated credential: PR plan (read-only what-if)
resource fedPr 'Microsoft.Graph/applications/federatedIdentityCredentials@v1.0' = {
  parent: app
  name: 'github-pr'
  properties: {
    issuer: 'https://token.actions.githubusercontent.com'
    subject: '${repoSubject}:pull_request'
    audiences: ['api://AzureADTokenExchange']
  }
}

// Federated credential: staging environment deploys
resource fedStaging 'Microsoft.Graph/applications/federatedIdentityCredentials@v1.0' = {
  parent: app
  name: 'github-staging'
  properties: {
    issuer: 'https://token.actions.githubusercontent.com'
    subject: '${repoSubject}:environment:staging'
    audiences: ['api://AzureADTokenExchange']
  }
}

// Federated credential: production environment deploys
resource fedProd 'Microsoft.Graph/applications/federatedIdentityCredentials@v1.0' = {
  parent: app
  name: 'github-production'
  properties: {
    issuer: 'https://token.actions.githubusercontent.com'
    subject: '${repoSubject}:environment:production'
    audiences: ['api://AzureADTokenExchange']
  }
}

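// Role assignments
// Reader is assigned at subscription scope, so it is declared inline here:
// rbac.bicep targets a resource group and can't be deployed with scope: subscription().
resource raReaderSub 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(subscription().id, appDisplayName, 'reader')
  properties: {
    principalId: sp.id
    principalType: 'ServicePrincipal'
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      'acdd72a7-3385-48ef-bd42-f606fba81ae7') // Reader
  }
}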

module rbacContribStaging 'rbac.bicep' = {
  name: 'rbac-contrib-staging'
  scope: resourceGroup(resourceGroupStaging)
  params: {
    principalId: sp.id
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      'b24988ac-6180-42a0-ab88-20f7382dd24c') // Contributor
  }
}

module rbacContribProd 'rbac.bicep' = {
  name: 'rbac-contrib-prod'
  scope: resourceGroup(resourceGroupProd)
  params: {
    principalId: sp.id
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      'b24988ac-6180-42a0-ab88-20f7382dd24c')
  }
}

output appId string = app.appId
output principalId string = sp.id
output tenantId string = tenant().tenantId

infra/rbac.bicep:

param principalId string
param roleDefinitionId string

resource ra 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, principalId, roleDefinitionId)
  scope: resourceGroup()
  properties: {
    principalId: principalId
    principalType: 'ServicePrincipal'
    roleDefinitionId: roleDefinitionId
  }
}

The asymmetry in this template is the security model in plain sight. The PR identity gets Reader at subscription scope, because what-if needs to read across resource groups. The staging identity gets Contributor at the staging RG scope, because it can deploy to staging only. The production identity gets Contributor at the production RG scope, and not anywhere else.

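Three federated credentials, three different RBAC scopes. The intent: a workflow that can only present a pull_request subject should not be able to deploy, and a staging job should not be able to touch production. One caveat is worth stating plainly. Azure role assignments attach to the service principal, not to an individual federated credential, and this template creates a single app with a single service principal. Whichever of the three credentials is used, the exchanged token belongs to that one principal, which holds Reader on the subscription and Contributor on both resource groups. What the credentials bound is which workflows can obtain a token at all. If you need the per-environment blast radius enforced by RBAC itself, give each environment its own app registration (or managed identity) carrying only its own role assignment.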

This is the part that matters in an audit. Auditors don't want to hear "the bot is trustworthy"; they want to see that the system enforces the boundary regardless of what the bot tries. The federated credential subject + the scoped RBAC + the GitHub environment protection rules together do that. None of the three components alone is sufficient.

infra/identity.bicepparam:

using 'identity.bicep'

param appDisplayName = 'github-oidc-template'
param repoSubject = 'repo:dammyboss/oidc-template'
param resourceGroupStaging = 'rg-staging'
param resourceGroupProd = 'rg-prod'

Step 2: Deploy

az deployment sub create \
  --location eastus \
  --template-file infra/identity.bicep \
  --parameters infra/identity.bicepparam \
  --query 'properties.outputs'

Capture the app ID (looked up by display name here; the same value is in the deployment's appId output):

APP_ID=$(az ad app list --display-name 'github-oidc-template' --query '[0].appId' -o tsv)
echo "APP_ID=$APP_ID"

Step 3: Set GitHub repo variables

gh repo set-default dammyboss/oidc-template

gh variable set AZURE_CLIENT_ID       --body "$APP_ID"
gh variable set AZURE_TENANT_ID       --body "$TENANT"
gh variable set AZURE_SUBSCRIPTION_ID --body "$SUB"
gh variable set RESOURCE_GROUP_STAGING --body "$RG_STAGING"
gh variable set RESOURCE_GROUP_PROD    --body "$RG_PROD"

Now configure GitHub environments, staging and production, so the deploy workflow can require approval and use environment-scoped variables. The protection rules are what give the federated credential its meaning.

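gh api -X PUT "repos/dammyboss/oidc-template/environments/staging" \
  -F wait_timer=0

# wait_timer is in minutes. The reviewers array goes in a JSON body because
# -f/-F would send it as a plain string.
gh api -X PUT "repos/dammyboss/oidc-template/environments/production" --input - <<'JSON'
{ "wait_timer": 5, "reviewers": [ { "type": "User", "id": <your-user-id> } ] }
JSON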

The production environment requires a 5-minute wait timer plus a reviewer. Without these, the federated credential is just an authentication mechanism; with them, it's a full audit chain. A federated credential with no environment protection rules is a regression from a good SP setup. Don't ship federation without the protection.

Step 4: The PR plan workflow

.github/workflows/pr-plan.yml:

name: pr-plan
on:
  pull_request:
    paths: ['infra/**']

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  what-if:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id:       ${{ vars.AZURE_CLIENT_ID }}
          tenant-id:       ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}

      - name: What-if against staging RG
        id: whatif
        run: |
          set -o pipefail
          az deployment group what-if \
            --resource-group ${{ vars.RESOURCE_GROUP_STAGING }} \
            --template-file infra/main.bicep \
            --parameters infra/main.bicepparam \
            --result-format FullResourcePayloads \
            --no-pretty-print > whatif.json
          jq -r '.changes[] | "  - \(.changeType): \(.resourceId)"' whatif.json > whatif.md
          echo "summary<<EOF" >> $GITHUB_OUTPUT
          cat whatif.md >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT

      - name: Comment on PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          message: |
            ### Bicep what-if (staging)
            ${{ steps.whatif.outputs.summary }}

This workflow runs against the PR plan federated credential, which has Reader at subscription scope. It can run what-if; it can't deploy. That's the security model.

The marocchino/sticky-pull-request-comment action is a small but high-leverage choice. It updates the existing what-if comment on each push to the PR rather than stacking new ones. The reviewer sees a single comment that always reflects the current head; the comment thread doesn't get cluttered by 30 stale what-ifs from a force-pushed branch. Same idempotent-comment pattern that matters for the Bicep PR reviewer bot.

Step 5: The deploy workflow with environment gates

.github/workflows/deploy.yml:

name: deploy
on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id:       ${{ vars.AZURE_CLIENT_ID }}
          tenant-id:       ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - run: |
          az deployment group create \
            --resource-group ${{ vars.RESOURCE_GROUP_STAGING }} \
            --template-file infra/main.bicep \
            --parameters infra/main.bicepparam

  smoke-staging:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/smoke.sh "staging"

  deploy-production:
    needs: smoke-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id:       ${{ vars.AZURE_CLIENT_ID }}
          tenant-id:       ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - run: |
          az deployment group create \
            --resource-group ${{ vars.RESOURCE_GROUP_PROD }} \
            --template-file infra/main.bicep \
            --parameters infra/main.bicepparam

The environment keyword in each job is what makes the OIDC token's sub claim match the federated credential. environment: production makes the token's sub equal repo:dammyboss/oidc-template:environment:production, which only matches the github-production federated credential, which is the one with Contributor on the prod RG.
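
To make the mapping concrete, here is a small sketch of how the sub claim is shaped for the three cases this post relies on. expected_subject is a hypothetical helper, not something GitHub or Azure ships, and it only covers the default subject formats (environment, pull_request, and branch ref). Repositories with a customised OIDC subject template, or jobs triggered by tags, will look different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the default OIDC `sub` claim for a job.
#   $1 = OWNER/REPO
#   $2 = event name (pull_request, push, ...)
#   $3 = branch name (used only when there is no environment and no PR)
#   $4 = environment name, empty or omitted if the job has none
expected_subject() {
  local repo="$1" event="$2" branch="${3:-}" environment="${4:-}"
  if [ -n "$environment" ]; then
    # A job that sets `environment:` gets the environment form,
    # whatever the trigger was.
    echo "repo:${repo}:environment:${environment}"
  elif [ "$event" = "pull_request" ]; then
    echo "repo:${repo}:pull_request"
  else
    echo "repo:${repo}:ref:refs/heads/${branch}"
  fi
}

expected_subject dammyboss/oidc-template push main production
expected_subject dammyboss/oidc-template pull_request main
expected_subject dammyboss/oidc-template push main
```

The third call is the interesting one: a push to main with no environment yields a ref-based subject, which none of the three federated credentials in this post accepts.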

A typo in the environment name equals a silent 401. The token-claim debugging step in Step 6 is what saves you here, and it's worth adding as a permanent fixture in your workflow's failure path.

concurrency.cancel-in-progress: false is deliberate. For deploys, you want the current deploy to finish before the next one starts. Cancelling mid-deploy can leave Azure in a partially-applied state. If two deploys race for the same RG, queue them, don't kill them.

Step 6: The token-claim debugging step

scripts/show-claims.sh:

#!/usr/bin/env bash
# Decode the OIDC token GitHub Actions issued for this workflow.
# Paste this as a workflow step when an Azure login fails.

set -euo pipefail

if [ -z "${ACTIONS_ID_TOKEN_REQUEST_TOKEN:-}" ]; then
  echo "must run inside a GitHub Actions job with id-token: write"
  exit 1
fi

TOKEN=$(curl -fsS \
  -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" \
  | jq -r '.value')

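# JWT segments are base64url without padding; convert and pad before decoding.
PAYLOAD=$(echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done

echo "$PAYLOAD" \
  | base64 -d \
  | jq '{ sub, aud, iss, repository, environment, ref, job_workflow_ref }'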

Add as a workflow step:

- name: Show OIDC claims
  if: failure()
  run: ./scripts/show-claims.sh

When a login fails, the next step prints the actual sub claim. Compare it to your federated credential's subject. They must match exactly.

Common mismatches (these are the four I've debugged most often):

  • repo:OWNER/REPO:ref:refs/heads/main (push trigger, no environment) vs repo:OWNER/REPO:environment:production (the credential): add environment: production to the job.
  • repo:OWNER/REPO:pull_request (PR trigger) vs repo:OWNER/REPO:environment:staging: the PR job tried to use a staging-only credential. Either change the credential to match pull_request or change the workflow to use the right context.
  • Subject case mismatch (e.g. Production vs production). Federated credentials are case-sensitive; align them.
  • Wrong tenant. The subject and audience match, but the tenant-id passed to azure/login points at a different Entra tenant than the one the app registration (and its federated credentials) lives in.
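
The subject comparison can be scripted so the failure path names the kind of mismatch. A sketch under assumptions: diagnose_subject is a hypothetical helper, and it takes the token's sub and the credential's subject as plain strings (for example, copied from the show-claims output and from az ad app federated-credential list). It does not cover the wrong-tenant case, which isn't visible in the subject.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify why a token subject fails to match
# a federated credential subject.
#   $1 = `sub` claim from the token
#   $2 = `subject` configured on the federated credential
diagnose_subject() {
  local token_sub="$1" cred_subject="$2"
  local token_lc cred_lc
  token_lc=$(printf '%s' "$token_sub" | tr '[:upper:]' '[:lower:]')
  cred_lc=$(printf '%s' "$cred_subject" | tr '[:upper:]' '[:lower:]')

  if [ "$token_sub" = "$cred_subject" ]; then
    echo "match"
  elif [ "$token_lc" = "$cred_lc" ]; then
    # Same text, different casing: the comparison is case-sensitive.
    echo "case mismatch"
  else
    case "$token_sub" in
      *:ref:refs/*)   echo "no environment on job" ;;
      *:pull_request) echo "pull_request token" ;;
      *)              echo "different subject" ;;
    esac
  fi
}

diagnose_subject "repo:OWNER/REPO:environment:Production" \
                 "repo:OWNER/REPO:environment:production"
diagnose_subject "repo:OWNER/REPO:ref:refs/heads/main" \
                 "repo:OWNER/REPO:environment:production"
```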

Step 7: Audit evidence script

scripts/audit-evidence.sh:

#!/usr/bin/env bash
# Generate a Markdown report that answers: "show me the deploy credentials".

set -euo pipefail

REPO="${REPO:-dammyboss/oidc-template}"
APP_DISPLAY_NAME="github-oidc-template"

APP_ID=$(az ad app list --display-name "$APP_DISPLAY_NAME" --query '[0].appId' -o tsv)
SP_OBJ=$(az ad sp show --id "$APP_ID" --query id -o tsv)

echo "# Deploy credentials report, $REPO"
echo
echo "_Generated $(date -u +'%Y-%m-%d %H:%M:%SZ')_"
echo
echo "## Application"
echo "- Display name: \`$APP_DISPLAY_NAME\`"
echo "- App ID: \`$APP_ID\`"
echo "- Service principal object ID: \`$SP_OBJ\`"
echo "- Long-lived secrets: **NONE** (federated only)"
echo
echo "## Federated credentials"
echo
echo "| Name | Subject | Issuer |"
echo "| --- | --- | --- |"
az ad app federated-credential list --id "$APP_ID" \
  --query '[].{n:name, s:subject, i:issuer}' -o tsv \
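  | while IFS=$'\t' read -r n s i; do
      echo "| $n | \`$s\` | $i |"
    done
echo
echo "## Role assignments"
echo
echo "| Scope | Role |"
echo "| --- | --- |"
az role assignment list --assignee "$APP_ID" --all \
  --query '[].{scope:scope, role:roleDefinitionName}' -o tsv \
  | while IFS=$'\t' read -r scope role; do
      echo "| \`$scope\` | $role |"
    done
echo
echo "## GitHub environment protection"
echo
gh api "repos/$REPO/environments" \
  --jq '.environments[] | "- **" + .name + "**: " + ((.protection_rules // [] | length | tostring) + " protection rules")'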

chmod +x scripts/audit-evidence.sh
./scripts/audit-evidence.sh > audit-evidence.md
pandoc audit-evidence.md -o audit-evidence.pdf

That report is what you hand the auditor. Four sections: application, federated credentials, role assignments, and GitHub environment protection. The application section carries the line stating there are no long-lived secrets. Sixty seconds to generate, ten seconds to review.
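
One thing to be aware of: the script prints the "Long-lived secrets: NONE" line as a fixed string. If you would rather have the report measure it, something like the following could replace that echo. It's a sketch: secret_summary is a hypothetical helper, and it assumes az ad app credential list returns the app's password credentials as a JSON array (the same command with --cert lists certificate credentials).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many password credentials (client secrets)
# the app registration actually has, instead of asserting there are none.
#   $1 = application (client) ID
secret_summary() {
  local app_id="$1" count
  count=$(az ad app credential list --id "$app_id" --query 'length(@)' -o tsv)
  if [ "$count" -eq 0 ]; then
    echo "- Long-lived secrets: **NONE** (federated only)"
  else
    echo "- Long-lived secrets: **${count} FOUND** (remove before the audit)"
  fi
}

# Usage inside audit-evidence.sh, after APP_ID is set:
#   secret_summary "$APP_ID"
```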

In SOC 2 and ISO 27001 audits I've sat through, the "show me how production deploy credentials are managed" question used to involve pulling up a Key Vault, narrating the rotation policy, showing the most recent rotation log, defending the rotation cadence, and answering "who has access to read the secrets". With this report, the answer is "there are no secrets; the federation contract is in this Bicep file; here's the report". Twenty minutes of audit conversation drops to two.

Step 8: Verify each path works

Open a PR with a Bicep change:

git checkout -b test-oidc
echo "// trivial change" >> infra/main.bicep
git add . && git commit -m "test: PR plan path"
git push -u origin test-oidc
gh pr create --title "Test OIDC PR plan" --body "should run what-if only"

Watch:

gh run watch

The job uses the pull_request federated credential, which maps to Reader at subscription scope. It can run what-if; if you tried to deploy from this token you'd get a 403.

Merge to main:

gh pr merge --squash
gh run watch

The deploy job uses environment:staging then environment:production. The production deploy will pause for the wait timer plus reviewer approval. Try it without the approval and you'll see the workflow stuck on "Waiting for review."

Step 9: What happens if someone tries to break the model

A common audit question: "what if a developer modifies the workflow to skip approval?" The answer:

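git checkout -b malicious
# BSD/macOS sed syntax; with GNU sed, drop the empty '' after -i
sed -i '' 's/environment: production//' .github/workflows/deploy.yml
git commit -am "remove production environment gate"
git push -u origin malicious
gh pr create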

When that PR is merged, the deploy job runs without the production environment context. The token's sub is repo:OWNER/REPO:ref:refs/heads/main, which doesn't match any of the three federated credentials. azure/login@v2 returns:

AADSTS70021: No matching federated identity record found for presented assertion.

The deploy fails closed. The malicious PR can no longer use the credential. The federated credentials are the security boundary, not the workflow file. This is the property that makes federation worth more than the SP-secret model: the boundary is enforced by Entra ID, not by the workflow being correctly written.
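
Conceptually, the exchange behaves like an exact-match lookup of the token's subject against the app's federated credentials. The sketch below imitates that lookup locally. It illustrates the matching rule only, not how Entra ID is implemented (the real check also covers issuer and audience), and matches_credential is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: succeed only when the token subject is an
# exact match for one of the three subjects configured in identity.bicep.
matches_credential() {
  local token_sub="$1" subject
  for subject in \
    "repo:dammyboss/oidc-template:pull_request" \
    "repo:dammyboss/oidc-template:environment:staging" \
    "repo:dammyboss/oidc-template:environment:production"; do
    [ "$token_sub" = "$subject" ] && return 0
  done
  return 1  # fail closed: no exact match, no token exchange
}

for sub in \
  "repo:dammyboss/oidc-template:environment:production" \
  "repo:dammyboss/oidc-template:ref:refs/heads/main"; do
  if matches_credential "$sub"; then
    echo "accepted: $sub"
  else
    echo "rejected: $sub"
  fi
done
```

The second subject is what the edited workflow would present after the environment line is removed, and it falls through to the rejection branch.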

Production checklist

  1. Pin actions to commit SHAs, not tags. azure/login@v2 becomes azure/login@<sha>. Prevents a supply-chain compromise of a maintained action from gaining your tokens.
  2. Restrict environment reviewers to a small group. The production environment's reviewer list is the human side of the security model.
  3. Add a prevent_self_review rule on the production environment (GitHub Enterprise). The person who triggered the deployment can't be the one who approves it.
  4. Review federated credentials annually. They aren't secrets, so there's nothing to rotate, but a yearly review forces someone to look at the subject claims and confirm they're still right.
  5. Run the audit-evidence script in CI. Schedule it weekly, store the output in your evidence repo. Auditors love a paper trail with timestamps.
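
Item 1 is the easiest to let slip, so a check in CI helps. A minimal sketch, assuming workflows live under .github/workflows and that each uses: line is either a local path (./...) or owner/repo@ref; find_unpinned_actions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list `uses:` lines whose ref is not a full
# 40-character commit SHA. Prints file:line:content for each offender and
# returns non-zero when nothing is unpinned.
#   $1 = workflow directory (defaults to .github/workflows)
find_unpinned_actions() {
  local dir="${1:-.github/workflows}"
  grep -rnE '^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]' "$dir" \
    | grep -vE 'uses:[[:space:]]+\./' \
    | grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}

# Usage in a CI step:
#   if find_unpinned_actions; then
#     echo "Pin the actions listed above to commit SHAs" >&2
#     exit 1
#   fi
```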

Troubleshooting

AADSTS70021: No matching federated identity record. The token's sub claim doesn't match any federated credential. Run the show-claims step and compare the result to the credentials list (az ad app federated-credential list --id $APP_ID).

AADSTS50020: User account does not exist in tenant. You're authenticating against the wrong tenant: the workflow passes a tenant ID that doesn't match where the app lives. Confirm vars.AZURE_TENANT_ID is the right tenant.

Insufficient privileges to complete the operation. The federated identity authenticated, but the role assignment is missing or at the wrong scope. Run az role assignment list --assignee $APP_ID --all to see what the SP actually has.

Environment 'production' is not configured for protection rules. The environment exists but has no reviewers. Add at least one in repo settings or via the API.

The given key 'production' was not present in the dictionary (PowerShell). The job didn't have environment: production, but the workflow expected it. Add the environment keyword to the job.

What this gives you

You have a deploy pipeline whose security boundary is enforced by Entra ID and Azure RBAC, not by your workflow file being correct. Three federated credentials, three role scopes, two GitHub environments with protection rules. Zero long-lived secrets, zero rotation calendar, zero late-night pages from credential expiry.

The cultural shift is that deploys feel boring again. Before this work, every quarter had a low-grade anxiety about which credential was due, which repo had been forgotten, who had pasted what into a shared doc. After, the anxiety is gone. The pipeline either works or it doesn't, and when it doesn't, the cause is in code or config, not in a separately-managed credential vault.

For a small team, that's a quality-of-life improvement. For a larger team it's the difference between a security review you can pass and one you can't. Auditors are not interested in your rotation discipline; they want a system whose boundary doesn't depend on rotation discipline. Federation gives them that, and you, in exchange for the half-day of Bicep and workflow setup.

I have not voluntarily shipped a long-lived service principal secret into a CI pipeline since the night I got paged at 4am. I do not intend to.

GitHub Actions · OIDC · Multi-Environment
