We have 47 Azure Policy assignments across the platform. For two years they were managed by hand: a screen of click-through configuration that only the platform lead understood, mostly in audit mode, with one Deny assignment that nobody trusted enough to enforce.
We rebuilt the system as Policy as Code in a small monorepo with a pipeline that lints, tests, deploys, and detects drift. Total time: a quarter. Total assignments still managed in the portal: zero. This is the toolchain.
Repo layout
policy/
  definitions/
    require-tag-environment.json        ← the policy itself
    require-tag-environment.test.json   ← test cases
    no-public-storage.json
    no-public-storage.test.json
  initiatives/
    platform-baseline.json              ← grouping of definitions
  assignments/
    sub-prod.bicep                      ← Bicep-driven assignment
    sub-nonprod.bicep
  exemptions/
    sub-prod-2026-q2.bicep              ← scoped, time-boxed exemptions
.github/workflows/
  policy-pr.yml
  policy-deploy.yml
  policy-drift.yml
Three principles:
- Definitions live as JSON because that's their native format. Bicep wraps them when assigned, but the rules themselves stay in Microsoft's published shape.
- Assignments live in Bicep because they're scope + parameters + identity, which is Bicep's wheelhouse.
- Exemptions are explicit, named, and time-boxed. No "permanent" exemptions in version control: each one gets a JIRA link and an expiry date.
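The time-box rule is easy to check mechanically. The sketch below is one possible lint, not part of the toolchain described here. It assumes the exemption Bicep has been compiled to ARM JSON (for example with `az bicep build`) and that the ticket link lives in a `metadata.ticket` field. The `lintExemptions` name, the `metadata.ticket` convention, and the 120-day ceiling are all hypothetical; `expiresOn` is the real exemption property.

```javascript
// Hypothetical lint for time-boxed exemptions.
// Input: the "resources" array of a compiled ARM template.
function lintExemptions(resources, now = new Date(), maxDays = 120) {
  const problems = [];
  for (const r of resources) {
    if (r.type !== "Microsoft.Authorization/policyExemptions") continue;
    const { expiresOn, metadata } = r.properties ?? {};

    // Every exemption needs a parseable expiry date in the near future.
    const expiry = new Date(expiresOn);
    if (!expiresOn || Number.isNaN(expiry.getTime())) {
      problems.push(`${r.name}: missing or invalid expiresOn`);
    } else if (expiry <= now) {
      problems.push(`${r.name}: already expired`);
    } else if ((expiry - now) / 86_400_000 > maxDays) {
      problems.push(`${r.name}: expires more than ${maxDays} days out`);
    }

    // Assumption: the ticket link is stored under metadata.ticket.
    if (!/^https:\/\/\S+/.test(metadata?.ticket ?? "")) {
      problems.push(`${r.name}: missing ticket link`);
    }
  }
  return problems;
}

// Usage with two made-up exemptions and a fixed "now" for repeatability.
const sampleResources = [
  {
    type: "Microsoft.Authorization/policyExemptions",
    name: "legacy-storage",
    properties: {
      expiresOn: "2026-06-30T00:00:00Z",
      metadata: { ticket: "https://jira.example.com/browse/PLAT-1" },
    },
  },
  {
    type: "Microsoft.Authorization/policyExemptions",
    name: "forever",
    properties: {},
  },
];
console.log(lintExemptions(sampleResources, new Date("2026-04-01T00:00:00Z")));
```

Run in the PR pipeline, a check like this would turn "no permanent exemptions" from a convention into a gate.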
A definition + its tests
// policy/definitions/require-tag-environment.json
{
  "properties": {
    "displayName": "Require 'environment' tag on resource groups",
    "policyType": "Custom",
    "mode": "All",
    "parameters": {
      "allowedValues": {
        "type": "Array",
        "metadata": { "displayName": "Allowed environment values" },
        "defaultValue": ["dev", "test", "stage", "prod"]
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          { "field": "type", "equals": "Microsoft.Resources/subscriptions/resourceGroups" },
          {
            "anyOf": [
              { "field": "tags['environment']", "exists": "false" },
              {
                "not": {
                  "field": "tags['environment']",
                  "in": "[parameters('allowedValues')]"
                }
              }
            ]
          }
        ]
      },
      "then": { "effect": "deny" }
    }
  }
}
// policy/definitions/require-tag-environment.test.json
{
  "policyDefinition": "require-tag-environment.json",
  "cases": [
    {
      "name": "rg with valid tag passes",
      "resource": {
        "type": "Microsoft.Resources/subscriptions/resourceGroups",
        "name": "rg-payments-prod",
        "tags": { "environment": "prod" }
      },
      "expect": "compliant"
    },
    {
      "name": "rg without environment tag is denied",
      "resource": {
        "type": "Microsoft.Resources/subscriptions/resourceGroups",
        "name": "rg-typo"
      },
      "expect": "noncompliant"
    },
    {
      "name": "rg with bogus environment value is denied",
      "resource": {
        "type": "Microsoft.Resources/subscriptions/resourceGroups",
        "name": "rg-experiment",
        "tags": { "environment": "qa" }
      },
      "expect": "noncompliant"
    }
  ]
}
The tests are a JSON manifest, not Pester or NUnit. The runner is a small Node script that walks the policy rule against the test resource: about 150 LOC, deliberately limited to the policy-language constructs we actually use. It's fast: 47 definitions × 4 cases each run in about 2 seconds.
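To make "walks the policy rule against the test resource" concrete, here is a minimal sketch of such an evaluator. It is not the actual 150-line runner. It handles only `allOf`, `anyOf`, `not`, and field conditions with `equals`, `exists`, and `in`, which is enough for the rule above. All function names are hypothetical, and two behaviours are assumptions: missing fields resolve to `null`, and string comparisons are case-insensitive.

```javascript
// Sketch of a policy-rule evaluator. Not Azure's engine.

// Assumption: string comparisons are treated as case-insensitive.
const same = (a, b) => String(a).toLowerCase() === String(b).toLowerCase();

// Resolve "[parameters('name')]" against the definition's default values.
function resolveValue(value, parameters) {
  const m = typeof value === "string" && value.match(/^\[parameters\('([^']+)'\)\]$/);
  return m ? parameters[m[1]].defaultValue : value;
}

// Look a field up on the test resource; anything missing becomes null.
function resolveField(field, resource) {
  const tag = field.match(/^tags\['([^']+)'\]$/);
  if (tag) return resource.tags?.[tag[1]] ?? null;
  return resource[field] ?? null;
}

// Returns true when the rule's "if" block matches the resource.
function evaluate(cond, resource, parameters) {
  if (cond.allOf) return cond.allOf.every((c) => evaluate(c, resource, parameters));
  if (cond.anyOf) return cond.anyOf.some((c) => evaluate(c, resource, parameters));
  if (cond.not) return !evaluate(cond.not, resource, parameters);
  if (!("field" in cond)) throw new Error(`unsupported condition: ${JSON.stringify(cond)}`);

  const actual = resolveField(cond.field, resource);
  if ("equals" in cond) {
    return actual !== null && same(actual, resolveValue(cond.equals, parameters));
  }
  if ("exists" in cond) {
    // "exists" may be written as a boolean or as the strings "true"/"false".
    return (actual !== null) === same(cond.exists, "true");
  }
  if ("in" in cond) {
    const allowed = resolveValue(cond.in, parameters);
    return actual !== null && allowed.some((v) => same(v, actual));
  }
  throw new Error(`unsupported operator in: ${JSON.stringify(cond)}`);
}

// A matching deny rule means the resource is noncompliant.
function runCase(definition, testCase) {
  const { policyRule, parameters = {} } = definition.properties;
  return evaluate(policyRule.if, testCase.resource, parameters) ? "noncompliant" : "compliant";
}

// Usage with a trimmed copy of the definition above.
const rgType = "Microsoft.Resources/subscriptions/resourceGroups";
const definition = {
  properties: {
    parameters: { allowedValues: { defaultValue: ["dev", "test", "stage", "prod"] } },
    policyRule: {
      if: {
        allOf: [
          { field: "type", equals: rgType },
          {
            anyOf: [
              { field: "tags['environment']", exists: "false" },
              { not: { field: "tags['environment']", in: "[parameters('allowedValues')]" } },
            ],
          },
        ],
      },
      then: { effect: "deny" },
    },
  },
};
console.log(runCase(definition, { resource: { type: rgType, tags: { environment: "prod" } } })); // compliant
console.log(runCase(definition, { resource: { type: rgType } })); // noncompliant
console.log(runCase(definition, { resource: { type: rgType, tags: { environment: "qa" } } })); // noncompliant
```

Throwing on anything unsupported is the important design choice: a limited harness is fine as long as it fails loudly instead of silently treating an unknown construct as "no match".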
The PR pipeline
# .github/workflows/policy-pr.yml
name: policy-pr
on:
  pull_request:
    paths: ["policy/**"]
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    permissions: { id-token: write, contents: read }   # id-token is required for OIDC login
    steps:
      - uses: actions/checkout@v4
      - name: Validate JSON shapes
        run: |
          for f in policy/definitions/*.json policy/initiatives/*.json; do
            # Test manifests have no "properties" key, so skip them here.
            case "$f" in *.test.json) continue ;; esac
            jq -e 'type=="object" and has("properties")' "$f" >/dev/null \
              || { echo "::error::malformed $f"; exit 1; }
          done
      - name: Run policy unit tests
        run: node policy/test/run.mjs
      - name: Azure login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - name: Bicep what-if for assignments
        run: |
          az deployment sub what-if \
            --location eastus \
            --template-file policy/assignments/sub-nonprod.bicep
Three gates. Lint, unit-test, what-if. PR status checks must all be green to merge. Reviewers see the what-if diff inline.
The drift workflow
This was the unexpectedly important one.
# .github/workflows/policy-drift.yml
name: policy-drift
on:
  schedule:
    - cron: "17 6 * * *"   # daily 06:17 UTC
  workflow_dispatch:
jobs:
  drift:
    runs-on: ubuntu-latest
    permissions: { id-token: write, contents: read, issues: write }
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with: {}   # OIDC inputs elided
      - name: Capture current state
        run: |
          # enforcementMode is captured so that a portal toggle to "Disabled" shows up as drift
          az policy assignment list \
            --query "[].{name:name, scope:scope, policyId:policyDefinitionId, enforcementMode:enforcementMode}" \
            -o json > current.json
      - name: Compare to repo
        run: node policy/test/diff.mjs
      - name: File issue on drift
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            const body = require('fs').readFileSync('drift-report.md', 'utf8');
            await github.rest.issues.create({
              ...context.repo,
              title: `Policy drift detected ${new Date().toISOString().slice(0,10)}`,
              body,
              labels: ['policy', 'drift'],
            });
The drift detector caught two real changes in the last quarter, both audit-mode initiatives that someone had toggled to disabled in the portal during an investigation and forgotten to re-enable. One had been off for three weeks before the workflow caught it.
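The comparison step itself is small. The sketch below shows one way it could work; it is not the real `diff.mjs`. It assumes the repo side has already been reduced to the same record shape as `current.json` (`name`, `scope`, `policyId`, plus any other captured fields), and the function names are hypothetical.

```javascript
// Sketch of a drift comparison between repo state and live state.

// Azure resource IDs are case-insensitive, so compare strings in lower case.
const norm = (v) => (typeof v === "string" ? v.toLowerCase() : JSON.stringify(v));
const keyOf = (a) => `${a.scope}/${a.name}`.toLowerCase();

function diffAssignments(expected, current) {
  const want = new Map(expected.map((a) => [keyOf(a), a]));
  const have = new Map(current.map((a) => [keyOf(a), a]));
  const drift = [];

  for (const [key, a] of want) {
    const live = have.get(key);
    if (!live) {
      drift.push({ kind: "missing", key });
      continue;
    }
    // Compare every field the repo declares; extra live fields are ignored.
    const fields = Object.keys(a).filter((f) => norm(a[f]) !== norm(live[f]));
    if (fields.length > 0) drift.push({ kind: "changed", key, fields });
  }
  for (const key of have.keys()) {
    if (!want.has(key)) drift.push({ kind: "unexpected", key });
  }
  return drift;
}

// Render the drift list as the Markdown body of the issue.
function toMarkdown(drift) {
  if (drift.length === 0) return "No drift detected.";
  return drift
    .map((d) => `- **${d.kind}**: \`${d.key}\`${d.fields ? ` (${d.fields.join(", ")})` : ""}`)
    .join("\n");
}

// Usage with made-up assignments on a placeholder subscription.
const sub = "/subscriptions/00000000-0000-0000-0000-000000000000";
const defs = `${sub}/providers/Microsoft.Authorization/policyDefinitions`;
const expected = [
  { name: "baseline", scope: sub, policyId: `${defs}/platform-baseline` },
  { name: "tags", scope: sub, policyId: `${defs}/require-tag-environment` },
  { name: "storage", scope: sub, policyId: `${defs}/no-public-storage` },
];
const current = [
  { name: "Baseline", scope: sub, policyId: `${defs}/Platform-Baseline` }, // case differs only
  { name: "tags", scope: sub, policyId: `${defs}/require-tag-env-v2` },    // different definition
  { name: "manual-test", scope: sub, policyId: `${defs}/something-else` }, // created in the portal
];
const drift = diffAssignments(expected, current);
console.log(toMarkdown(drift));
```

In the workflow above, a script like this would write the Markdown to `drift-report.md` and exit non-zero when the list is non-empty, which is what makes the `if: failure()` step fire.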
Why audit-mode isn't free
The conventional wisdom is "audit-only is harmless, deploy first then enforce later." Two ways that bites:
1. Audit mode produces compliance noise. A policy that flags ~10K non-compliant resources floods the compliance dashboard, and the policies that matter (the few real Deny ones) get buried. Audit mode without a remediation story is technical debt.
2. Some assignments have side effects even when nothing is enforced. An assignment whose policies use DeployIfNotExists or Modify carries a managed identity, with role assignments, so that it can remediate, whether or not you ever run a remediation. Plain AuditIfNotExists does not need one. If your alerting watches identity or role-assignment counts, these assignments will surprise you.
The pattern that actually works: audit for a fixed window (1-2 sprints), with an explicit remediation track, then graduate to Deny, or remove the policy. Policies that have been audit-mode for a year get a quarterly review and either get enforced or deleted.
What broke first
Bicep existingAssignment references stopped working when scope strings drifted. A subscription rename in the portal changed the scope string our templates resolved, and our Bicep assignment couldn't find its prior version on the next deploy. Result: a duplicate assignment was created, and half the policies were evaluated twice. Pin scope by ID, not by name, and use subscription().id as the canonical reference everywhere.
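A duplicate like this is cheap to detect from the same data the drift job captures. The following is a sketch under the assumption that assignments are available as `{ name, scope, policyId }` records; `findDuplicateAssignments` is a hypothetical helper, not something from the repo.

```javascript
// Flag cases where the same definition or initiative is assigned more than
// once at the same scope under different assignment names.
function findDuplicateAssignments(assignments) {
  const groups = new Map();
  for (const a of assignments) {
    // Resource IDs are case-insensitive, so group on a lower-cased key.
    const key = `${a.scope}|${a.policyId}`.toLowerCase();
    groups.set(key, [...(groups.get(key) ?? []), a.name]);
  }
  return [...groups]
    .filter(([, names]) => names.length > 1)
    .map(([key, names]) => ({ key, names }));
}

// Usage with made-up records on a placeholder subscription.
const scope = "/subscriptions/00000000-0000-0000-0000-000000000000";
const baselineId = `${scope}/providers/Microsoft.Authorization/policySetDefinitions/platform-baseline`;
const tagsId = `${scope}/providers/Microsoft.Authorization/policyDefinitions/require-tag-environment`;
const duplicates = findDuplicateAssignments([
  { name: "baseline", scope, policyId: baselineId },
  { name: "baseline-2", scope: scope.toUpperCase(), policyId: baselineId },
  { name: "tags", scope, policyId: tagsId },
]);
console.log(duplicates.map((d) => d.names));
```

Legitimate double assignments with different parameters do exist, so a check like this is probably better as a warning than as a hard failure.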
The test harness disagreed with Azure on field('tags') semantics. Azure resolves missing tags to a JSON null at evaluation time; our harness initially returned undefined. A test passed locally while the policy denied resources in production. Now the harness simulates Azure's evaluation explicitly: null where Azure would return null, missing where Azure would short-circuit.
Initiative parameters defaulted differently in Bicep vs the portal. When we assigned an initiative whose policyDefinitions[].parameters referenced an outer initiative parameter, the Bicep deploy emitted those values as JSON nulls, while the portal preserved their defaults. The fix is to be explicit: always set the parameter values in the assignment Bicep, and never rely on initiative-level defaults.
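The "always be explicit" rule can be enforced with a small check. This is a sketch, assuming the initiative JSON declares its parameters under `properties.parameters` and the assignment supplies them in the usual `{ name: { value } }` shape; `missingAssignmentParameters` is a hypothetical helper name.

```javascript
// Return the initiative parameters that the assignment does not set
// explicitly, regardless of whether the initiative declares a default.
function missingAssignmentParameters(initiative, assignmentParameters) {
  const declared = Object.keys(initiative.properties.parameters ?? {});
  return declared.filter((name) => {
    const supplied = assignmentParameters[name];
    // A missing entry, a null entry, or a null/absent value all count as unset.
    return supplied == null || supplied.value == null;
  });
}

// Usage: the initiative has a default, but the assignment must still set it.
const initiative = {
  properties: {
    parameters: {
      allowedEnvironments: { type: "Array", defaultValue: ["dev", "test", "stage", "prod"] },
      effect: { type: "String", defaultValue: "Audit" },
    },
  },
};
const assignmentParameters = { effect: { value: "Deny" } };
console.log(missingAssignmentParameters(initiative, assignmentParameters));
// [ 'allowedEnvironments' ]
```

Ignoring `defaultValue` on purpose is the point: the check fails whenever the assignment leans on an initiative-level default.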
What I'd do differently
I'd start with two initiatives: one Deny-only initiative containing the rules you'd be willing to enforce on day one, and one Audit-only initiative for the rest. Two scopes (sub-prod, sub-nonprod) × two initiatives = four assignments to track. Easy to reason about. The temptation to build one mega-initiative with many parameters is real and worth resisting.
I would NOT skip the test harness. The 150 LOC paid back the third time it caught a logic bug in a policy rule before it shipped to a subscription. Without it, every policy change is a "deploy and hope the audit mode catches it" gamble, and audit mode runs on a 24-hour evaluation cycle.
