No. 04 · DevOps · Jul 26, 2025 · 10 min read

Building a Free Bicep-Aware PR Reviewer With GitHub Actions and Azure OpenAI

We had a tool gap. Our application code got AI review on every PR. Our infrastructure code — Bicep templates, Terraform modules, Helm charts — went through whatever the human on rotation was willing to look at, which was usually "the diff scrolled past quickly enough."

Three production incidents, each traced back to an IaC error that should have been caught at review time, convinced me to fix the gap. The result is a 200-line GitHub Actions workflow that runs on every PR touching infra/, costs about $12/month, and has caught 11 issues that would have shipped otherwise.

Here's how it's built and what makes it Bicep-aware specifically.

Why generic IaC review isn't enough

A generic "review this Terraform" prompt produces generic feedback. "Consider adding tags." "This resource group could be parameterized." Worthless.

What we needed was a reviewer that knows:

  1. Our naming convention. rg-{env}-{region}-{workload} — anything else gets flagged.
  2. Our security posture. Storage accounts must have allowBlobPublicAccess: false. Key Vaults must have enablePurgeProtection: true. SQL servers must use Entra-ID-only auth.
  3. Our cost guardrails. Anything provisioning a Premium SKU needs a justification comment.
  4. Our deployment topology. New resources go through our hub-and-spoke VNet. Anyone trying to give a resource a public IP should be told no.

A generic LLM doesn't know any of that. The trick is feeding it the rules.

The architecture

.github/workflows/iac-review.yml
  └── triggers on PRs touching infra/**
  └── extracts the diff
  └── calls scripts/review_iac.py
        └── loads internal-rules.md (our codified rules)
        └── classifies the diff: Bicep / Terraform / Helm
        └── builds a type-specific prompt
        └── calls Azure OpenAI gpt-4o
        └── posts the response as a PR comment
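
Before the prompt even matters, the script has to decide which type-specific prompt to build and whether the model's verdict should gate the merge. A rough sketch of that classify-and-gate logic in Python (the helper names and diff-parsing details are my own illustration, not the real review_iac.py):

# Illustrative sketch — names and parsing details are assumptions, not the real script.
def changed_paths(diff: str) -> list[str]:
    """Pull file paths out of the unified diff's '+++ b/<path>' headers."""
    return [line.split(" b/", 1)[1] for line in diff.splitlines() if line.startswith("+++ b/")]

def classify(diff: str) -> str:
    """Pick the type-specific prompt: Bicep, Terraform, or Helm."""
    paths = changed_paths(diff)
    if any(p.endswith(".bicep") for p in paths):
        return "Bicep"
    if any(p.endswith(".tf") for p in paths):
        return "Terraform"
    return "Helm"  # chart and values.yaml changes fall through to here

def gate(review: str) -> int:
    """A non-zero exit fails the workflow step, which is what blocks the PR."""
    return 1 if "BLOCKING:" in review else 0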

The interesting bit is internal-rules.md. It lives in .github/iac-rules/ and reads like a junior engineer's onboarding cheat sheet:

## Naming
All resource group names: rg-{env}-{region}-{workload}. Reject anything else.
Storage account names: st{env}{region}{workload}{nnn} — lowercase, no hyphens.

## Security (BLOCKING — fail the PR)
- Storage: allowBlobPublicAccess MUST be false
- Storage: minimumTlsVersion MUST be TLS1_2
- Key Vault: enablePurgeProtection MUST be true
- Key Vault: enableSoftDelete MUST be true (default since 2020 but be explicit)
- SQL Server: minimalTlsVersion MUST be 1.2
- SQL Server: administrators.administratorType MUST be ActiveDirectory
  (no SQL auth)
- App Service: httpsOnly MUST be true
- VNet: any resource with publicIP MUST also have a comment explaining why

## Cost (WARNING — comment but don't block)
- Anything Premium SKU
- Anything in a region we don't already use
- Anything with sku.tier == 'PremiumV3'

## Style (ADVISORY — only mention if there are also blocking issues)
- Tags: every resource MUST have tags.environment, tags.workload, tags.owner

The script prepends this file to the prompt as system context. The model now reviews against OUR rules, not the LLM's training-data average of "what good Bicep looks like."

The prompt

SYSTEM = f"""You are an IaC reviewer for [YOUR-COMPANY]'s Azure infrastructure.
Apply ONLY the rules in the rule file below. Do NOT add suggestions
that aren't in the rules.

Output format:
- BLOCKING: <issue> at <file>:<line>  — for security violations
- WARNING: <issue> at <file>:<line>   — for cost concerns
- ADVISORY: <issue> at <file>:<line>  — for style (only if blockers exist)

If no issues, output exactly: NO_ISSUES_FOUND.

Rules:
{rules_md}

Code language: {language}  (Bicep | Terraform | Helm)
"""

USER = f"Diff:\n{diff}"

The constraint "do NOT add suggestions that aren't in the rules" is doing heavy lifting. Without it, the model wanders into stylistic feedback nobody asked for.
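
The call itself is an ordinary chat completion against our gpt-4o deployment. A minimal sketch, assuming the official openai Python SDK's AzureOpenAI client; the environment variable names and api_version are placeholders:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # placeholder; pin whatever version your deployment supports
)

def run_review(system_prompt: str, diff: str) -> str:
    """Send the rule-loaded system prompt plus the diff, return the model's review text."""
    response = client.chat.completions.create(
        model=os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o"),
        temperature=0,  # keeps the BLOCKING/WARNING/ADVISORY format stable across runs
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Diff:\n{diff}"},
        ],
    )
    return response.choices[0].message.content or "NO_ISSUES_FOUND"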

What it actually catches

Eleven real issues caught in three months. Representative samples:

  1. A Bicep template adding a new storage account without allowBlobPublicAccess: false. Bot blocked the PR. Engineer pushed a fix in the same hour.
  2. A Terraform module setting up a SQL Server with administratorLogin and administratorLoginPassword (SQL auth) — bot blocked, engineer switched to Entra-ID-only auth.
  3. Three separate PRs that added resources without the required tags.owner. Advisory comments. Engineers added them.
  4. A Bicep file deploying an App Service Plan with sku.tier: PremiumV3 without justification. Warning. Engineer added a comment explaining the workload needed it.
  5. A Helm values change that exposed a service via LoadBalancer (which on AKS = public IP) instead of going through our ingress. Bot caught the missing comment-justification — engineer reverted to ClusterIP.

The one that mattered most: a copy-paste error that left a Key Vault without enablePurgeProtection. Without purge protection, a deleted vault can be purged outright during the 90-day soft-delete window, and a purge can't be recovered. Bot blocked it. Cost of the bot for the year: about $144. Cost of the worst-case "we lost a Key Vault" incident: incalculable.

What it gets wrong

About 1 in 8 PRs gets a false-positive blocking comment. Usually it's the bot misreading a refactor: when a resource moves from one module to another, the bot reads the "removed" half of the diff as a resource deletion and panics.

We added an override mechanism: a PR comment that just says /iac-review-override skips the bot's blocking status (the bot still posts its findings, but they don't gate the merge). Audit-logged. Used about once a month. Hasn't been abused.
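
Mechanically the override is nothing clever: before setting the failing status, the script lists the PR's comments and looks for the magic string. A sketch using the GitHub REST API's list-issue-comments endpoint (the helper name and repo/PR plumbing are illustrative; the real script also writes the audit-log entry):

import os
import requests

def override_requested(repo: str, pr_number: int) -> bool:
    """True if any PR comment is exactly '/iac-review-override'."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
    comments = requests.get(url, headers=headers, timeout=30).json()
    return any(c.get("body", "").strip() == "/iac-review-override" for c in comments)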

The cost

Average diff for an IaC PR is small. About 2K-5K tokens. At gpt-4o pricing that's $0.02-$0.05 per review. We do roughly 20 IaC PRs a week. Annual cost projects to ~$50-$130 depending on how busy the infra team is.
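
If you want to sanity-check that per-review figure against your own diffs, the arithmetic is just tokens times list price. A quick sketch, assuming roughly gpt-4o list pricing at the time of writing (about $2.50 per million input tokens and $10 per million output tokens; check your Azure OpenAI region's current rates):

def review_cost_usd(input_tokens: int, output_tokens: int,
                    input_per_m: float = 2.50, output_per_m: float = 10.00) -> float:
    """Rough per-review cost: (tokens / 1M) * price per million tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# A 2K-token diff plus a few thousand tokens of rules file and a short review:
print(review_cost_usd(4_000, 500))    # ~0.015
# A 5K-token diff plus the rules file and a longer review:
print(review_cost_usd(7_000, 1_000))  # ~0.028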

The Azure OpenAI deployment we use for this is shared with the application-PR reviewer (article #1 in this series). Total monthly token spend across both: ~$45.

What I'd add next

A drift-detection mode that runs nightly: compare the live state of the subscription against the IaC, then prompt the model to identify drift that would matter (someone clicked something in the portal). Not for blocking PRs, just for awareness.

I would NOT extend the bot to auto-apply suggested fixes. We tried that briefly with stylistic suggestions ("add the missing tags") and it created PRs that nobody owned. Suggestions land better when a human engineer pastes them in.

The portable lesson

Putting your team's specific rules into the prompt is what separates "AI review" from "useful AI review." Generic models give generic feedback. The work isn't building the bot — it's writing down the rules you've been carrying in your head.

That second part is harder than it sounds. It took me four weekends to write a draft of internal-rules.md that the team agreed with. Most of those weekends were arguing about edge cases. Worth it.

Bicep · GitHub Actions · PR Review
