The complete article archive.
Every post in one place. Older entries have aged surprisingly well.
Foundry-to-Foundry Agent Communication With the A2A Protocol for Distributed Agent Systems
The customer-support orchestration covered in [the multi-agent article](https://damionas.com/articles/multi-agent-orchestration-in-microsoft-foundry-with-shared-memory-and-handoff-patterns) had four agents inside one Foundry proje…
Continuous Evaluation for Foundry Agents With Prompt Flow and GitHub Actions
The Foundry agent we shipped to a customer-success team passed every hand-tested scenario before launch. Six weeks later, the team's manager pulled me aside: "the agent's getting worse at billing questions, but we can't tell when …
Microsoft Foundry Content Safety With Custom Classifiers and Defence in Depth
The Foundry agent we ran for an internal HR-question workload had Microsoft's default content safety filters enabled, and we considered the security story closed.
Hybrid Search and Semantic Ranking in Azure AI Search With Custom Scoring Profiles
The first version of our Foundry-backed RAG pipeline used vector-only search against an Azure AI Search index. Recall was 71% on the gold set; for a customer-support workload, that meant about three in ten questions returned no us…
A Custom Foundry Tool That Queries Azure SQL With Row-Level Security via Entra ID OBO
The first version of our "ask the database" Foundry tool was a function that took a customer ID, ran `SELECT * FROM customer_orders WHERE customer_id = ?` against Azure SQL, and returned the rows.
Production Microsoft Foundry Agent Service With VNet Integration and Private Link
The first version of our Foundry agent service was reachable on the public internet behind an API key. That was fine for the prototype demo.
Multi-Agent Orchestration in Microsoft Foundry With Shared Memory and Handoff Patterns
The first multi-agent system I shipped on Foundry was a four-agent customer-support workflow: a triage agent that classified incoming requests, a knowledge agent that searched the docs, a billing agent that looked up account state…
Migrating Classic Release Pipelines to YAML, the Six-Week Phased Plan
The Azure DevOps organisation I was asked to modernise had eighty-three Classic Release pipelines, the oldest dating to 2017.
Service Connection Vending With Workload Identity Federation, at Org Scale
The platform team I joined had 137 Azure DevOps Service Connections in their main organisation. Of those, 48 used long-lived Service Principal client secrets that had been rotated zero times.
End-to-End Observability for Azure AI Foundry Agents With OpenTelemetry and Application Insights
A production Foundry agent we ran for an internal customer started returning slow responses for one tenant on a Tuesday afternoon.
Self-Hosted Azure DevOps Agents on AKS With KEDA Autoscaling
The platform team I joined had thirty Microsoft-hosted Azure DevOps agent minutes left in the month, and it was the eighth.
Multi-Stage Azure Pipelines With Bicep What-If Gates and Canary Promotion
The deploy pipeline I inherited at one customer was a single 400-line YAML that ran `az deployment group create` against production every time someone merged to main. There was no preview, no manual gate, no canary.
Building an Azure Subscription Vending Machine With ARM Template Specs and Azure DevOps
The first time someone at our org needed a new Azure subscription, it took two weeks. A ticket got filed, a senior architect figured out the management-group placement, a Cloud Center of Excellence Owner manually created the subsc…
Air-Gapped Azure OpenAI With Private Endpoints: A Terraform Module That Actually Works
"Air-gapped" is a strong word for something running in a public cloud, but it's the right word for what regulated customers want: an Azure OpenAI deployment whose only network path is through their own VNet, with public access ful…
Build a Bicep-Aware PR Reviewer Bot With GitHub Actions and Azure OpenAI
I review Bicep PRs. A lot of them. The team had grown from four engineers writing infrastructure to fourteen, and the rate of "did you check that this resource has diagnostic settings wired up" comments I was leaving had passed th…
Build and Ship an Azure Cost MCP Server From Empty Folder to Container Apps in 60 Minutes
For ten months our FinOps team published a beautifully formatted daily cost email. Subscription totals, top-five movers, tag breakdowns. It linked to two dashboards. It went to forty-seven engineers.
Streaming Azure OpenAI Through APIM: Token Budgets, Per-Tenant Limits, and Not Breaking SSE
We hit a fourteen-thousand-dollar Azure OpenAI bill in three days because one tenant's misbehaving agent ran an unbounded chain. The bill stopped the moment we put APIM in front of AOAI.
Killing API Keys in Azure OpenAI: A Managed Identity + Entra ID Migration Done Properly
Six months ago we had 14 services calling Azure OpenAI. All of them used API keys. The keys were stored in Azure Key Vault, fetched at startup, and rotated quarterly via a manual process that "everyone knew was fragile."
GitHub Actions → Azure With OIDC Federated Identity: The Setup That Survived Our SOC 2 Audit
I used to rotate Azure service principal secrets in fourteen GitHub repos every quarter. Manually. Because a teammate had been burned by an automated rotation that desynced halfway through and took production down at 3am.
Azure Policy as Code in Pipelines: Testing, Drift Detection, and Why Audit-Mode Isn't Free
We have 47 Azure Policy assignments across the platform. They were managed by hand for two years, a screen of click-through configuration that only the platform lead understood, mostly audit-mode, with one Deny assignment that nobo…
Five Gotchas When Wiring Azure DevOps MCP Server Into VS Code Copilot
The Azure DevOps MCP Server's setup docs make it look like a five-minute task. It is, if everything goes right. Most teams hit one or more of these five issues, lose an afternoon, and conclude the tool is "buggy" when really it's …
GitHub Actions Composite Actions vs Reusable Workflows: When To Use Which (And When To Use Neither)
A team asks me this question every month: *should we put this CI logic in a composite action or a reusable workflow?* And every month I give the same five-minute answer that none of the official docs put in one place.
Build the Azure Policy as Code Pipeline: Definitions, Tests, Drift, Exemptions
Two years ago we had 47 Azure Policy assignments across our subscriptions. They were managed by hand, click-through configuration in the portal, mostly audit-mode, with one Deny assignment that nobody trusted enough to actually en…
Edge RAG on Azure Arc From a Real Site Deployment: Latency, Hardware, Cost
For most workloads, "RAG in the cloud" is the right shape. For some workloads (regulated industries, manufacturing floors, retail stores, remote sites with weak connectivity), the data legally or practically can't leave the site.
Karpenter on AKS vs Cluster Autoscaler vs Node Auto-Provisioning: The Workload Where Each Wins
The "what scales nodes on AKS" question used to have one answer: Cluster Autoscaler. Now there are three: Cluster Autoscaler (CA), AKS **Node Auto-Provisioning** (NAP, which is Karpenter underneath), and self-managed **Karpenter o…
Per-Tenant Cost Attribution for Azure OpenAI Traffic Using APIM emit-token-metric-policy
Six months ago I couldn't tell you which tenant of our SaaS application was costing us the most in Azure OpenAI tokens. The number on the bill was real but unattributable.
Migrate a Resource Group Into a Bicep Deployment Stack: Two-Phase, Zero-Downtime
I have, over the past four years of Azure work, deployed a Bicep template against a resource group three separate times and forgotten about resources the *previous* template had created.
Time-Slicing vs MIG for Bursty LLM Inference Traffic on AKS GPU Node Pools
NVIDIA gives you two ways to share a single GPU across multiple workloads on Kubernetes: time-slicing and MIG (Multi-Instance GPU). The first is software-based and flexible. The second is hardware-partitioned and rigid.
Bicep Deployment Stacks: The Cleanup Story I Should Have Shipped Years Ago
I've been deploying Bicep against resource groups for four years. I have, on three separate occasions, deployed a fresh template and then forgotten about the resources the *previous* template created, because Azure's default deplo…
Fine-Tuning a Llama Variant With KAITO on AKS, Then Stress-Testing the Inference Endpoint
KAITO (Kubernetes AI Toolchain Operator) on AKS is the smoothest path I've found for "I want to fine-tune a Llama variant on my domain data and serve it as an inference endpoint." The whole pipeline, node provisioning, training, d…
Build a Production APIM Layer in Front of Azure OpenAI: Token Budgets, Streaming, Per-Tenant Cost
We hit a fourteen-thousand-dollar Azure OpenAI bill in three days because one tenant's misbehaving agent ran an unbounded chain. The bill stopped the moment we put APIM in front of AOAI.
vLLM-on-AKS vs Azure OpenAI: Where the Cost Crossover Actually Sits at 1M Tokens/Day
"Self-host the model and save money" is one of those statements that's true at scale and false below it. The interesting question isn't whether self-hosting is cheaper, it's where the crossover point is for your specific workload.
Wiring GitHub Copilot Agent Mode + MCP Into Our Incident-Response Runbooks
Our P2 incident playbook used to be a 14-step Confluence page. It pointed at four Azure portals, three KQL queries, two PowerShell scripts, and a Slack channel.
Deploying One AI Agent in Production With Azure AI Foundry: Three Things I Wish I'd Known First
The "build a multi-agent system" tutorials are fun. Building one agent and putting it in production where customers actually depend on it is a different sport.
Replace Every Service Principal Secret With OIDC Federation: A Multi-Environment Walkthrough
I once got paged at 4am because a service principal secret expired in the middle of a release. The deploy succeeded for staging, then the production stage tried to authenticate, the SP credential had hit its 30-day TTL three minut…
Cost-Per-Query: Azure Agentic Retrieval in Foundry vs Hand-Rolled RAG at 100K Queries/Month
We benchmarked Azure AI Foundry's Agentic Retrieval against our hand-rolled RAG pipeline on the same workload, the same corpus, and the same evaluation set.
Contract-Testing an MCP Server: Fixtures, Golden Files, and the Harness That Catches Most Regressions
The MCP server we run for cost queries had a regression last quarter that nobody caught for nine days. The Cost Management API changed the shape of the `properties.rows` array (a fourth column appeared), our parser silently mapped…
The Real Security Checklist for Enterprise RAG on Azure
Most enterprise RAG security writing is one of two things: a marketing-shaped overview ("Azure has Entra ID and Private Endpoints") or a step-by-step that gets the easy stuff right and ignores the hard stuff.
Building a Groundedness Eval Harness Around Azure AI Search Retrieval Agent
A RAG application that doesn't have an evaluation harness isn't a production system, it's a demo with extra steps. You can't tune what you can't measure, and "the model said something reasonable" is not a measurement.
An MCP Server That Runs Bicep What-If and Detects Drift, From Inside My Editor
I review Bicep PRs. A lot of them. Half my comments before this tool were variants of "did you run what-if?", because the answer was usually no, and the diff would have caught it.
Chunking Strategies on Azure AI Search RAG: What Actually Moved Groundedness Scores in Our Pipeline
We ran four chunking strategies against the same 12,000-document corpus, scored each against the same 200-question evaluation set, and watched groundedness scores move from 0.41 to 0.78 by changing nothing but how the documents we…
Add Per-User OAuth and On-Behalf-Of to an Internal MCP Server
The day after we widened the audience for our internal MCP server to the broader engineering org, one of the first new users asked it to fetch cost data for a subscription they shouldn't have been able to see.
Multi-Region AKS-Only GitOps With Azure Arc: A Drift-Reconciliation War Story
We run AKS in three regions: West Europe, East US, and Australia East. The promise of GitOps with Azure Arc is "one Git repo, three clusters, drift gets reconciled automatically." The reality is more interesting and considerably m…
Securing an Internal MCP Server Behind Entra ID With Per-Tool OAuth Scopes
The day after we shipped our MCP server to a wider engineering audience, somebody used it to query cost data for a subscription they shouldn't have been able to see. Not maliciously; they had MCP wired up before their RBAC was.
Swapping ACR for Harbor in an AKS GitOps Pipeline: What Broke, What Didn't
Azure Container Registry (ACR) is the default registry for AKS workloads, and for most teams it's the right call: managed, integrated with Entra ID, geo-replicated.
Stand Up a Production-Ready Internal MCP Server on Azure Container Apps With Workload Identity
The first version of our internal MCP server ran on a developer's laptop. It worked beautifully. Then they took a Friday off and the FinOps Slack channel filled with "MCP server unavailable" complaints by 11am, because the laptop …
Backstage on AKS With CAPZ + ASO Instead of Crossplane: When the Tooling Choice Matters
Most Backstage-on-AKS internal-platform tutorials reach for Crossplane to do the resource provisioning. We started there too.
Hosting MCP Servers on Azure Container Apps With Workload Identity (No Keys, No Sidecars)
The first version of our internal MCP server ran on a developer's laptop. It worked beautifully, until they took a Friday off and the FinOps Slack channel filled with "MCP server unavailable" complaints by 11am.
Day 1 vs Day 90 on an AKS Internal Platform: What I'd Wire Differently
Three months ago I stood up a new internal developer platform on AKS for a 30-engineer team. Backstage as the portal, ArgoCD for delivery, Crossplane for resource provisioning, the usual stack.
Migrating an AKS Cluster Off Flux v2 to the New ArgoCD Extension Without Dropping Reconciliation
When the ArgoCD extension for AKS hit GA at KubeCon Europe 2026, we had four production AKS clusters running Flux v2 GitOps and a long-standing internal preference for ArgoCD's UI for application-team developers.
Building a Custom MCP Server for Azure Cost Insights, The 200-LOC Tool That Replaced Our Daily FinOps Email
We ran a daily FinOps email for ten months. It had cost-by-subscription, cost-by-tag, the top-five-movers list, everything finance asked for. Eight people opened it. Two of them were the FinOps team.
Letting Copilot Agent Mode Own Our Monthly AKS Maintenance Run: Five Failure Modes I Hit
Once a month I do the same boring AKS chore: rotate certificates, prune unused resources, check node pool versions against the support matrix, and update the Helm releases for our common platform services.
Building a Free Bicep-Aware PR Reviewer With GitHub Actions and Azure OpenAI
We had a tool gap. Our application code got AI review on every PR. Our infrastructure code (Bicep templates, Terraform modules, Helm charts) went through whatever the human on rotation was willing to look at, which was usually "th…
What an SRE Agent Caught Last Quarter (and What It Missed)
The Azure SRE Agent has been running against our production AKS cluster for one quarter. Three months. About 90 incidents.
30 Days With the Azure DevOps MCP Server: What Actually Changed in My Backlog Triage
I track tickets like most people: poorly. The backlog has 240 open work items in it, the average age is 71 days, and roughly a third are duplicates of each other under slightly different wording.
Plugging Azure OpenAI Into Azure Pipelines for PR Review: A Real-World Setup
The first time we tried this, the bot left a comment on every PR that just said "Looks good!", including on a PR that introduced a hard-coded SAS token.