The complete field journal.
Every dispatch in one place. Filter by topic or search by keyword — older entries have aged surprisingly well.
Air-Gapped Azure OpenAI With Private Endpoints: A Terraform Module That Actually Works
"Air-gapped" is a strong word for something running in a public cloud, but it's the right word for what regulated customers want: an Azure OpenAI deployment whose only network path is through their own VNet, with public access ful…
Killing API Keys in Azure OpenAI: A Managed Identity + Entra ID Migration Done Properly
Six months ago we had 14 services calling Azure OpenAI. All of them used API keys. The keys were stored in Azure Key Vault, fetched at startup, and rotated quarterly via a manual process that "everyone knew was fragile."
Five Gotchas When Wiring Azure DevOps MCP Server Into VS Code Copilot
The Azure DevOps MCP Server's setup docs make it look like a five-minute task. It is, if everything goes right. Most teams hit one or more of these five issues, lose an afternoon, and conclude the tool is "buggy" when really it's …
Edge RAG on Azure Arc From a Real Site Deployment: Latency, Hardware, Cost
For most workloads, "RAG in the cloud" is the right shape. For some workloads — regulated industries, manufacturing floors, retail stores, remote sites with weak connectivity — the data legally or practically can't leave the site.…
Per-Tenant Cost Attribution for Azure OpenAI Traffic Using APIM emit-token-metric-policy
Six months ago I couldn't tell you which tenant of our SaaS application was costing us the most in Azure OpenAI tokens. The number on the bill was real but unattributable.
Time-Slicing vs MIG for Bursty LLM Inference Traffic on AKS GPU Node Pools
NVIDIA gives you two ways to share a single GPU across multiple workloads on Kubernetes: time-slicing and MIG (Multi-Instance GPU). The first is software-based and flexible. The second is hardware-partitioned and rigid.
Fine-Tuning a Llama Variant With KAITO on AKS, Then Stress-Testing the Inference Endpoint
KAITO (Kubernetes AI Toolchain Operator) on AKS is the smoothest path I've found for "I want to fine-tune a Llama variant on my domain data and serve it as an inference endpoint." The whole pipeline — node provisioning, training, …
vLLM-on-AKS vs Azure OpenAI: Where the Cost Crossover Actually Sits at 1M Tokens/Day
"Self-host the model and save money" is one of those statements that's true at scale and false below it. The interesting question isn't whether self-hosting is cheaper — it's where the crossover point is for your specific workload.
Deploying One AI Agent in Production With Azure AI Foundry: Three Things I Wish I'd Known First
The "build a multi-agent system" tutorials are fun. Building one agent and putting it in production where customers actually depend on it is a different sport.
Cost-Per-Query: Azure Agentic Retrieval in Foundry vs Hand-Rolled RAG at 100K Queries/Month
We benchmarked Azure AI Foundry's Agentic Retrieval against our hand-rolled RAG pipeline on the same workload, the same corpus, and the same evaluation set.
The Real Security Checklist for Enterprise RAG on Azure
Most enterprise RAG security writing is one of two things: a marketing-shaped overview ("Azure has Entra ID and Private Endpoints") or a step-by-step that gets the easy stuff right and ignores the hard stuff.
Building a Groundedness Eval Harness Around Azure AI Search Retrieval Agent
A RAG application that doesn't have an evaluation harness isn't a production system — it's a demo with extra steps. You can't tune what you can't measure, and "the model said something reasonable" is not a measurement.
Chunking Strategies on Azure AI Search RAG: What Actually Moved Groundedness Scores in Our Pipeline
We ran four chunking strategies against the same 12,000-document corpus, scored each against the same 200-question evaluation set, and watched groundedness scores move from 0.41 to 0.78 by changing nothing but how the documents we…
Multi-Region AKS-Only GitOps With Azure Arc: A Drift-Reconciliation War Story
We run AKS in three regions: West Europe, East US, and Australia East. The promise of GitOps with Azure Arc is "one Git repo, three clusters, drift gets reconciled automatically." The reality is more interesting and considerably m…
Swapping ACR for Harbor in an AKS GitOps Pipeline: What Broke, What Didn't
Azure Container Registry (ACR) is the default registry for AKS workloads, and for most teams it's the right call — managed, integrated with Entra ID, geo-replicated.
Backstage on AKS With CAPZ + ASO Instead of Crossplane: When the Tooling Choice Matters
Most Backstage-on-AKS internal-platform tutorials reach for Crossplane to do the resource provisioning. We started there too.
Day 1 vs Day 90 on an AKS Internal Platform: What I'd Wire Differently
Three months ago I stood up a new internal developer platform on AKS for a 30-engineer team. Backstage as the portal, ArgoCD for delivery, Crossplane for resource provisioning, the usual stack.
Migrating an AKS Cluster Off Flux v2 to the New ArgoCD Extension Without Dropping Reconciliation
When the ArgoCD extension for AKS hit GA at KubeCon Europe 2026, we had four production AKS clusters running Flux v2 GitOps and a long-standing internal preference for ArgoCD's UI for application-team developers.
Letting Copilot Agent Mode Own Our Monthly AKS Maintenance Run: Five Failure Modes I Hit
Once a month I do the same boring AKS chore: rotate certificates, prune unused resources, check node pool versions against the support matrix, and update the Helm releases for our common platform services.
Building a Free Bicep-Aware PR Reviewer With GitHub Actions and Azure OpenAI
We had a tool gap. Our application code got AI review on every PR. Our infrastructure code — Bicep templates, Terraform modules, Helm charts — went through whatever the human on rotation was willing to look at, which was usually "…
What an SRE Agent Caught Last Quarter (and What It Missed)
The Azure SRE Agent has been running against our production AKS cluster for one quarter. Three months. About 90 incidents.
30 Days With the Azure DevOps MCP Server: What Actually Changed in My Backlog Triage
I track tickets like most people: poorly. The backlog has 240 open work items, their average age is 71 days, and roughly a third are duplicates of one another under slightly different wording.
Plugging Azure OpenAI Into Azure Pipelines for PR Review: A Real-World Setup
The first time we tried this, the bot left a comment on every PR that just said "Looks good!" — including on a PR that introduced a hard-coded SAS token.