
Featured field reports

All articles →

Recent dispatches

No. 23 · Azure

Air-Gapped Azure OpenAI With Private Endpoints: A Terraform Module That Actually Works

"Air-gapped" is a strong word for something running in a public cloud, but it's the right word for what regulated customers want: an Azure OpenAI deployment whose only network path is through their own VNet, with public access ful…

Dec 5, 2025 · 10 min
No. 22 · Azure AI

Killing API Keys in Azure OpenAI: A Managed Identity + Entra ID Migration Done Properly

Six months ago we had 14 services calling Azure OpenAI. All of them used API keys. The keys were stored in Azure Key Vault, fetched at startup, and rotated quarterly via a manual process that "everyone knew was fragile."

Nov 29, 2025 · 9 min
No. 21 · DevOps

Five Gotchas When Wiring Azure DevOps MCP Server Into VS Code Copilot

The Azure DevOps MCP Server's setup docs make it look like a five-minute task. It is, if everything goes right. Most teams hit one or more of these five issues, lose an afternoon, and conclude the tool is "buggy" when really it's …

Nov 22, 2025 · 6 min
No. 20 · Foundry

Edge RAG on Azure Arc From a Real Site Deployment: Latency, Hardware, Cost

For most workloads, "RAG in the cloud" is the right shape. For some workloads — regulated industries, manufacturing floors, retail stores, remote sites with weak connectivity — the data legally or practically can't leave the site.…

Nov 15, 2025 · 11 min
No. 19 · Azure AI

Per-Tenant Cost Attribution for Azure OpenAI Traffic Using APIM emit-token-metric-policy

Six months ago I couldn't tell you which tenant of our SaaS application was costing us the most in Azure OpenAI tokens. The number on the bill was real but unattributable.

Nov 8, 2025 · 9 min
No. 18 · DevOps

Time-Slicing vs MIG for Bursty LLM Inference Traffic on AKS GPU Node Pools

NVIDIA gives you two ways to share a single GPU across multiple workloads on Kubernetes: time-slicing and MIG (Multi-Instance GPU). The first is software-based and flexible. The second is hardware-partitioned and rigid.

Nov 1, 2025 · 10 min
No. 17 · Azure AI

Fine-Tuning a Llama Variant With KAITO on AKS, Then Stress-Testing the Inference Endpoint

KAITO (Kubernetes AI Toolchain Operator) on AKS is the smoothest path I've found for "I want to fine-tune a Llama variant on my domain data and serve it as an inference endpoint." The whole pipeline — node provisioning, training, …

Oct 25, 2025 · 10 min
Stay in the loop

A new issue lands when the work merits one.

No newsletter spam, no AI-generated filler. Just postmortems, patterns, and the occasional rant about private endpoints.