Topic

Azure AI

Azure OpenAI, Cognitive Services, vector search, RAG.

08 articles
No. 22 · Azure AI

Killing API Keys in Azure OpenAI: A Managed Identity + Entra ID Migration Done Properly

Six months ago we had 14 services calling Azure OpenAI. All of them used API keys. The keys were stored in Azure Key Vault, fetched at startup, and rotated quarterly via a manual process that "everyone knew was fragile."

Nov 29, 2025 · 9 min
No. 19 · Azure AI

Per-Tenant Cost Attribution for Azure OpenAI Traffic Using APIM emit-token-metric-policy

Six months ago I couldn't tell you which tenant of our SaaS application was costing us the most in Azure OpenAI tokens. The number on the bill was real but unattributable.

Nov 8, 2025 · 9 min
No. 17 · Azure AI

Fine-Tuning a Llama Variant With KAITO on AKS, Then Stress-Testing the Inference Endpoint

KAITO (Kubernetes AI Toolchain Operator) on AKS is the smoothest path I've found for "I want to fine-tune a Llama variant on my domain data and serve it as an inference endpoint." The whole pipeline — node provisioning, training, …

Oct 25, 2025 · 10 min
No. 16 · Azure AI

vLLM-on-AKS vs Azure OpenAI: Where the Cost Crossover Actually Sits at 1M Tokens/Day

"Self-host the model and save money" is one of those statements that's true at scale and false below it. The interesting question isn't whether self-hosting is cheaper — it's where the crossover point is for your specific workload.

Oct 18, 2025 · 11 min
No. 14 · Azure AI

Cost-Per-Query: Azure Agentic Retrieval in Foundry vs Hand-Rolled RAG at 100K Queries/Month

We benchmarked Azure AI Foundry's Agentic Retrieval against our hand-rolled RAG pipeline on the same workload, the same corpus, and the same evaluation set.

Oct 4, 2025 · 10 min
No. 13 · Azure AI

The Real Security Checklist for Enterprise RAG on Azure

Most enterprise RAG security writing is one of two things: a marketing-shaped overview ("Azure has Entra ID and Private Endpoints") or a step-by-step that gets the easy stuff right and ignores the hard stuff.

Sep 27, 2025 · 12 min
No. 12 · Azure AI

Building a Groundedness Eval Harness Around Azure AI Search Retrieval Agent

A RAG application that doesn't have an evaluation harness isn't a production system — it's a demo with extra steps. You can't tune what you can't measure, and "the model said something reasonable" is not a measurement.

Sep 20, 2025 · 9 min
No. 11 · Azure AI

Chunking Strategies on Azure AI Search RAG: What Actually Moved Groundedness Scores in Our Pipeline

We ran four chunking strategies against the same 12,000-document corpus, scored each against the same 200-question evaluation set, and watched groundedness scores move from 0.41 to 0.78 by changing nothing but how the documents we…

Sep 13, 2025 · 11 min