Azure AI
Azure OpenAI, Cognitive Services, vector search, RAG.
Killing API Keys in Azure OpenAI: A Managed Identity + Entra ID Migration Done Properly
Six months ago we had 14 services calling Azure OpenAI. All of them used API keys. The keys were stored in Azure Key Vault, fetched at startup, and rotated quarterly via a manual process that "everyone knew was fragile."
Per-Tenant Cost Attribution for Azure OpenAI Traffic Using APIM's emit-token-metric Policy
Six months ago I couldn't tell you which tenant of our SaaS application was costing us the most in Azure OpenAI tokens. The number on the bill was real but unattributable.
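The policy in the title is APIM's built-in token-metric emitter. A minimal sketch of the idea, assuming each tenant calls APIM with its own subscription key so the built-in Subscription ID dimension doubles as a tenant identifier (the `openai-usage` namespace is an arbitrary choice):

```xml
<!-- Inbound policy section of the APIM API fronting Azure OpenAI.
     Emits prompt/completion token counts as custom metrics to the
     configured Application Insights, dimensioned per subscription. -->
<azure-openai-emit-token-metric namespace="openai-usage">
    <!-- Built-in dimension: one APIM subscription per tenant -->
    <dimension name="Subscription ID" />
    <!-- Built-in dimension, useful for spotting noisy clients -->
    <dimension name="Client IP" value="@(context.Request.IpAddress)" />
</azure-openai-emit-token-metric>
```

From there, a Kusto query over the custom metrics table splits the token spend by dimension, which is exactly the attribution the bill alone can't give you.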
Fine-Tuning a Llama Variant With KAITO on AKS, Then Stress-Testing the Inference Endpoint
KAITO (Kubernetes AI Toolchain Operator) on AKS is the smoothest path I've found for "I want to fine-tune a Llama variant on my domain data and serve it as an inference endpoint." The whole pipeline — node provisioning, training, …
vLLM-on-AKS vs Azure OpenAI: Where the Cost Crossover Actually Sits at 1M Tokens/Day
"Self-host the model and save money" is one of those statements that's true at scale and false below it. The interesting question isn't whether self-hosting is cheaper — it's where the crossover point is for your specific workload.
Cost-Per-Query: Azure Agentic Retrieval in Foundry vs Hand-Rolled RAG at 100K Queries/Month
We benchmarked Azure AI Foundry's Agentic Retrieval against our hand-rolled RAG pipeline on the same workload, the same corpus, and the same evaluation set.
The Real Security Checklist for Enterprise RAG on Azure
Most enterprise RAG security writing is one of two things: a marketing-shaped overview ("Azure has Entra ID and Private Endpoints") or a step-by-step that gets the easy stuff right and ignores the hard stuff.
Building a Groundedness Eval Harness Around Azure AI Search Retrieval Agent
A RAG application that doesn't have an evaluation harness isn't a production system — it's a demo with extra steps. You can't tune what you can't measure, and "the model said something reasonable" is not a measurement.
Chunking Strategies on Azure AI Search RAG: What Actually Moved Groundedness Scores in Our Pipeline
We ran four chunking strategies against the same 12,000-document corpus, scored each against the same 200-question evaluation set, and watched groundedness scores move from 0.41 to 0.78 by changing nothing but how the documents we…