We ran a daily FinOps email for ten months. It had cost-by-subscription, cost-by-tag, the top-five-movers list, everything finance asked for. Eight people opened it. Two of them were the FinOps team.
Last quarter I deleted the email and replaced it with a small Model Context Protocol server that any engineer can call from their editor. Usage went from "ignored daily" to "queried 80+ times a week" inside three weeks. This is the implementation, the gotchas, and what I'd cut if I started again.
What an MCP server actually is
A process that speaks a small JSON-RPC protocol over stdin/stdout (or SSE), exposing Tools (functions the model can call), Resources (named data the model can read), and Prompts (templated instructions). Copilot, Claude Desktop, Cursor, and the Azure DevOps MCP Server all speak the same protocol: write one server and it works across clients.
For cost insights, only Tools matter. Resources are nice but engineers want to ask questions, not browse trees.
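To make "a small JSON-RPC protocol" concrete, here is a rough sketch of the envelopes exchanged for a single tool call. The `tools/call` method and the `content` array follow the MCP tool-calling shape; the interface names, the `buildToolCall` helper, the id, and the argument values are made up for illustration.

```typescript
// Illustrative JSON-RPC 2.0 envelopes for one MCP tool call.
// Interface names, the id, and the argument values are assumptions for this example.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result: { content: { type: "text"; text: string }[]; isError?: boolean };
}

// Hypothetical helper: builds the message a client sends when the model picks a tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "cost_by_service", {
  subscriptionId: "00000000-0000-0000-0000-000000000000",
  from: "2026-04-01",
  to: "2026-04-07",
});

// What a server might answer: same id, text content for the client to render.
const response: ToolCallResponse = {
  jsonrpc: "2.0",
  id: request.id,
  result: { content: [{ type: "text", text: "| service | cost | currency |" }] },
};

console.log(JSON.stringify(request));
```

The SDK handles this framing for you; the only part you write is the code that turns `params` into `result`.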
The TypeScript skeleton
// src/server.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { DefaultAzureCredential } from "@azure/identity";

const credential = new DefaultAzureCredential();

const server = new Server(
  { name: "azure-cost-insights", version: "0.2.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "cost_by_service",
      description:
        "Returns Azure spend grouped by service for a date window. " +
        "Use for questions like 'what did we spend on AKS last week'.",
      inputSchema: {
        type: "object",
        required: ["subscriptionId", "from", "to"],
        properties: {
          subscriptionId: { type: "string" },
          from: { type: "string", description: "ISO 8601 date" },
          to: { type: "string", description: "ISO 8601 date" },
          tagFilter: { type: "string", description: "key=value, optional" },
        },
      },
    },
    {
      name: "cost_top_movers",
      description: "Top N services where cost changed most week-over-week.",
      inputSchema: {
        type: "object",
        required: ["subscriptionId"],
        properties: {
          subscriptionId: { type: "string" },
          top: { type: "number", default: 10 },
        },
      },
    },
  ],
}));
The descriptions matter more than the names. Models pick tools from the description, not the function signature. Spend more time on those than on naming.
The Cost Management call
// Argument shape mirrors the cost_by_service inputSchema.
// toTagFilter and RetryableError are helpers not shown in this post.
interface CostByServiceArgs {
  subscriptionId: string;
  from: string;
  to: string;
  tagFilter?: string;
}

async function costByService(args: CostByServiceArgs) {
  const token = await credential.getToken("https://management.azure.com/.default");
  const url =
    `https://management.azure.com/subscriptions/${args.subscriptionId}` +
    `/providers/Microsoft.CostManagement/query?api-version=2024-08-01`;
  const body = {
    type: "ActualCost",
    timeframe: "Custom",
    timePeriod: { from: args.from, to: args.to },
    dataset: {
      granularity: "None",
      aggregation: { totalCost: { name: "Cost", function: "Sum" } },
      grouping: [{ type: "Dimension", name: "ServiceName" }],
      // undefined is dropped by JSON.stringify, so no filter is sent when tagFilter is absent
      filter: args.tagFilter ? toTagFilter(args.tagFilter) : undefined,
    },
  };
  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (res.status === 429) {
    const retry = Number(res.headers.get("retry-after") ?? "30");
    throw new RetryableError(`rate limited; retry in ${retry}s`, retry);
  }
  if (!res.ok) throw new Error(`cost mgmt ${res.status}: ${await res.text()}`);
  const data = await res.json();
  // Row layout assumed here: [cost, serviceName, currency]
  return data.properties.rows.map((r: any[]) => ({
    service: r[1],
    cost: Number(r[0]).toFixed(2),
    currency: r[2],
  }));
}
Two non-obvious bits:
- Token caching is built into DefaultAzureCredential. Don't roll your own, and don't reach for ClientSecretCredential; managed identity in Container Apps + workload identity locally cover both modes.
- timeframe: "Custom" + timePeriod is the only pair that lets the model pass an arbitrary window. The named timeframes (MonthToDate, etc.) sound nicer in code but produce hallucinated dates when the model paraphrases the question.
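The costByService function above calls toTagFilter without showing it. A minimal sketch of what such a helper could look like follows, assuming the "key=value" input format from the tool schema and the Cost Management query filter shape (a `tags` comparison with `name`, `operator`, and `values`). The `TagFilter` interface name and the error wording are my own.

```typescript
// Hypothetical helper: the post references toTagFilter but does not show it.
// Output shape assumed: Cost Management query filter, { tags: { name, operator, values } }.
interface TagFilter {
  tags: { name: string; operator: "In"; values: string[] };
}

function toTagFilter(expr: string): TagFilter {
  // Split on the first "=" only, so tag values that contain "=" survive intact.
  const idx = expr.indexOf("=");
  const name = idx > 0 ? expr.slice(0, idx).trim() : "";
  const value = idx > 0 ? expr.slice(idx + 1).trim() : "";
  if (!name || !value) {
    // Fail loudly: the message goes back to the model, which can then fix its arguments.
    throw new Error(`tagFilter must look like key=value, got "${expr}"`);
  }
  return { tags: { name, operator: "In", values: [value] } };
}

console.log(JSON.stringify(toTagFilter("env=prod")));
```

Throwing on malformed input is deliberate: silently dropping a bad filter would return the whole subscription's spend and look like a valid answer.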
Cost Management's rate limit will bite you
The Cost Management query API allows ~5 requests per second per subscription, ~50 per minute. A single editor session asking three follow-up questions can blow past that if you also have a daily report polling. The fix is a per-subscription token bucket and a graceful retry:
const buckets = new Map<string, { tokens: number; lastRefill: number }>();

async function withBudget<T>(subId: string, fn: () => Promise<T>): Promise<T> {
  const now = Date.now();
  const b = buckets.get(subId) ?? { tokens: 5, lastRefill: now };
  // store before any await so concurrent callers share one bucket
  buckets.set(subId, b);
  // refill 1 token per second up to 5
  b.tokens = Math.min(5, b.tokens + (now - b.lastRefill) / 1000);
  b.lastRefill = now;
  // reserve a token now; a negative balance is time owed, so queued callers wait in turn
  b.tokens -= 1;
  if (b.tokens < 0) {
    await new Promise((r) => setTimeout(r, -b.tokens * 1000));
  }
  return fn();
}
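The bucket covers the budget half; the "graceful retry" half is not shown. One way it might look is sketched below. RetryableError is thrown by costByService but never defined in the post, so the class here is an assumption, as are the `withRetry` name, the attempt limit, and the wait cap.

```typescript
// Hypothetical error type: the post throws RetryableError but does not define it.
class RetryableError extends Error {
  readonly retryAfterSeconds: number;
  constructor(message: string, retryAfterSeconds: number) {
    super(message);
    this.name = "RetryableError";
    this.retryAfterSeconds = retryAfterSeconds;
    // Keeps instanceof working even when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Retries fn only on RetryableError, honouring the server's retry-after hint.
// maxAttempts and maxWaitSeconds are illustrative defaults, not values from the post.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  maxWaitSeconds = 60
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // Anything else, or running out of attempts, surfaces to the caller unchanged.
      if (!(e instanceof RetryableError) || attempt >= maxAttempts) throw e;
      const waitMs = Math.min(e.retryAfterSeconds, maxWaitSeconds) * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Composed with the bucket, a call would read `withBudget(subId, () => withRetry(() => costByService(args)))`. Capping the wait matters in an editor: an engineer will not sit through a multi-minute retry, so past some point it is better to return the error and let the model say so.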
Wiring the call handler
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const { name, arguments: args } = req.params;
  try {
    if (name === "cost_by_service") {
      const rows = await withBudget(args.subscriptionId, () => costByService(args));
      return { content: [{ type: "text", text: formatTable(rows) }] };
    }
    if (name === "cost_top_movers") {
      const rows = await topMovers(args);
      return { content: [{ type: "text", text: formatTable(rows) }] };
    }
    throw new Error(`unknown tool: ${name}`);
  } catch (e) {
    return {
      isError: true,
      content: [{ type: "text", text: e instanceof Error ? e.message : String(e) }],
    };
  }
});

await server.connect(new StdioServerTransport());
The formatTable helper renders results as a markdown table. Copilot and Claude both render that natively in chat, so the answer reads like part of the conversation instead of "look at this JSON dump."
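The post does not show formatTable, so the following is only a sketch of one plausible version, assuming rows are flat objects that all share the keys of the first row. The `Row` type and the empty-result message are my own choices.

```typescript
// Hypothetical formatTable: the post names the helper but does not show it.
type Row = Record<string, string | number>;

function formatTable(rows: Row[]): string {
  if (rows.length === 0) return "No rows returned.";
  // Column order comes from the first row; every row is assumed to share those keys.
  const headers = Object.keys(rows[0]);
  // A literal pipe inside a cell would otherwise split the column.
  const escapeCell = (v: string | number) => String(v).replace(/\|/g, "\\|");
  const lines = [
    `| ${headers.join(" | ")} |`,
    `| ${headers.map(() => "---").join(" | ")} |`,
    ...rows.map((r) => `| ${headers.map((h) => escapeCell(r[h] ?? "")).join(" | ")} |`),
  ];
  return lines.join("\n");
}

console.log(formatTable([{ service: "AKS", cost: "12.50", currency: "USD" }]));
// | service | cost | currency |
// | --- | --- | --- |
// | AKS | 12.50 | USD |
```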
Wiring it into VS Code
// .vscode/mcp.json
{
  // VS Code expects "servers" here; "mcpServers" is the key used in Claude Desktop and Cursor configs
  "servers": {
    "azure-cost-insights": {
      "command": "node",
      "args": ["${workspaceFolder}/dist/server.js"],
      "env": {
        "AZURE_SUBSCRIPTION_ID": "${input:subId}"
      }
    }
  },
  "inputs": [
    { "id": "subId", "type": "promptString", "description": "Subscription" }
  ]
}
Restart VS Code, open Copilot, type "what did we spend on Azure OpenAI last week". Copilot picks the tool from the description, calls it, and renders the table inline.
What broke first
The model passed dates without timezones. It would send "2026-04-01" and the API treats that as midnight UTC, which shifts the window by four to five hours (the US Eastern offset from UTC) for a US-East team. I now normalise on the server: new Date(args.from + "T00:00:00Z").toISOString() and force-end at the next day's midnight UTC. I also updated the tool description to say "dates are interpreted as UTC midnight".
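A sketch of that normalisation as a standalone function, under the assumption that the model sends plain YYYY-MM-DD strings and that the end date is meant to be inclusive. The `toUtcWindow` name and the validation rules are illustrative, not the server's actual code.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: pins both ends of the window to UTC midnight.
// The end is pushed to the following midnight so the last day is fully included.
function toUtcWindow(from: string, to: string): { from: string; to: string } {
  const dateOnly = /^\d{4}-\d{2}-\d{2}$/;
  if (!dateOnly.test(from) || !dateOnly.test(to)) {
    throw new Error("dates must be YYYY-MM-DD; they are interpreted as UTC midnight");
  }
  const start = new Date(`${from}T00:00:00Z`);
  const lastDay = new Date(`${to}T00:00:00Z`);
  if (Number.isNaN(start.getTime()) || Number.isNaN(lastDay.getTime())) {
    throw new Error("dates must be valid calendar dates");
  }
  if (lastDay.getTime() < start.getTime()) {
    throw new Error("'to' must not be earlier than 'from'");
  }
  const end = new Date(lastDay.getTime() + DAY_MS);
  return { from: start.toISOString(), to: end.toISOString() };
}

console.log(toUtcWindow("2026-04-01", "2026-04-07"));
// { from: '2026-04-01T00:00:00.000Z', to: '2026-04-08T00:00:00.000Z' }
```

Rejecting anything that is not a bare date keeps the behaviour predictable: if the model ever sends a timestamp with its own offset, it gets an error it can act on instead of a silently shifted window.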
Long answers got truncated by the client. Some clients cap a single tool response at 32K characters, and a wide query can blow past that. I added a format argument that defaults to summary (top 20 + total) and only returns full rows if the model explicitly asks for detailed. The truncation problems disappeared.
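The summary mode could be sketched roughly like this, assuming the row shape returned by costByService (cost as a two-decimal string). The `summarise` name, the `omitted` field, and the sort order are assumptions for illustration.

```typescript
// Row shape as returned by costByService in this post.
interface CostRow {
  service: string;
  cost: string; // two-decimal string, e.g. "12.50"
  currency: string;
}

// Hypothetical helper: "summary" keeps the largest `limit` rows, "detailed" keeps all.
// The total always covers every row, so a trimmed answer still adds up.
function summarise(
  rows: CostRow[],
  format: "summary" | "detailed" = "summary",
  limit = 20
): { rows: CostRow[]; total: string; omitted: number } {
  const sorted = [...rows].sort((a, b) => Number(b.cost) - Number(a.cost));
  const total = rows.reduce((sum, r) => sum + Number(r.cost), 0).toFixed(2);
  const shown = format === "detailed" ? sorted : sorted.slice(0, limit);
  return { rows: shown, total, omitted: sorted.length - shown.length };
}
```

Returning the omitted count gives the model something to say ("plus 5 smaller services") and a reason to ask for detailed when the user actually wants the long tail.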
Auth at the wrong scope. The first version asked for https://management.azure.com/user_impersonation. That worked locally with a developer login but failed on Container Apps with managed identity. The right scope is https://management.azure.com/.default for service-to-service, every time.
What I'd cut
The cost_top_movers tool. Everyone asked the same week-over-week question with totally different phrasings and the model would pick the wrong tool half the time. I rewrote cost_by_service to accept an optional compareWindow parameter and now there's only one tool. Less surface area, fewer wrong picks.
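The post does not show how compareWindow is implemented. One way the merged tool might work is to run the same query for both windows and diff the results, as in this sketch. The `compareWindows` name, the `Mover` shape, and ranking by absolute change are assumptions.

```typescript
// Row shape as returned by costByService in this post.
interface CostRow {
  service: string;
  cost: string;
  currency: string;
}

interface Mover {
  service: string;
  current: number;
  previous: number;
  delta: number; // current minus previous, rounded to cents
}

// Hypothetical helper: diffs two query results and ranks by absolute change.
// Services that appear in only one window count as 0 in the other.
function compareWindows(current: CostRow[], previous: CostRow[], top = 10): Mover[] {
  const cur = new Map<string, number>(
    current.map((r): [string, number] => [r.service, Number(r.cost)])
  );
  const prev = new Map<string, number>(
    previous.map((r): [string, number] => [r.service, Number(r.cost)])
  );
  const services = Array.from(new Set([...cur.keys(), ...prev.keys()]));
  return services
    .map((service) => {
      const c = cur.get(service) ?? 0;
      const p = prev.get(service) ?? 0;
      return { service, current: c, previous: p, delta: Number((c - p).toFixed(2)) };
    })
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
    .slice(0, top);
}
```

Note that this costs two Cost Management queries per call, so both should go through the same per-subscription budget.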
I would NOT add a "summarise this in plain English" tool. The model can summarise. Tools should return structured data; the chat is where natural language lives.
