Building a Custom MCP Server for Azure Cost Insights: The 200-LOC Tool That Replaced Our Daily FinOps Email

We ran a daily FinOps email for ten months. It had cost-by-subscription, cost-by-tag, the top-five-movers list, everything finance asked for. Eight people opened it. Two of them were the FinOps team.

Last quarter I deleted the email and replaced it with a small Model Context Protocol server that any engineer can call from their editor. Usage went from "ignored daily" to "queried 80+ times a week" inside three weeks. This is the implementation, the gotchas, and what I'd cut if I started again.

What an MCP server actually is

An MCP server is a process that speaks a small JSON-RPC protocol over stdin/stdout (or SSE), exposing Tools (functions the model can call), Resources (named data the model can read), and Prompts (templated instructions). Copilot, Claude Desktop, Cursor, and the Azure DevOps MCP Server all speak the same protocol, so you write one server and it works across clients.

For cost insights, only Tools matter. Resources are nice but engineers want to ask questions, not browse trees.
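To make "a small JSON-RPC protocol" concrete, here is a rough sketch of one tool call as it crosses the wire. The two interfaces are simplified stand-ins written for this example, not the SDK's own types, and the subscription value is a placeholder.

```typescript
// Simplified, illustrative shapes for the MCP "tools/call" exchange.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result: { content: { type: "text"; text: string }[]; isError?: boolean };
}

// What a client sends when the model decides to call a tool.
const request: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 7,
  method: "tools/call",
  params: {
    name: "cost_by_service",
    arguments: { subscriptionId: "<subscription-guid>", from: "2026-04-01", to: "2026-04-08" },
  },
};

// What the server writes back, matched to the request by id.
const response: ToolCallResponse = {
  jsonrpc: "2.0",
  id: request.id,
  result: { content: [{ type: "text", text: "| service | cost |\n| --- | --- |" }] },
};

// Over stdio, each message travels as a single line of JSON.
const wire = JSON.stringify(request);
```

The SDK handles the framing and id matching, so a server only ever deals with the params going in and the result coming out.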

The TypeScript skeleton

// src/server.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { DefaultAzureCredential } from "@azure/identity";

const credential = new DefaultAzureCredential();

const server = new Server(
  { name: "azure-cost-insights", version: "0.2.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "cost_by_service",
      description:
        "Returns Azure spend grouped by service for a date window. " +
        "Use for questions like 'what did we spend on AKS last week'.",
      inputSchema: {
        type: "object",
        required: ["subscriptionId", "from", "to"],
        properties: {
          subscriptionId: { type: "string" },
          from:           { type: "string", description: "ISO 8601 date" },
          to:             { type: "string", description: "ISO 8601 date" },
          tagFilter:      { type: "string", description: "key=value, optional" },
        },
      },
    },
    {
      name: "cost_top_movers",
      description: "Top N services where cost changed most week-over-week.",
      inputSchema: {
        type: "object",
        required: ["subscriptionId"],
        properties: {
          subscriptionId: { type: "string" },
          top:            { type: "number", default: 10 },
        },
      },
    },
  ],
}));

The descriptions matter more than the names. Models pick tools from the description, not the function signature. Spend more time on those than on naming.

The Cost Management call

async function costByService(args: CostByServiceArgs) {
  const token = await credential.getToken("https://management.azure.com/.default");
  const url =
    `https://management.azure.com/subscriptions/${args.subscriptionId}` +
    `/providers/Microsoft.CostManagement/query?api-version=2024-08-01`;

  const body = {
    type: "ActualCost",
    timeframe: "Custom",
    timePeriod: { from: args.from, to: args.to },
    dataset: {
      granularity: "None",
      aggregation: { totalCost: { name: "Cost", function: "Sum" } },
      grouping: [{ type: "Dimension", name: "ServiceName" }],
      filter: args.tagFilter ? toTagFilter(args.tagFilter) : undefined,
    },
  };

  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });

  if (res.status === 429) {
    const retry = Number(res.headers.get("retry-after") ?? "30");
    throw new RetryableError(`rate limited; retry in ${retry}s`, retry);
  }
  if (!res.ok) throw new Error(`cost mgmt ${res.status}: ${await res.text()}`);

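  const data = await res.json();
  // Each row is a positional array. This assumes the column order
  // [cost, service, currency]; data.properties.columns lists the actual order.
  return data.properties.rows.map((r: any[]) => ({
    service: r[1],
    cost:    Number(r[0]).toFixed(2), // string, rounded to two decimals for display
    currency: r[2],
  }));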
}

Two non-obvious bits:

Token caching is built into DefaultAzureCredential. Don't roll your own, and don't reach for ClientSecretCredential: managed identity in Container Apps and a developer login locally cover both modes.
timeframe: "Custom" + timePeriod is the only pair that lets the model pass an arbitrary window. The named timeframes (MonthToDate, etc.) sound nicer in code but produce hallucinated dates when the model paraphrases the question.
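The costByService function above relies on a CostByServiceArgs type and a toTagFilter helper that are not shown. A minimal sketch of both follows, assuming the tag filter arrives as a single key=value string and maps onto the query API's tags comparison. The names TagFilter and the error wording are hypothetical.

```typescript
// Hypothetical argument type mirroring the cost_by_service input schema.
interface CostByServiceArgs {
  subscriptionId: string;
  from: string;       // ISO 8601 date
  to: string;         // ISO 8601 date
  tagFilter?: string; // "key=value"
}

// Assumed shape of a tag comparison inside a Cost Management query filter.
interface TagFilter {
  tags: { name: string; operator: "In"; values: string[] };
}

// Turns "env=prod" into a tag filter. Splits on the first "=" only,
// so a value that itself contains "=" survives intact.
function toTagFilter(expr: string): TagFilter {
  const idx = expr.indexOf("=");
  if (idx < 0) throw new Error(`tagFilter must look like key=value, got "${expr}"`);
  const name = expr.slice(0, idx).trim();
  const value = expr.slice(idx + 1).trim();
  if (!name || !value) throw new Error(`tagFilter must look like key=value, got "${expr}"`);
  return { tags: { name, operator: "In", values: [value] } };
}

// Usage: build the optional filter the same way the query body does.
const exampleArgs: CostByServiceArgs = {
  subscriptionId: "<subscription-guid>",
  from: "2026-04-01",
  to: "2026-04-08",
  tagFilter: "env=prod",
};
const exampleFilter = exampleArgs.tagFilter ? toTagFilter(exampleArgs.tagFilter) : undefined;
```

Throwing on a malformed filter is a deliberate choice here: the message goes back to the model as a tool error, which gives it a chance to correct the argument instead of silently querying unfiltered spend.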

Cost Management's rate limit will bite you

The Cost Management query API allows ~5 requests per second per subscription, ~50 per minute. A single editor session asking three follow-up questions can blow past that if you also have a daily report polling. The fix is a per-subscription token bucket and a graceful retry:

const buckets = new Map<string, { tokens: number; lastRefill: number }>();

async function withBudget<T>(subId: string, fn: () => Promise<T>): Promise<T> {
  const now = Date.now();
  const b = buckets.get(subId) ?? { tokens: 5, lastRefill: now };
  buckets.set(subId, b);
  // refill 1 token per second up to 5
  b.tokens = Math.min(5, b.tokens + (now - b.lastRefill) / 1000);
  b.lastRefill = now;
  // Debit before any await so concurrent callers see it. A negative balance
  // means this call has to wait for its token to accrue; the wait time is
  // credited by the next refill, so it is never counted twice.
  b.tokens -= 1;
  if (b.tokens < 0) {
    await new Promise((r) => setTimeout(r, -b.tokens * 1000));
  }
  return fn();
}
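The bucket covers the budget half of the fix. For the retry half, here is one possible sketch, assuming RetryableError carries the Retry-After value in seconds as it does where costByService throws it. The class body, the withRetry name, and the three-attempt default are all hypothetical.

```typescript
// Hypothetical error type: thrown on HTTP 429 with the server's advertised delay.
class RetryableError extends Error {
  readonly retryAfterSeconds: number;
  constructor(message: string, retryAfterSeconds: number) {
    super(message);
    this.name = "RetryableError";
    this.retryAfterSeconds = retryAfterSeconds;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fn; on RetryableError waits the advertised delay and tries again,
// up to maxAttempts total attempts. Any other error is rethrown immediately.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (!(e instanceof RetryableError) || attempt >= maxAttempts) throw e;
      await sleep(e.retryAfterSeconds * 1000);
    }
  }
}
```

Wrapping the budgeted call would then look like withRetry(() => withBudget(id, () => costByService(args))). With a 30-second Retry-After, a chat client may time out before the second attempt, so a lower attempt count or surfacing the error to the model can be the better trade.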

Wiring the call handler

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const { name, arguments: args } = req.params;

  try {
    if (name === "cost_by_service") {
      const rows = await withBudget(args.subscriptionId, () => costByService(args));
      return { content: [{ type: "text", text: formatTable(rows) }] };
    }
    if (name === "cost_top_movers") {
      const rows = await topMovers(args);
      return { content: [{ type: "text", text: formatTable(rows) }] };
    }
    throw new Error(`unknown tool: ${name}`);
  } catch (e) {
    return {
      isError: true,
      content: [{ type: "text", text: e instanceof Error ? e.message : String(e) }],
    };
  }
});

await server.connect(new StdioServerTransport());

The formatTable helper renders results as a markdown table. Copilot and Claude both render that natively in chat, so the answer feels native instead of "look at this JSON dump."
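The helper itself is not shown, so this is only a sketch of what a formatTable along these lines might look like. It assumes every row has the same keys as the first one; the Row type and the empty-result message are made up for the example.

```typescript
type Row = Record<string, string | number>;

// Renders rows as a markdown table, using the first row's keys as columns.
function formatTable(rows: Row[]): string {
  if (rows.length === 0) return "No cost data for this window.";
  const columns = Object.keys(rows[0]);
  // A literal pipe inside a cell would split the column, so escape it.
  const cell = (v: string | number | undefined) => String(v ?? "").replace(/\|/g, "\\|");
  const header = `| ${columns.join(" | ")} |`;
  const divider = `| ${columns.map(() => "---").join(" | ")} |`;
  const body = rows.map((r) => `| ${columns.map((c) => cell(r[c])).join(" | ")} |`);
  return [header, divider, ...body].join("\n");
}
```

Returning a sentence for the empty case, instead of a header with no rows, gives the model something it can relay directly.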

Wiring it into VS Code

// .vscode/mcp.json
{
  "servers": {
    "azure-cost-insights": {
      "command": "node",
      "args": ["${workspaceFolder}/dist/server.js"],
      "env": {
        "AZURE_SUBSCRIPTION_ID": "${input:subId}"
      }
    }
  },
  "inputs": [
    { "id": "subId", "type": "promptString", "description": "Subscription" }
  ]
}

Restart VS Code, open Copilot, type "what did we spend on Azure OpenAI last week". Copilot picks the tool from the description, calls it, and renders the table inline.

What broke first

The model passed dates without timezones. It would send "2026-04-01" and the API treats that as midnight UTC, which shifts the window by four to five hours for a US-East team. I now normalise on the server: new Date(args.from + "T00:00:00Z").toISOString() for the start, with the end forced to the next day's midnight UTC. The tool description now says "dates are interpreted as UTC midnight".
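A small sketch of that normalisation, assuming the model sends bare YYYY-MM-DD dates. The function name toUtcWindow and its error messages are hypothetical.

```typescript
// Accepts bare calendar dates and returns the UTC window that covers them:
// midnight UTC on `from` up to midnight UTC on the day after `to`.
function toUtcWindow(from: string, to: string): { from: string; to: string } {
  const dateOnly = /^\d{4}-\d{2}-\d{2}$/;
  if (!dateOnly.test(from) || !dateOnly.test(to)) {
    throw new Error("dates must be YYYY-MM-DD; they are interpreted as UTC midnight");
  }
  const start = new Date(`${from}T00:00:00Z`);
  const end = new Date(`${to}T00:00:00Z`);
  if (Number.isNaN(start.getTime()) || Number.isNaN(end.getTime())) {
    throw new Error("dates must be real calendar dates");
  }
  // Move the end to the next day's midnight so the `to` date is fully included.
  end.setUTCDate(end.getUTCDate() + 1);
  if (end.getTime() <= start.getTime()) {
    throw new Error("`to` must not be before `from`");
  }
  return { from: start.toISOString(), to: end.toISOString() };
}
```

Rejecting anything that is not a bare date keeps the behaviour predictable: the model gets an error that restates the rule instead of a silently shifted window.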

Long answers got truncated by the client. Some clients cap a single tool response at 32K characters. A wide query can blow that. I added a format argument that defaults to summary (top 20 + total) and only returns full rows if the model explicitly asks for detailed. The truncation stopped.
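One way the summary mode could be shaped, as a sketch: it assumes costs are kept as numbers until formatting and that all rows share one currency. CostRow, Format, and shapeRows are hypothetical names.

```typescript
interface CostRow { service: string; cost: number; currency: string }
type Format = "summary" | "detailed";

// In summary mode, keep the `limit` most expensive rows and append a total
// over ALL rows, so the model never has to add up a truncated list.
function shapeRows(rows: CostRow[], format: Format = "summary", limit = 20): CostRow[] {
  const sorted = [...rows].sort((a, b) => b.cost - a.cost);
  if (format === "detailed" || sorted.length === 0) return sorted;
  const total = sorted.reduce((acc, r) => acc + r.cost, 0);
  return [
    ...sorted.slice(0, limit),
    { service: "Total", cost: total, currency: sorted[0].currency },
  ];
}
```

The total is computed before slicing on purpose. A total over only the visible rows would understate spend whenever more than 20 services are in play.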

Auth at the wrong scope. The first version asked for https://management.azure.com/user_impersonation. That worked locally with a developer login but failed on Container Apps with managed identity. The right scope for service-to-service calls is https://management.azure.com/.default, every time.

What I'd cut

The cost_top_movers tool. Everyone asked the same week-over-week question with totally different phrasings and the model would pick the wrong tool half the time. I rewrote cost_by_service to accept an optional compareWindow parameter and now there's only one tool. Less surface area, fewer wrong picks.
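The comparison step behind a compareWindow parameter can stay a pure function over two query results. The sketch below assumes each window has already been fetched as service and cost pairs; compareWindows, ServiceCost, and Mover are hypothetical names.

```typescript
interface ServiceCost { service: string; cost: number }
interface Mover { service: string; current: number; previous: number; delta: number }

// Joins two windows by service name and ranks by absolute change.
// A service missing from one window counts as zero spend there.
function compareWindows(current: ServiceCost[], previous: ServiceCost[], top = 10): Mover[] {
  const cur = new Map<string, number>(current.map((r): [string, number] => [r.service, r.cost]));
  const prev = new Map<string, number>(previous.map((r): [string, number] => [r.service, r.cost]));
  const services = Array.from(new Set([...Array.from(cur.keys()), ...Array.from(prev.keys())]));
  return services
    .map((service) => {
      const c = cur.get(service) ?? 0;
      const p = prev.get(service) ?? 0;
      return { service, current: c, previous: p, delta: c - p };
    })
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
    .slice(0, top);
}
```

Ranking by absolute change surfaces services that disappeared as well as ones that spiked, which is usually what a week-over-week question is after.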

I would NOT add a "summarise this in plain English" tool. The model can summarise. Tools should return structured data; the chat is where natural language lives.
