damionas
No. 36 · DevOps · Dec 3, 2025 · 24 min read

Build and Ship an Azure Cost MCP Server From Empty Folder to Container Apps in 60 Minutes

For ten months our FinOps team published a beautifully formatted daily cost email. Subscription totals, top-five movers, tag breakdowns. It linked to two dashboards. It went to forty-seven engineers. According to the email-tracking pixel, eight people opened it and two of those eight were on the FinOps team.

The email was the textbook FinOps deliverable, and the textbook is wrong. People don't read pushed reports. They read the answer to a question they just asked. So I deleted the email and replaced it with a small Model Context Protocol server that any engineer can call from their editor. Usage went from "ignored daily" to roughly eighty queries a week inside three weeks. Different shape, different verb, different result.

This post is the entire build. End state: a working MCP server you can call from Copilot in VS Code, deployed to Azure Container Apps under a managed identity, with the full repo source. Everything in one sitting, with the running commentary I'd give a colleague pairing on it next to me, including the parts where the docs are misleading and what to ignore.

Why MCP and not a CLI, or a Slack bot, or anything else

Quick aside before we start, because tooling choices matter and this is the question I get every time I demo this thing.

A CLI works fine if engineers will memorize the verbs. They won't. After about three commands, your CLI is competing for mindshare with kubectl, az, gh, and git, and it loses. We tried this in 2023; the CLI saw fewer queries than the email did.

A Slack bot works fine if engineers think to ask in Slack. They don't, because by the time they're in Slack they've context-switched and the question has decayed into "later." The window where someone wants to ask "what did we spend on AKS this week" is the window when they're staring at AKS code. That window is in the editor.

MCP wins because it's an in-context query layer. It speaks the same protocol Copilot speaks, Claude Desktop speaks, Cursor speaks, and the Azure DevOps MCP Server speaks. Write one server, every editor your team uses gets it for free.

That's the case. Now the build.

What you'll have at the end

~/azure-cost-mcp/
├── src/
│   ├── server.ts                  # MCP wiring
│   ├── tools/
│   │   ├── cost-by-service.ts     # tool implementation
│   │   └── format.ts              # markdown table renderer
│   ├── azure/
│   │   ├── credential.ts          # token acquisition
│   │   └── cost-management.ts     # API client + rate limiting
│   └── transport/
│       └── http.ts                # SSE-over-HTTP for Container Apps
├── tests/
│   ├── fixtures/
│   │   └── cost-by-service-week.json
│   └── cost-by-service.test.ts
├── infra/
│   ├── main.bicep
│   └── main.bicepparam
├── .github/workflows/
│   └── deploy.yml
├── Dockerfile
├── .vscode/
│   └── mcp.json
├── package.json
├── tsconfig.json
└── README.md

A repo of about sixteen files, around 600 lines of TypeScript and Bicep. The TypeScript half is small on purpose; most of the surface area is Bicep, which is exactly where it belongs. Application code that wraps an Azure API doesn't need to be elaborate; the elaborate part is the operational story around it.

Prerequisites

Run each one and confirm the version. Anything older than these has known issues that aren't worth your debugging time:

node --version          # v22.x or newer
npm --version           # 10.x or newer
az --version            # 2.65 or newer
docker --version        # 27 or newer
gh --version            # 2.50 or newer
code --version          # VS Code 1.95 or newer with GitHub Copilot installed

A note on Node 22 specifically: you can technically run this on Node 20 LTS, but the MCP SDK uses native fetch in a way that's faster on 22. For a server that's expected to be fast, "technically works" isn't the bar. Pick the version your linter and your runtime agree on.

You'll also need:

  • An Azure subscription where you can create resource groups and assign roles
  • An Entra ID (Azure AD) tenant where you can create app registrations
  • A GitHub account where you can install the GitHub OIDC trust on a new app registration

az login
az account set --subscription "<your-subscription-id>"
gh auth status              # confirm logged in

Step 1: Scaffold the project

mkdir azure-cost-mcp && cd azure-cost-mcp
git init -b main
npm init -y

Install runtime + dev dependencies:

npm install \
  @modelcontextprotocol/sdk \
  @azure/identity \
  express \
  zod \
  zod-to-json-schema

npm install -D \
  typescript \
  @types/node \
  @types/express \
  vitest \
  tsx

About the dependencies: I deliberately picked tools that are unsurprising. express is boring; that's why it's here. zod is the one piece of opinion I have: it lets the same schema describe both the runtime validation and the JSON Schema we hand to MCP, which removes a class of "the validator and the docs disagreed" bugs. If you don't like Zod, hand-write the JSON Schema and validate manually. The server is small enough that either choice is fine.

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "Bundler",
    "lib": ["ES2022"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "resolveJsonModule": true,
    "forceConsistentCasingInFileNames": true,
    "declaration": false,
    "sourceMap": true
  },
  "include": ["src/**/*"]
}

strict: true matters more than you'd think. The MCP protocol is JSON-RPC, which is a flat string-string-object world; type errors that TypeScript would catch in a typed codebase silently become "tool returned an opaque error" in MCP land. Don't ship the server without strict mode on.

Update package.json scripts:

{
  "type": "module",
  "scripts": {
    "build": "tsc",
    "start": "node dist/server.js",
    "dev": "tsx src/server.ts",
    "test": "vitest run",
    "test:watch": "vitest"
  }
}

Step 2: The Azure credential and the Cost Management client

src/azure/credential.ts:

import {
  DefaultAzureCredential,
  ManagedIdentityCredential,
  AzureCliCredential,
} from "@azure/identity";
import type { TokenCredential } from "@azure/identity";

/**
 * Use ManagedIdentityCredential in production (Container Apps) and AzureCliCredential
 * in local dev. DefaultAzureCredential walks several options and is slow on first call.
 */
export function buildCredential(): TokenCredential {
  if (process.env.NODE_ENV === "production") return new ManagedIdentityCredential();
  if (process.env.MCP_LOCAL_AUTH === "cli") return new AzureCliCredential();
  return new DefaultAzureCredential();
}

This is the file most tutorials get wrong, so it's worth slowing down for. The Azure SDK ships DefaultAzureCredential, which walks a chain of credential sources (environment variables, managed identity, the Azure CLI, Azure PowerShell, and a few others depending on the SDK version) until something works. It's convenient. It's also responsible for a 1.4-second first-call latency I spent two hours debugging the first time I shipped this server.

The reason: DefaultAzureCredential doesn't know your environment, so it walks the chain. In a Container App you know exactly what's there: a managed identity and nothing else. Pin to it directly and your first call is sub-100ms. The dev-time AzureCliCredential is the same idea: locally you know it's the CLI, so use the CLI directly.

If your team has standardised on DefaultAzureCredential because it's the documented default, you're trading documentation alignment for slowness. Worth it for greenfield code? Probably not. Worth changing for a small, focused server like this one? Yes.
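The branch order in buildCredential() is small enough to check in isolation. Here is a minimal sketch, assuming a hypothetical pickCredentialKind helper that returns a label instead of constructing a real credential, so it runs without @azure/identity installed:

```typescript
// Hypothetical helper: mirrors the branch order in buildCredential() but returns
// a label, so the decision can be unit-tested without any Azure SDK installed.
type CredentialKind = "managed-identity" | "azure-cli" | "default-chain";
type EnvLike = Record<string, string | undefined>;

function pickCredentialKind(env: EnvLike): CredentialKind {
  // Production (Container Apps): a managed identity is the only thing present.
  if (env.NODE_ENV === "production") return "managed-identity";
  // Local dev opt-in: go straight to the Azure CLI's cached login.
  if (env.MCP_LOCAL_AUTH === "cli") return "azure-cli";
  // Anything else falls back to the slower chain walk.
  return "default-chain";
}

console.log(pickCredentialKind({ NODE_ENV: "production" }));
console.log(pickCredentialKind({ MCP_LOCAL_AUTH: "cli" }));
console.log(pickCredentialKind({}));
```

Note that NODE_ENV wins over MCP_LOCAL_AUTH, which is what you want: a stray dev variable in a container should never switch production off the managed identity.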

src/azure/cost-management.ts:

import type { TokenCredential } from "@azure/identity";

const ARM = "https://management.azure.com";
const SCOPE = `${ARM}/.default`;

const buckets = new Map<string, { tokens: number; lastRefill: number }>();

async function withBudget<T>(subId: string, fn: () => Promise<T>): Promise<T> {
  const now = Date.now();
  const b = buckets.get(subId) ?? { tokens: 5, lastRefill: now };
  // Refill at one token per second, capped at a burst of five.
  b.tokens = Math.min(5, b.tokens + (now - b.lastRefill) / 1000);
  b.lastRefill = now;
  // Reserve a slot before any await so concurrent callers queue behind each
  // other; a negative balance is the time this caller still owes.
  b.tokens -= 1;
  buckets.set(subId, b);
  if (b.tokens < 0) {
    await new Promise((r) => setTimeout(r, -b.tokens * 1000));
  }
  return fn();
}

export type CostRow = { service: string; cost: string; currency: string };

export type CostByServiceArgs = {
  subscriptionId: string;
  from: string;
  to: string;
};

export async function costByService(
  credential: TokenCredential,
  args: CostByServiceArgs,
): Promise<CostRow[]> {
  return withBudget(args.subscriptionId, async () => {
    const token = await credential.getToken(SCOPE);
    if (!token) throw new Error("failed to acquire ARM token");

    const url = `${ARM}/subscriptions/${args.subscriptionId}` +
      `/providers/Microsoft.CostManagement/query?api-version=2024-08-01`;

    const body = {
      type: "ActualCost",
      timeframe: "Custom",
      timePeriod: {
        from: new Date(args.from + "T00:00:00Z").toISOString(),
        to:   new Date(args.to   + "T23:59:59Z").toISOString(),
      },
      dataset: {
        granularity: "None",
        aggregation: { totalCost: { name: "Cost", function: "Sum" } },
        grouping: [{ type: "Dimension", name: "ServiceName" }],
      },
    };

    const res = await fetch(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token.token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

    if (res.status === 429) {
      const retry = Number(res.headers.get("retry-after") ?? "30");
      throw Object.assign(new Error(`rate limited; retry in ${retry}s`), { retryable: true, retry });
    }
    if (!res.ok) throw new Error(`cost mgmt ${res.status}: ${await res.text()}`);

    const data = await res.json();
    return (data.properties.rows as any[][])
      .map((r) => ({ service: String(r[1]), cost: Number(r[0]).toFixed(2), currency: String(r[2]) }))
      .sort((a, b) => Number(b.cost) - Number(a.cost));
  });
}

A few things worth dwelling on, because they're the difference between a server that works for one user and a server that works for a team.

The token bucket is per subscription, not global. Cost Management's rate limits are per-subscription: one client can absolutely hammer subscription A without affecting subscription B. A global rate limit would have created head-of-line blocking that turned the rate-limit story into a fairness story, which is a much harder problem.
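To make the per-subscription behaviour concrete, here is a sketch of the bucket math with the clock passed in, so it can be checked without sleeping. It is a variant that reserves the slot before waiting; waitMsFor and demoBuckets are hypothetical names, and the capacity and refill rate mirror the values used in withBudget:

```typescript
// Sketch of a per-key token bucket with an injected clock (milliseconds).
const CAPACITY = 5;
const REFILL_PER_SEC = 1;

type Bucket = { tokens: number; lastRefill: number };
const demoBuckets = new Map<string, Bucket>();

// Returns how long the caller should wait (ms) before issuing its request.
function waitMsFor(key: string, now: number): number {
  const b = demoBuckets.get(key) ?? { tokens: CAPACITY, lastRefill: now };
  const elapsedSec = (now - b.lastRefill) / 1000;
  b.tokens = Math.min(CAPACITY, b.tokens + elapsedSec * REFILL_PER_SEC);
  b.lastRefill = now;
  b.tokens -= 1; // reserve a slot; a negative balance is time owed
  demoBuckets.set(key, b);
  return b.tokens >= 0 ? 0 : Math.ceil((-b.tokens / REFILL_PER_SEC) * 1000);
}

// Six back-to-back calls against subscription A: five pass, the sixth waits.
const waitsA = [0, 0, 0, 0, 0, 0].map(() => waitMsFor("sub-A", 0));
// Subscription B has its own bucket, so A's burst does not delay it.
const waitB = waitMsFor("sub-B", 0);
console.log(waitsA, waitB);
```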

The "Custom" timeframe with explicit timePeriod is the only pair that handles arbitrary windows correctly. Cost Management has named timeframes like MonthToDate and BillingMonthToDate that sound nicer in code, but the moment you let an LLM construct date arguments, you'll learn that the model paraphrases dates loosely and produces inconsistent results across calls. Force ISO dates, parse them on the server, never trust the model to handle date math.

The 429 carries an attached retryable flag and a retry value. The wrapper layer above doesn't currently retry, but the metadata is there so you can wire retry without changing the core function later. This is the kind of detail that you regret omitting at exactly the moment you don't have time to add it.
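If you do want to wire the retry, a wrapper along these lines is one way to consume that metadata. This is a sketch, not what the repo ships: withRetry, its options, and the three-attempt default are all assumptions, and sleep is injectable so a test does not have to wait in real time.

```typescript
// Hypothetical wrapper: retries a call when the thrown error carries the
// { retryable, retry } metadata attached by costByService on a 429.
type RetryableError = Error & { retryable?: boolean; retry?: number };

async function withRetry<T>(
  fn: () => Promise<T>,
  opts: { maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 3;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      const err = e as RetryableError;
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!err.retryable || attempt >= maxAttempts) throw err;
      await sleep((err.retry ?? 30) * 1000);
    }
  }
}

// Usage: the first call is rate limited, the second succeeds.
let calls = 0;
const sleeps: number[] = [];
const demo = withRetry(
  async () => {
    calls += 1;
    if (calls === 1) throw Object.assign(new Error("rate limited"), { retryable: true, retry: 2 });
    return "ok";
  },
  { sleep: async (ms) => { sleeps.push(ms); } },
);
demo.then((value) => console.log(value, calls, sleeps));
```

Keep the attempt budget small. A model waiting on a tool call is a person waiting on a chat window, and a 30-second Retry-After repeated three times is already a long pause.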

Step 3: The markdown formatter and the tool wiring

src/tools/format.ts:

import type { CostRow } from "../azure/cost-management.js";

export function formatCostTable(rows: CostRow[], opts: { from: string; to: string }): string {
  if (rows.length === 0) return `_No cost data for ${opts.from} → ${opts.to}._`;

  const total = rows.reduce((sum, r) => sum + Number(r.cost), 0);
  const currency = rows[0].currency;

  const lines = [
    `**Spend by service** — ${opts.from} → ${opts.to}`,
    "",
    `| # | Service | Cost (${currency}) |`,
    `| --- | --- | ---: |`,
  ];
  rows.slice(0, 20).forEach((r, i) => {
    lines.push(`| ${i + 1} | ${r.service} | ${r.cost} |`);
  });
  if (rows.length > 20) lines.push(`| … | ${rows.length - 20} more services | — |`);
  lines.push("");
  lines.push(`**Total:** ${total.toFixed(2)} ${currency}`);
  return lines.join("\n");
}

This is the secret sauce of the whole server, and it's twenty lines of string formatting. Models render markdown natively in chat. They render JSON badly: the typical experience is "look at this dump, scroll, oh you wanted a summary, here, let me extract it for you", which doubles the latency and burns tokens for no reason. Returning a markdown table instead costs nothing and is a large improvement over raw JSON.

The cap at 20 rows matters too. Some MCP clients truncate a single tool response at around 32K characters. A wide subscription can produce hundreds of services. Returning the top 20 plus a count of the rest gives the model enough to answer the question without risking truncation. If the user wants the long tail, they ask a follow-up that filters, which is what you wanted them to do anyway.
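If your service names are long enough that even 20 rows feels risky, the same idea works with a character budget instead of a row count. A sketch, assuming a hypothetical fitRows helper; the 32,000 default just echoes the approximate limit above and is not a documented constant:

```typescript
// Hypothetical guard: keep as many table rows as fit within a character budget,
// then append a one-line summary of what was dropped. The summary line itself is
// not counted against the budget, so leave some headroom when choosing maxChars.
function fitRows(header: string[], rows: string[], maxChars = 32_000): string {
  const kept: string[] = [];
  let used = header.join("\n").length;
  for (const row of rows) {
    // +1 for the newline that join() will add before this row.
    if (used + row.length + 1 > maxChars) break;
    kept.push(row);
    used += row.length + 1;
  }
  const dropped = rows.length - kept.length;
  const tail = dropped > 0 ? [`| … | ${dropped} more services | - |`] : [];
  return [...header, ...kept, ...tail].join("\n");
}

const header = ["| # | Service | Cost |", "| --- | --- | ---: |"];
const tableRows = ["| 1 | AKS | 10.00 |", "| 2 | SQL | 20.00 |", "| 3 | Storage | 5.00 |"];
// A deliberately tiny budget so the third row gets dropped.
console.log(fitRows(header, tableRows, 90));
```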

src/tools/cost-by-service.ts:

import { z } from "zod";
import type { TokenCredential } from "@azure/identity";
import { costByService } from "../azure/cost-management.js";
import { formatCostTable } from "./format.js";

export const inputSchema = z.object({
  subscriptionId: z.string().uuid(),
  from: z.string().regex(/^\d{4}-\d{2}-\d{2}$/, "ISO date YYYY-MM-DD"),
  to:   z.string().regex(/^\d{4}-\d{2}-\d{2}$/, "ISO date YYYY-MM-DD"),
});

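export async function handleCostByService(
  credential: TokenCredential,
  raw: unknown,
): Promise<{ content: { type: "text"; text: string }[]; isError?: boolean }> {
  // safeParse keeps validation failures inside the tool result, so the model
  // sees the Zod message instead of a protocol-level error.
  const parsed = inputSchema.safeParse(raw);
  if (!parsed.success) {
    return { isError: true, content: [{ type: "text", text: parsed.error.message }] };
  }
  const args = parsed.data;
  try {
    const rows = await costByService(credential, args);
    return { content: [{ type: "text", text: formatCostTable(rows, args) }] };
  } catch (e: any) {
    return { isError: true, content: [{ type: "text", text: e.message ?? String(e) }] };
  }
}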

The inputSchema.parse(raw) looks innocuous and is doing real work. Models occasionally pass dates as "2026-04" (truncated), "04/01/2026" (American), or "yesterday" (literal). Without the regex, the call would fail in Cost Management's input parser with an opaque message. With it, the failure is a structured Zod error that the model immediately understands and corrects. Three lines of validation prevent an entire class of frustrating model interactions.
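For readers who skipped Zod, here is a dependency-free sketch of the same check. isIsoDay is a hypothetical helper, and its round-trip step goes one step beyond the regex in inputSchema by also rejecting calendar-invalid days such as 2025-02-30:

```typescript
// Hypothetical helper: accepts only real calendar days written as YYYY-MM-DD.
function isIsoDay(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // An unparseable date has a NaN timestamp; a day that the engine rolls over
  // into the next month no longer matches the input after the round trip.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

for (const candidate of ["2025-09-01", "2026-04", "04/01/2026", "yesterday", "2025-02-30"]) {
  console.log(candidate, isIsoDay(candidate));
}
```

If you stay on Zod, the same function should slot into the schema as a refinement on the two date fields.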

Step 4: The MCP server entry point

src/server.ts:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";
import { buildCredential } from "./azure/credential.js";
import { handleCostByService, inputSchema } from "./tools/cost-by-service.js";
import { startHttp } from "./transport/http.js";
import { zodToJsonSchema } from "zod-to-json-schema";

const credential = buildCredential();

const server = new Server(
  { name: "azure-cost-mcp", version: "0.1.0" },
  { capabilities: { tools: {} } },
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "cost_by_service",
      description:
        "Returns Azure spend grouped by service for a date window. " +
        "Use for questions like 'what did we spend on AKS last week'. " +
        "Dates are interpreted as UTC midnight.",
      inputSchema: zodToJsonSchema(inputSchema) as any,
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const { name, arguments: args } = req.params;
  if (name === "cost_by_service") return handleCostByService(credential, args);
  return { isError: true, content: [{ type: "text", text: `unknown tool: ${name}` }] };
});

const transport = process.env.MCP_TRANSPORT ?? "stdio";
if (transport === "http") {
  startHttp(server, Number(process.env.PORT ?? 8080));
} else {
  await server.connect(new StdioServerTransport());
}

Two pieces of advice that go in this section.

First, the tool description is the product. You will spend more time tweaking that description than tweaking the implementation. Models pick tools by reading descriptions, not function names, and the difference between "Returns cost data" and "Returns Azure spend grouped by service for a date window. Use for questions like 'what did we spend on AKS last week'" is the difference between a tool that gets called twenty times a week and one that gets called twice. The example in the description is doing a lot of work: the model is pattern-matching against it.

Second, dual transport (stdio for dev, HTTP for prod) is the right call from day one. I tried to ship HTTP-only and immediately discovered that local debugging through a Container App is a bad time. Shipping stdio-only and skipping the HTTP transport means you can never deploy. Both paths in one binary, switched by an env var, and you debug locally exactly the way you debug in production.

src/transport/http.ts:

import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import type { Server } from "@modelcontextprotocol/sdk/server/index.js";

export function startHttp(server: Server, port: number) {
  const app = express();
  const transports = new Map<string, SSEServerTransport>();

  app.get("/healthz", (_req, res) => res.status(200).send("ok"));

  app.get("/sse", async (req, res) => {
    const transport = new SSEServerTransport("/messages", res);
    transports.set(transport.sessionId, transport);

    const keepalive = setInterval(() => res.write(": keepalive\n\n"), 25_000);
    res.on("close", () => {
      clearInterval(keepalive);
      transports.delete(transport.sessionId);
    });

    await server.connect(transport);
  });

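  // No express.json() here: it would consume the request stream before the
  // transport reads it. handlePostMessage parses the body itself.
  app.post("/messages", async (req, res) => {
    const id = req.query.sessionId as string;
    const transport = transports.get(id);
    if (!transport) {
      res.status(404).end();
      return;
    }
    await transport.handlePostMessage(req, res);
  });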

  app.listen(port, () => console.log(`MCP HTTP transport on :${port}`));
}

The setInterval(... 25_000) keepalive is the line you'll forget and then spend an afternoon debugging. Container Apps closes idle HTTP connections at 240 seconds. SSE connections look idle to the load balancer between events. Without a keepalive, the connection drops every four minutes and the client appears to "lose" the server. With one, the connection stays warm forever.

Why 25 seconds and not 60 or 120? Margin. Some load balancers and CDNs along the way have their own idle timers in the 30-60 second range. 25 keeps you under all of them with room to spare. There's no real penalty for sending often: a single SSE comment line is a few bytes.
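If you'd rather not leave the timer inline, it factors out cleanly. A sketch, with both helper names hypothetical; write stands in for res.write so the helper can be exercised without an HTTP server:

```typescript
// Lines starting with ":" are comments in the SSE wire format; clients ignore
// them, but intermediaries still see bytes on the connection.
function sseComment(text = "keepalive"): string {
  return `: ${text}\n\n`;
}

// Starts the timer and returns a function that stops it. Call the returned
// function from the response's "close" handler.
function startKeepalive(write: (chunk: string) => void, intervalMs = 25_000): () => void {
  const timer = setInterval(() => write(sseComment()), intervalMs);
  return () => clearInterval(timer);
}

// Demo with a 10 ms interval so it finishes quickly.
const written: string[] = [];
const stopKeepalive = startKeepalive((chunk) => { written.push(chunk); }, 10);
setTimeout(() => {
  stopKeepalive();
  console.log(written.length > 0, written[0] === ": keepalive\n\n");
}, 50);
```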

Step 5: Run it locally and call it

Build and run with stdio transport:

npm run build
node dist/server.js

In a second terminal, smoke-test with the MCP Inspector:

npx @modelcontextprotocol/inspector node dist/server.js
# Opens a UI on http://localhost:6274

In the Inspector UI:

  1. Click List Tools; you should see cost_by_service.
  2. Click the tool, fill in your real subscriptionId, from = 2025-09-01, to = 2025-09-30.
  3. Click Run; you should see a markdown table.

The Inspector is one of those tools that's underdocumented and indispensable. It runs your server with stdio and gives you a real client UI without involving Copilot or any LLM. If something fails here, it's your server. If something works here and fails in Copilot, it's the wiring. Knowing which side of the line a bug is on saves an hour every time.

If you see failed to acquire ARM token, run:

az login
export MCP_LOCAL_AUTH=cli
npm run build && node dist/server.js

If you see AuthorizationFailed, the user/identity needs Cost Management Reader on the subscription:

az role assignment create \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --role "Cost Management Reader" \
  --scope "/subscriptions/<your-subscription-id>"

A small pet peeve: the role you want is Cost Management Reader, not Reader. Plain Reader can see resources but not bills. People assume Reader covers both because Microsoft mostly thinks of cost as a property of resources. It isn't, and the role split reflects that. Worth knowing if you're explaining to a security team why your server needs a separate role.

Step 6: Add a unit test against a captured fixture

Capture a real response once:

mkdir -p tests/fixtures scripts
SUBSCRIPTION_ID=<your-sub> npx tsx scripts/capture-fixture.ts

scripts/capture-fixture.ts:

import { writeFileSync } from "node:fs";
import { buildCredential } from "../src/azure/credential.js";
import { costByService } from "../src/azure/cost-management.js";

const credential = buildCredential();
const rows = await costByService(credential, {
  subscriptionId: process.env.SUBSCRIPTION_ID!,
  from: "2025-09-01",
  to: "2025-09-07",
});
writeFileSync("tests/fixtures/cost-by-service-week.json", JSON.stringify(rows, null, 2));
console.log(`captured ${rows.length} rows`);

tests/cost-by-service.test.ts:

import { describe, it, expect, vi } from "vitest";
import { readFileSync } from "node:fs";
import { handleCostByService } from "../src/tools/cost-by-service.js";

const fixture = JSON.parse(readFileSync("tests/fixtures/cost-by-service-week.json", "utf8"));

// vi.mock is hoisted to the top of the file wherever it is written, so declare it
// at module level. The factory only reads `fixture` when costByService is called.
vi.mock("../src/azure/cost-management.js", () => ({ costByService: async () => fixture }));

describe("cost_by_service", () => {
  it("renders a markdown table", async () => {
    const result = await handleCostByService({} as any, {
      subscriptionId: "00000000-0000-0000-0000-000000000000",
      from: "2025-09-01",
      to: "2025-09-07",
    });
    expect(result.content[0].text).toMatch(/^\*\*Spend by service\*\*/);
    // format.ts emits the total as a bold line below the table, not as a table row.
    expect(result.content[0].text).toMatch(/\*\*Total:\*\*/);
  });
});

Run:

npm test

Brief observation about testing tools that wrap APIs: you can mock at three levels: the API client, the HTTP layer, or the SDK. I picked the API client (vi.mock("../src/azure/cost-management.js")) because it's where the contract you actually own lives. Mocking at the HTTP layer pretends to test the SDK; mocking at the SDK layer is fragile to SDK updates. The API client is your interface, and that's the right thing to fence.

The fixture file is what catches the upstream regression that would otherwise ship. Re-record it monthly. If the shape of the response has changed (a new column, a renamed field), the diff in the fixture file will tell you in a PR, not in a production failure.
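One way to make the parser itself less sensitive to that kind of drift is to look columns up by name rather than by position. A sketch, assuming the query response exposes a columns array next to rows; QueryProperties is a hand-written type covering only the fields used here, toCostRows is a hypothetical replacement for the inline mapping, and the sample data is made up:

```typescript
type QueryProperties = {
  columns: { name: string; type: string }[];
  rows: (string | number)[][];
};
type CostRowLike = { service: string; cost: string; currency: string };

function toCostRows(props: QueryProperties): CostRowLike[] {
  // Fail loudly if an expected column is missing instead of reading the wrong cell.
  const indexOf = (name: string): number => {
    const i = props.columns.findIndex((c) => c.name === name);
    if (i < 0) throw new Error(`response is missing column "${name}"`);
    return i;
  };
  const cost = indexOf("Cost");
  const service = indexOf("ServiceName");
  const currency = indexOf("Currency");
  return props.rows
    .map((r) => ({
      service: String(r[service]),
      cost: Number(r[cost]).toFixed(2),
      currency: String(r[currency]),
    }))
    .sort((a, b) => Number(b.cost) - Number(a.cost));
}

// Made-up sample with the columns in a different order than a positional
// parser would assume.
const sample: QueryProperties = {
  columns: [
    { name: "ServiceName", type: "String" },
    { name: "Cost", type: "Number" },
    { name: "Currency", type: "String" },
  ],
  rows: [
    ["Storage", 12.5, "USD"],
    ["Azure Kubernetes Service", 340.129, "USD"],
  ],
};
console.log(toCostRows(sample));
```

A reordered response then parses correctly, and a renamed column turns into an explicit error message rather than a table full of swapped values.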

Step 7: Containerise

Dockerfile:

# syntax=docker/dockerfile:1.7
FROM node:22-alpine AS build
WORKDIR /src
COPY package*.json tsconfig.json ./
RUN npm ci
COPY src ./src
RUN npm run build && npm prune --omit=dev

FROM node:22-alpine
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=build --chown=app:app /src/node_modules ./node_modules
COPY --from=build --chown=app:app /src/dist ./dist
COPY --chown=app:app package.json ./
USER app
ENV NODE_ENV=production MCP_TRANSPORT=http PORT=8080
EXPOSE 8080
CMD ["node", "dist/server.js"]

Three things in this file are non-negotiable.

Multi-stage build. Without it, your production image carries typescript, vitest, @types/*, and roughly two hundred megabytes of source maps it doesn't need. With it, the image is around 120MB and pulls fast. Cold-start time on Container Apps is roughly image pull plus Node startup, so when the pull dominates, halving the image comes close to halving the cold start.

Non-root user. USER app looks fussy. Then Defender for Cloud opens a finding called "Containers should run as a non-root user" against your environment, and you fix it. Cheaper to fix once now than to argue with the security team later about whether it matters. (It does, marginally: privilege escalation in a sidecar is a real category of attack. But the actual reason to do it is operational, not security.)

Pinned node:22-alpine. Not node:latest. Not node:22. The Alpine variant is what gives you the small image; the major version pin protects you against an LTS bump that breaks something you didn't write. When the next LTS lands, you make the version bump explicitly, with a PR everyone can see.

Build and test the image locally:

docker build -t azure-cost-mcp:dev .
docker run --rm -p 8080:8080 azure-cost-mcp:dev

# In another terminal:
curl -sS http://localhost:8080/healthz
# ok

Step 8: Provision Azure with Bicep

infra/main.bicep:

param location string = resourceGroup().location
param environment string = 'prod'
param image string

resource law 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
  name: 'law-cost-mcp-${environment}'
  location: location
  properties: { sku: { name: 'PerGB2018' }, retentionInDays: 30 }
}

resource env 'Microsoft.App/managedEnvironments@2024-03-01' = {
  name: 'cae-cost-mcp-${environment}'
  location: location
  properties: {
    appLogsConfiguration: {
      destination: 'log-analytics'
      logAnalyticsConfiguration: {
        customerId: law.properties.customerId
        sharedKey: listKeys(law.id, '2023-09-01').primarySharedKey
      }
    }
    workloadProfiles: [
      { name: 'Consumption', workloadProfileType: 'Consumption' }
    ]
  }
}

resource app 'Microsoft.App/containerApps@2024-03-01' = {
  name: 'ca-cost-mcp-${environment}'
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    managedEnvironmentId: env.id
    workloadProfileName: 'Consumption'
    configuration: {
      ingress: {
        external: false
        targetPort: 8080
        transport: 'auto'
        traffic: [{ latestRevision: true, weight: 100 }]
      }
    }
    template: {
      containers: [{
        name: 'mcp'
        image: image
        resources: { cpu: json('0.5'), memory: '1Gi' }
        env: [{ name: 'NODE_ENV', value: 'production' }]
        probes: [{
          type: 'Liveness'
          httpGet: { path: '/healthz', port: 8080 }
          initialDelaySeconds: 5
          periodSeconds: 30
        }]
      }]
      scale: { minReplicas: 0, maxReplicas: 3 }
    }
  }
}

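// A subscription-scope role assignment can't be declared directly in a
// resource-group deployment, so it lives in a module deployed at that scope.
module costReader 'cost-reader.bicep' = {
  name: 'cost-reader-${environment}'
  scope: subscription()
  params: { principalId: app.identity.principalId }
}

output appName string = app.name
output ingressFqdn string = app.properties.configuration.ingress.fqdn

infra/cost-reader.bicep:

targetScope = 'subscription'
param principalId string

resource costReader 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(subscription().id, principalId, 'cost-reader')
  properties: {
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      '72fafb9e-0641-4937-9268-a91bfd8191a3') // Cost Management Reader
    principalId: principalId
    principalType: 'ServicePrincipal'
  }
}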

infra/main.bicepparam:

using 'main.bicep'
param environment = 'prod'
param image = 'ghcr.io/dammyboss/azure-cost-mcp:latest'  // replace with your registry

Two architectural notes here.

external: false is the choice that separates a careful build from a careless one. An MCP server that proxies the Cost Management API has, by construction, the ability to read every dollar your company spends on Azure. Putting that on the public internet is unforced. Internal-only ingress means the server is reachable only from the Container Apps environment's VNet (or peered VNets), which is approximately every internal client you actually have. If you genuinely need external access, say, contractors who can't VPN, front it with Application Gateway and an explicit auth layer. Don't expose it directly.

scale: { minReplicas: 0 } is the quiet superpower of Container Apps for this workload. Most of the time nobody's asking the server anything. Scaling to zero means most of the time the bill is zero. The cold start tax (around 4-6 seconds the first time someone asks after a quiet period) is real but acceptable for a query layer. If your team can't tolerate that, pin to minReplicas: 1 and pay roughly $7/month for the always-on replica. Both are valid; "zero or one minimum" is the question to ask, not "how do I tune scale-out behaviour".

Deploy once manually to verify:

RG=rg-cost-mcp-prod
az group create -n $RG -l eastus
az deployment group create \
  -g $RG \
  --template-file infra/main.bicep \
  --parameters infra/main.bicepparam

If the role assignment fails with AuthorizationFailed, you don't have permission to assign roles at subscription scope. Either ask an Owner, or scope the role assignment to a single resource group's billing data and accept the smaller blast radius.

Step 9: GitHub Actions deploy with OIDC

Create the federated identity (one-time):

APP_ID=$(az ad app create --display-name "github-cost-mcp" --query appId -o tsv)
az ad sp create --id "$APP_ID"
SP_OBJ=$(az ad sp show --id "$APP_ID" --query id -o tsv)

az ad app federated-credential create --id "$APP_ID" --parameters '{
  "name": "github-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:dammyboss/azure-cost-mcp:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'

az role assignment create \
  --assignee-object-id "$SP_OBJ" --assignee-principal-type ServicePrincipal \
  --role Contributor --scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG"

Set GitHub repo variables:

gh variable set AZURE_CLIENT_ID --body "$APP_ID"
gh variable set AZURE_TENANT_ID --body "$(az account show --query tenantId -o tsv)"
gh variable set AZURE_SUBSCRIPTION_ID --body "$(az account show --query id -o tsv)"
gh variable set RESOURCE_GROUP --body "$RG"
gh variable set ACR_NAME --body "<your-acr-name>"

A quick aside about why OIDC instead of a service principal secret: service principal secrets get rotated, get committed by accident, get pasted into Slack, get stored in 1Password, get inherited from someone who left the team, get noticed by an auditor, and (occasionally) get used by an attacker. OIDC federation has none of those failure modes. You set it up once, you never touch it again. The GitHub repo can mint a token, the token has a 60-minute life, the federation says only a workflow with this exact subject is allowed to claim it. There's no secret to leak.
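The "exact subject" part is worth seeing spelled out. A sketch of how the subject claim lines up with a workflow run, assuming GitHub's default subject formats for branch and pull-request runs; subjectFor and the WorkflowRun type are hypothetical:

```typescript
// Models only the two run kinds needed for the comparison below.
type WorkflowRun =
  | { kind: "branch"; repo: string; branch: string }
  | { kind: "pull_request"; repo: string };

function subjectFor(run: WorkflowRun): string {
  return run.kind === "branch"
    ? `repo:${run.repo}:ref:refs/heads/${run.branch}`
    : `repo:${run.repo}:pull_request`;
}

// The subject registered on the federated credential above.
const trusted = "repo:dammyboss/azure-cost-mcp:ref:refs/heads/main";

// A push to main matches; a pull-request run from the same repo does not.
console.log(subjectFor({ kind: "branch", repo: "dammyboss/azure-cost-mcp", branch: "main" }) === trusted);
console.log(subjectFor({ kind: "pull_request", repo: "dammyboss/azure-cost-mcp" }) === trusted);
```

That is why a workflow triggered from a feature branch or a pull request fails at azure/login with this setup: its subject is a different string, and the federated credential matches on the whole thing.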

If you've never used OIDC federation before, this setup will feel like overkill for a small server. It isn't. The setup is a one-time 90-second cost; the not-having-secrets is forever.

.github/workflows/deploy.yml:

name: deploy
on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id:       ${{ vars.AZURE_CLIENT_ID }}
          tenant-id:       ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}

      - name: Build & push image
        run: |
          az acr login --name ${{ vars.ACR_NAME }}
          IMAGE="${{ vars.ACR_NAME }}.azurecr.io/azure-cost-mcp:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
          echo "IMAGE=$IMAGE" >> $GITHUB_ENV

      - name: Deploy infra
        run: |
          az deployment group create \
            --resource-group ${{ vars.RESOURCE_GROUP }} \
            --template-file infra/main.bicep \
            --parameters image="$IMAGE"

      - name: Smoke test
        run: |
          FQDN=$(az containerapp show \
            -g ${{ vars.RESOURCE_GROUP }} -n ca-cost-mcp-prod \
            --query properties.configuration.ingress.fqdn -o tsv)
          for i in 1 2 3 4 5; do
            curl -fsS "https://$FQDN/healthz" && exit 0
            sleep 6
          done
          exit 1

Push and watch:

git add . && git commit -m "initial" && git push -u origin main
gh run watch

The smoke test loop, with its five attempts and six-second sleep, is doing something specific: handling the cold start. The first request after a fresh deploy hits a Container App that's still pulling the image. Five attempts six seconds apart give the new revision roughly 30 seconds to come up, which is enough for the cold path. It's not enough to rescue a misconfigured deploy, but that fails fast at the role assignment or the deployment step, not the smoke test.
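The same bounded-retry logic, sketched in TypeScript for anyone who would rather run the smoke test from a Node script than from shell. `probe` and `sleep` are injected so the loop can be exercised without a network. All names here are mine, and the workflow above remains the version this post actually uses.

```typescript
// Sketch of the smoke test's retry loop. Names are illustrative.

type Probe = () => Promise<boolean>;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try `probe` up to `attempts` times, pausing `delayMs` after each failure.
// Returns the 1-based attempt number that succeeded, or 0 if none did.
async function retryUntilHealthy(
  probe: Probe,
  attempts = 5,
  delayMs = 6000,
  sleep: Sleep = realSleep,
): Promise<number> {
  for (let i = 1; i <= attempts; i++) {
    if (await probe()) return i;
    // Mirrors the shell loop, which also sleeps after the final failure.
    await sleep(delayMs);
  }
  return 0;
}

// Example probe against a health endpoint (global fetch, Node 18+).
function healthProbe(url: string): Probe {
  return async () => {
    try {
      const res = await fetch(url);
      return res.ok; // roughly curl -f: an error status counts as failure
    } catch {
      return false; // connection refused, DNS not ready, and so on
    }
  };
}
```

Usage would look like `retryUntilHealthy(healthProbe("https://<fqdn>/healthz"))`, exiting non-zero when the result is 0.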

Step 10: Wire it into VS Code Copilot

.vscode/mcp.json:

{
  "servers": {
    "azure-cost": {
      "type": "sse",
      "url": "https://${input:fqdn}/sse"
    }
  },
  "inputs": [
    { "id": "fqdn", "type": "promptString", "description": "Container App FQDN" }
  ]
}

Restart VS Code, open Copilot Chat (Ctrl+Alt+I), choose Agent mode, and ask:

What did we spend on Azure OpenAI in subscription <your-sub> between 2025-09-01 and 2025-09-30?

Copilot picks the tool from the description, calls it over SSE, renders the markdown table inline. First call may take 2 to 4 seconds because Container Apps is scaling from zero; subsequent calls are sub-second.

A small note about adoption. The first time you demo this to a teammate, frame the question naturally, "what did we spend on AKS this week", not "call the cost_by_service tool with these arguments." The whole point is that the model handles tool selection. If your demo requires the user to know the tool name, you've buried the value. The tool description is what makes natural-language queries route correctly. Keep tweaking it until your colleague's first attempt works.

Troubleshooting

These are the errors that will hit you and what they actually mean.

failed to acquire ARM token: Local dev with no az login. Run az login and set MCP_LOCAL_AUTH=cli. If it still fails, your CLI token is probably for a different tenant or subscription; switch with az account set --subscription <correct>.

401 Unauthorized from Cost Management: The identity (your user, or the Container App's managed identity) lacks Cost Management Reader. Grant it at subscription scope. Note: the role assignment can take up to 5 minutes to propagate, so if you just granted it and you're getting 401, wait and retry before assuming it's wrong.

429 Too Many Requests: You hit Cost Management's rate limit. The token bucket retries the same call once; if you see repeated 429s, your traffic pattern is too bursty. Reduce request volume, or contact support to raise the limit.

Container App pod stuck in Pending: The workload profile name in Bicep doesn't match a profile on the environment. The example uses Consumption; update it if you've named yours differently. The same symptom appears if you typo the image name and the pull fails. Check the app's system logs in the portal, the nearest equivalent to kubectl's events view.

SSE connection drops after 4 minutes: That's Container Apps' default HTTP idle timeout. The keepalive in transport/http.ts (: keepalive\n\n every 25s) is what prevents this, so confirm it's in place. If you see drops sooner than 4 minutes, there's something else in the path (Front Door, AGW, an internal proxy) buffering.

Copilot can't find the server: Check that .vscode/mcp.json is in the workspace root, not nested. The MCP extension prints loaded servers to the Output panel under "MCP", so start there. If the server is loaded but tools don't appear, check that Copilot is in Agent mode (the model has to be allowed to call tools).

The model picks the wrong tool: The tool description isn't sharp enough. Add example phrasings the user might use, and the model will pattern-match. The single biggest description improvement I've made on tools is adding Use for questions like "..." to every one.
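As an illustration of that last point, here is a hedged sketch of a tool definition with example phrasings baked into its description. The `cost_by_service` name comes from this post and the `name`/`description`/`inputSchema` shape follows the MCP tool format. The parameter names and the exact wording are assumptions for illustration, not the server's real schema.

```typescript
// Sketch only: parameter names and description wording are assumptions.

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

const costByService: ToolDefinition = {
  name: "cost_by_service",
  // First sentence says what the tool does; the rest gives the model
  // phrasings to pattern-match against.
  description: [
    "Returns Azure spend for one subscription, grouped by service, over a date range.",
    'Use for questions like "what did we spend on AKS this week",',
    '"how much did Azure OpenAI cost in September", or',
    '"which services drove the bill last month".',
  ].join(" "),
  inputSchema: {
    type: "object",
    properties: {
      subscriptionId: { type: "string", description: "Subscription GUID" },
      from: { type: "string", description: "Start date, YYYY-MM-DD" },
      to: { type: "string", description: "End date, YYYY-MM-DD" },
    },
    required: ["subscriptionId", "from", "to"],
  },
};

console.log(costByService.description);
```

The phrasings are worth harvesting from real failed attempts: when a colleague's question routes to the wrong tool, paste their wording into the right tool's description.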

What this gets you and why it's worth it

You've built something with concrete properties:

  • A query layer that's reachable wherever engineers already are. The biggest behavioural change isn't the tool itself, it's that the friction of asking is now zero. The next time someone is staring at a costly resource and wonders "is this expensive?", the answer is two sentences away. That's a different organisation than one where the answer is in a dashboard nobody opens.

  • A single permission boundary. The server has Cost Management Reader at subscription scope. Anyone who can reach the server gets that level of insight. For anything more sensitive (exports, per-tenant attribution, write operations), you'd add Entra ID OAuth and per-tool scopes, which is a separate post. For internal cost insight, the simple model is enough.

  • A pattern that generalises. The shape of this server (managed identity, narrow API client, markdown formatter, Container Apps host) works for any read-only Azure data source: Azure Monitor, Azure DevOps, Azure Resource Graph, Bicep what-if. Three of those are now MCP servers running alongside this one in our infrastructure, and they share roughly 80% of the code. The investment in this build amortises across every future "give engineers structured access to X" project.

  • A cost story that's defensible. Container Apps scaled to zero, Log Analytics at the cheapest tier, ACR pulls at minimum. The whole thing runs at around five dollars a month idle. There is no FinOps argument against shipping this; if there were, the irony would be devastating.
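The "pattern that generalises" point can be sketched as a small interface plus a shared markdown formatter. Everything here is hypothetical naming (`ReadOnlySource`, `toMarkdownTable`, the sample rows with made-up numbers); it illustrates the shape, not the code our servers actually share.

```typescript
// Hypothetical sketch of the reusable shape: a narrow read-only client
// behind an interface, plus a formatter every server can share.

type Row = Record<string, string | number>;

interface ReadOnlySource<Query> {
  name: string;
  fetchRows(query: Query): Promise<Row[]>; // the narrow API client lives behind this
}

// Render rows as a markdown table; column order follows the first row.
function toMarkdownTable(rows: Row[]): string {
  if (rows.length === 0) return "_No data._";
  const columns = Object.keys(rows[0]);
  // Escape pipes so a cell value cannot break the table layout.
  const escape = (v: string | number | undefined) => String(v ?? "").replace(/\|/g, "\\|");
  const header = `| ${columns.join(" | ")} |`;
  const divider = `| ${columns.map(() => "---").join(" | ")} |`;
  const body = rows.map((r) => `| ${columns.map((c) => escape(r[c])).join(" | ")} |`);
  return [header, divider, ...body].join("\n");
}

// An in-memory stand-in for a real source; the numbers are invented.
const fakeCostSource: ReadOnlySource<{ subscription: string }> = {
  name: "fake-cost",
  fetchRows: async () => [
    { service: "Service A", cost: 12.5 },
    { service: "Service B", cost: 3 },
  ],
};
```

Swapping in a different data source then means writing a new `fetchRows` and a new tool description; the identity, formatting, and hosting pieces stay the same.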

The piece nobody tells you in advance is how much the team's relationship with cost data changes when the friction collapses. The daily email was a one-way push that nobody read. The MCP server is a two-way query that everybody uses. That's the same data, the same permissions, the same dollars, but a completely different cultural relationship with the question "what did this cost." If you've ever sat through a quarterly review where the engineering org has no idea where its cloud bill comes from, you'll recognise why that matters.

The build is twenty-something kilobytes of source. The change in habit is what justifies the work.

MCP, Container Apps, End-to-End
