Add Per-User OAuth and On-Behalf-Of to an Internal MCP Server

The day after we widened the audience for our internal MCP server to the broader engineering org, one of the first new users asked it to fetch cost data for a subscription they shouldn't have been able to see. Not maliciously: they were exploring, and their MCP client was wired up before their RBAC was. The data came back because the MCP server was running under its own identity. The server had Cost Management Reader at subscription scope, the user could reach the server, the server returned what it could.

That's the whole bug, in one sentence: the MCP server's identity is not the user's identity, and the gap between them is where data leaks.

The fix is not subtle. Stop the MCP server from holding any Azure RBAC of its own, and instead make every tool call carry the caller's token, exchanged via On-Behalf-Of for a downstream Azure token that has exactly the caller's permissions. Each user sees exactly their own RBAC, no more. The server becomes a routing layer, not an identity.

This post is the entire build. By the end you have an Entra ID app with two custom OAuth scopes; a Node.js MCP server that validates the caller's bearer token (signature, issuer, audience), enforces per-tool scope checks, and performs the OBO exchange; and a working end-to-end test with two real user identities, verifying that Alice's cost_by_service call succeeds while Bob's is denied, and that cost_export stays blocked for both until an admin consents. About 600 lines of TypeScript, plus 80 lines of Bicep, plus the operational discipline to never give the server a permanent role at any Azure scope.

Why this isn't optional

Brief detour because there's a tempting alternative pattern that's wrong, and it's worth being explicit about why.

The wrong pattern: server identity, plus an internal "user check" inside each tool. Server has Cost Management Reader, the tool reads the caller's email from a header, checks a list of allowed users, returns data if allowed. This works on day one. By month three you have a stale list, a teammate added without listing, an export that bypasses the list because someone wrote it without remembering, and a security review you cannot pass.

The wrong pattern shifts authorization from Entra ID (where it lives natively, with audit logs and conditional access and password rotation) into your application code (where it doesn't). The whole point of having an enterprise identity provider is to not maintain that machinery yourself.

The right pattern: the server has no Azure roles. The user has the roles they should have. Tokens flow through the server with OBO. Authorization is exactly the user's RBAC, and Entra ID and Azure RBAC are doing the auth work, which is what they're built for.

Once you internalise this, every other auth decision becomes simpler.

What you'll have at the end

~/mcp-entra-auth/
├── server/
│   ├── src/
│   │   ├── server.ts
│   │   ├── auth/
│   │   │   ├── jwt.ts
│   │   │   ├── scopes.ts
│   │   │   └── obo.ts
│   │   ├── tools/
│   │   │   ├── cost-by-service.ts
│   │   │   └── cost-export.ts
│   │   └── transport/
│   │       └── http.ts
│   ├── tests/
│   │   ├── auth.test.ts
│   │   └── e2e-two-users.test.ts
│   ├── Dockerfile
│   └── package.json
├── infra/
│   └── app-registration.bicep
├── client/
│   └── test-call.ts
└── README.md

Prerequisites

node --version          # v22+
az --version            # 2.65+
gh --version            # 2.50+

You'll need:

An Entra ID tenant where you can create app registrations and grant admin consent
Two test users (or two test accounts), let's call them alice@yourtenant.onmicrosoft.com and bob@yourtenant.onmicrosoft.com
A subscription where Alice has Cost Management Reader and Bob does not. The asymmetry is what makes the end-to-end test meaningful.

az login
TENANT_ID=$(az account show --query tenantId -o tsv)
SUB=$(az account show --query id -o tsv)

A practical aside: the second test user is the part most people skip. They build the server, test it themselves, ship it, and only later discover the per-user authorization wasn't actually working, because they only ever tested with one identity, which by definition has the right RBAC. Two users is the smallest possible test set that catches the bug class.

Step 1: App registration with custom scopes

infra/app-registration.bicep:

extension microsoftGraphV1
param appDisplayName string = 'mcp-cost-server'
param appIdUri string = 'api://mcp-cost-server'

resource app 'Microsoft.Graph/applications@v1.0' = {
  uniqueName: appDisplayName
  displayName: appDisplayName
  identifierUris: [appIdUri]
  api: {
    requestedAccessTokenVersion: 2
    oauth2PermissionScopes: [
      {
        id: guid('cost.read')
        adminConsentDisplayName: 'Read cost data'
        adminConsentDescription: 'Allow the app to read cost data on the user\'s behalf'
        userConsentDisplayName: 'Read your cost data'
        userConsentDescription: 'Allow the app to read cost data as you'
        type: 'User'
        value: 'cost.read'
        isEnabled: true
      }
      {
        id: guid('cost.export')
        adminConsentDisplayName: 'Export cost data'
        adminConsentDescription: 'Allow the app to export cost reports on the user\'s behalf'
        type: 'Admin'
        value: 'cost.export'
        isEnabled: true
      }
    ]
  }
  requiredResourceAccess: [
    {
      resourceAppId: '00000003-0000-0000-c000-000000000000' // Microsoft Graph
      resourceAccess: [{ id: 'e1fe6dd8-ba31-4d61-89e7-88639da4683d', type: 'Scope' }] // User.Read
    }
    {
      resourceAppId: '797f4846-ba00-4fd7-ba43-dac1f8f63013' // Azure Service Management
      resourceAccess: [{ id: '41094075-9dad-400e-a0bd-54e686782033', type: 'Scope' }] // user_impersonation
    }
  ]
}

resource sp 'Microsoft.Graph/servicePrincipals@v1.0' = {
  appId: app.appId
}

output appId string = app.appId
output tenantId string = tenant().tenantId

Two scopes, not one. The split matters more than it looks.

cost.read is a user-consent scope. A user can consent on their own. First-time users see a Microsoft consent prompt that says "this app wants to read cost data as you" and either click yes or no. No admin involvement.

cost.export is an admin-consent scope. A user cannot consent to it on their own: the prompt asks for admin approval instead, and token requests for the scope fail until an Entra admin grants tenant-wide consent. This is the lever you pull when an operation is sensitive enough to warrant a real review. Exporting cost data, for instance, produces a CSV that can leave Azure and end up in places you don't control. Worth a one-time admin review.

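The requestedAccessTokenVersion: 2 is non-negotiable in 2026. v1 tokens have a different shape: the issuer is https://sts.windows.net/<tenant>/ rather than the v2.0 endpoint, the audience can be the app ID URI rather than the client ID, and the username arrives in upn rather than preferred_username. Don't fight v1; pin v2 and forget the difference exists.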

The requiredResourceAccess block declares which downstream APIs the app can be granted access to. It doesn't grant anything; it just makes the consent flow work. Without Azure Service Management / user_impersonation listed here, OBO will fail with an opaque error about the resource not being known.

Deploy:

az deployment tenant create \
  --location eastus \
  --name mcp-app-reg \
  --template-file infra/app-registration.bicep \
  --query 'properties.outputs'

Capture the output:

APP_ID=$(az ad app list --display-name mcp-cost-server --query '[0].appId' -o tsv)
echo "APP_ID=$APP_ID"

You'll also need a client certificate for OBO. A self-signed one is fine for this lab:

openssl req -x509 -nodes -newkey rsa:2048 \
  -keyout cert.pem -out cert.pem -days 365 \
  -subj "/CN=mcp-cost-server-obo"

CERT_THUMBPRINT=$(openssl x509 -in cert.pem -fingerprint -sha1 -noout \
  | cut -d= -f2 | tr -d ':')

az ad app credential reset --id "$APP_ID" --cert "@cert.pem" --append

For production, the cert lives in Key Vault and is mounted via the CSI driver. Never put it in an env var or bake it into the image. The cert is the OBO equivalent of a service principal secret; treat it like one.

Step 2: JWT validation

server/src/auth/jwt.ts:

import { jwtVerify, createRemoteJWKSet } from "jose";

const TENANT_ID = process.env.AZURE_TENANT_ID!;
const APP_ID = process.env.AZURE_CLIENT_ID!; // the app registration's client ID (a GUID)
const APP_ID_URI = process.env.APP_ID_URI!; // e.g. api://mcp-cost-server

const jwks = createRemoteJWKSet(
  new URL(`https://login.microsoftonline.com/${TENANT_ID}/discovery/v2.0/keys`),
);

export type TokenClaims = {
  oid: string;
  preferred_username?: string;
  scp?: string;
  roles?: string[];
  aud: string;
  iss: string;
};

// Validated claims plus the raw token, which the OBO exchange needs as its assertion.
export type AuthenticatedClaims = TokenClaims & { _raw: string };

export class AuthError extends Error {
  constructor(public status: number, message: string) {
    super(message);
  }
}

export async function validateBearer(authz: string | undefined): Promise<AuthenticatedClaims> {
  if (!authz?.startsWith("Bearer ")) {
    throw new AuthError(401, "missing_bearer");
  }
  const token = authz.slice(7);
  try {
    const { payload } = await jwtVerify(token, jwks, {
      issuer: `https://login.microsoftonline.com/${TENANT_ID}/v2.0`,
      // v2.0 access tokens carry the client ID in aud; the app ID URI form
      // only appears in v1.0 tokens. Both identify this API and nothing else.
      audience: [APP_ID, APP_ID_URI],
    });
    return { ...(payload as unknown as TokenClaims), _raw: token };
  } catch (e: any) {
    throw new AuthError(401, `invalid_token: ${e.message}`);
  }
}

Three checks pinned: signature (via JWKS), issuer (the v2.0 endpoint), audience (your own app registration). Skipping any one of these is the bug that turns a "secure" MCP server into a public one.

The audience check is the one most often skipped because the developer's first reaction to "audience mismatch" is to widen what's accepted. Don't. The audience tells you which API the token was minted for. A token minted for Microsoft Graph has the correct issuer and signature for your tenant; if you accept it on your API, an attacker who can get any user to grant a Graph token can call your API. Pin the audience to your own app registration exactly. With v2.0 tokens the aud claim carries the registration's client ID (a GUID); the app ID URI form appears only in v1.0 tokens, so check which one your tokens actually contain before concluding the check is too strict.
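When an audience check fails, it helps to look at what the rejected token actually carries before touching the validator. The following is a minimal sketch, assuming Node's global Buffer; decodeClaimsUnverified is a hypothetical debugging helper, not part of jose, and because it performs no signature check its output must never feed an authorization decision. The sample token is hand-built and unsigned.

```typescript
// Debugging aid only: decodes the JWT payload WITHOUT verifying the signature.
type UnverifiedClaims = {
  aud?: string | string[];
  iss?: string;
  scp?: string;
  ver?: string;
};

function decodeClaimsUnverified(token: string): UnverifiedClaims {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("not a compact JWT (expected 3 dot-separated segments)");
  }
  // JWT segments are base64url-encoded, not plain base64.
  const json = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(json) as UnverifiedClaims;
}

// Usage with a hand-built, unsigned sample (all values are illustrative).
const samplePayload = {
  aud: "00000003-0000-0000-c000-000000000000", // Microsoft Graph's app ID
  iss: "https://example.invalid/v2.0",
  scp: "User.Read",
};
const sampleToken = [
  "e30", // base64url of "{}", standing in for a header
  Buffer.from(JSON.stringify(samplePayload)).toString("base64url"),
  "sig", // placeholder; never verified here
].join(".");

const decoded = decodeClaimsUnverified(sampleToken);
console.log(decoded.aud, decoded.scp);
```

A token whose aud is Graph's identifier rather than your registration's was minted for Graph, which is exactly the case the validator is supposed to reject.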

The jwks = createRemoteJWKSet(...) call caches Microsoft's public keys with a sensible TTL. The first request fetches them, subsequent requests use the cache. If Microsoft rotates the signing key (which they do, periodically), the cache invalidates and the next fetch picks up the new one. You don't have to do anything; the library handles it. This is one of the few places where a library's default is exactly right.

Step 3: Per-tool scope enforcement

server/src/auth/scopes.ts:

import type { TokenClaims } from "./jwt.js";

export class ScopeError extends Error {
  constructor(public status: number, message: string) {
    super(message);
  }
}

export function requireScope(claims: TokenClaims, required: string): void {
  const scopes = (claims.scp ?? "").split(" ");
  if (!scopes.includes(required)) {
    throw new ScopeError(403, `missing_scope:${required}`);
  }
}

Six lines doing real work.

The scp claim is a space-separated string in v2 tokens (e.g. "cost.read profile email"). Splitting on space and checking includes is the right shape.

The error returned is 403 missing_scope:<scope>. Clients use the suffix to know which scope is missing, so they can re-auth with the right consent prompt. A bare 403 leaves the client to guess; the suffix turns it into a recoverable error.
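On the client side, recovering from that error can be as small as the sketch below. Both function names are hypothetical, and the api://mcp-cost-server prefix is the app ID URI from the Bicep template; how the client then triggers the consent prompt depends on its own auth library and is not shown.

```typescript
// Extracts the scope name from the server's "missing_scope:<scope>" error text,
// or returns null if the text has some other shape.
function parseMissingScope(errorText: string): string | null {
  const match = /missing_scope:([^\s"']+)/.exec(errorText);
  return match ? match[1] : null;
}

// Builds the fully qualified scope to request on re-authentication.
function scopeToRequest(
  errorText: string,
  appIdUri = "api://mcp-cost-server",
): string | null {
  const scope = parseMissingScope(errorText);
  return scope ? `${appIdUri}/${scope}` : null;
}

console.log(scopeToRequest("missing_scope:cost.export"));
// prints "api://mcp-cost-server/cost.export"
```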

The function is sync because the claims are already validated. Don't make it async; that just adds a footgun where someone forgets to await it and the check silently no-ops.

Step 4: On-Behalf-Of exchange

server/src/auth/obo.ts:

import {
  ConfidentialClientApplication,
  type OnBehalfOfRequest,
} from "@azure/msal-node";
import { readFileSync } from "node:fs";
import { createHash } from "node:crypto";

const APP_ID = process.env.AZURE_CLIENT_ID!;
const TENANT_ID = process.env.AZURE_TENANT_ID!;
const CERT_PATH = process.env.AZURE_CERT_PATH!;
const CERT_THUMBPRINT = process.env.AZURE_CERT_THUMBPRINT!;

const cert = readFileSync(CERT_PATH, "utf8");

const msal = new ConfidentialClientApplication({
  auth: {
    clientId: APP_ID,
    authority: `https://login.microsoftonline.com/${TENANT_ID}`,
    clientCertificate: { thumbprint: CERT_THUMBPRINT, privateKey: cert },
  },
});

const cache = new Map<string, { token: string; expiresAt: number }>();

export async function exchangeOBO(userToken: string, downstreamScope: string): Promise<string> {
  const userOid = parseOidFromToken(userToken);
  const cacheKey = createHash("sha256").update(userOid + downstreamScope).digest("hex");
  const cached = cache.get(cacheKey);
  if (cached && cached.expiresAt - Date.now() > 60_000) return cached.token;

  const req: OnBehalfOfRequest = {
    oboAssertion: userToken,
    scopes: [downstreamScope],
  };
  const result = await msal.acquireTokenOnBehalfOf(req);
  if (!result) throw new Error("OBO exchange returned null");

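  // Cap the cache lifetime at 30 minutes even when the token itself lives longer.
  const maxExpiry = Date.now() + 30 * 60_000;
  cache.set(cacheKey, {
    token: result.accessToken,
    expiresAt: Math.min(result.expiresOn?.getTime() ?? maxExpiry, maxExpiry),
  });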
  return result.accessToken;
}

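// Reads oid from a token that validateBearer has already verified; no signature check here.
function parseOidFromToken(token: string): string {
  const [, payload] = token.split(".");
  // JWT segments are base64url-encoded.
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8")).oid;
}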

The cache key is sha256(userOid + scope), TTL approximately 30 minutes. Without it, every request triggers another OBO exchange, which is slow (around 150ms) and gets you rate-limited at scale.

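The cache has a subtle bug class: a cached token keeps working until it expires, whatever has happened to the user since. If the account is disabled, the session is revoked, or a group membership behind a role assignment changes, the server can keep presenting the old token for up to the cache TTL. (Removing a role assignment directly is typically evaluated by Azure Resource Manager on each call, so it is less exposed to this cache, subject to ARM's own propagation delay.) For an MCP server with a 30-minute cache TTL, that means up to 30 minutes of stale authorization. For a cost server this is acceptable; for a server that can mutate state, it's not. The mitigation is a shorter TTL plus explicit cache invalidation when a session ends. We use 30 minutes here because the trade-off favours performance for a read-only workload.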
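The "explicit invalidation when a session ends" part needs a cache that can drop everything belonging to one user, which a single flat Map keyed by a hash cannot do cheaply. Here is one possible sketch; OboTokenCache and its method names are hypothetical, the 30-minute cap and 60-second safety margin mirror the values used in this post, and the injectable clock exists only to make the class testable.

```typescript
type CachedToken = { token: string; expiresAt: number };

// Two-level map: user oid -> downstream scope -> cached token.
class OboTokenCache {
  private entries = new Map<string, Map<string, CachedToken>>();
  private maxTtlMs: number;
  private now: () => number;

  constructor(maxTtlMs = 30 * 60_000, now: () => number = Date.now) {
    this.maxTtlMs = maxTtlMs;
    this.now = now;
  }

  get(oid: string, scope: string): string | undefined {
    const hit = this.entries.get(oid)?.get(scope);
    // Treat tokens within 60 s of expiry as already expired.
    if (!hit || hit.expiresAt - this.now() <= 60_000) return undefined;
    return hit.token;
  }

  set(oid: string, scope: string, token: string, tokenExpiresAt: number): void {
    // Never cache longer than maxTtlMs, even if the token lives longer.
    const expiresAt = Math.min(tokenExpiresAt, this.now() + this.maxTtlMs);
    const perUser = this.entries.get(oid) ?? new Map<string, CachedToken>();
    perUser.set(scope, { token, expiresAt });
    this.entries.set(oid, perUser);
  }

  // Call this when the user's session closes.
  invalidateUser(oid: string): void {
    this.entries.delete(oid);
  }
}
```

Calling invalidateUser from the transport's close handler would tie the cache lifetime to the session lifetime, at the cost of one extra OBO exchange per reconnect.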

The certificate is loaded synchronously at module load. If the file isn't there, the server fails to start with a clear filesystem error. Don't try to load it lazily on first request; that turns a startup failure (loud, easy to fix) into a runtime failure (quiet, lands in production).
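The same fail-at-startup reasoning applies to the environment variables, which the modules above read with a non-null assertion. The `!` only silences the compiler; a missing variable is still undefined at runtime and tends to surface later as a confusing auth failure. A small sketch of a stricter read follows; requireEnv is a hypothetical helper, not something the original code includes.

```typescript
// Reads a required environment variable or throws with the variable's name.
// The env parameter defaults to process.env and exists to make testing easy.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an explicit environment object (values are illustrative).
const exampleEnv = { AZURE_TENANT_ID: "hypothetical-tenant-id", AZURE_CERT_PATH: "" };
console.log(requireEnv("AZURE_TENANT_ID", exampleEnv));
```

Used at module load, for example `const TENANT_ID = requireEnv("AZURE_TENANT_ID")`, a misconfigured deployment fails on start rather than on the first request.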

Step 5: Tool implementations that respect the user

server/src/tools/cost-by-service.ts:

import type { TokenClaims } from "../auth/jwt.js";
import { requireScope } from "../auth/scopes.js";
import { exchangeOBO } from "../auth/obo.js";

export async function costByService(
  claims: TokenClaims & { _raw: string },
  args: { subscriptionId: string; from: string; to: string },
): Promise<{ content: { type: "text"; text: string }[]; isError?: boolean }> {
  requireScope(claims, "cost.read");

  const armToken = await exchangeOBO(claims._raw, "https://management.azure.com/.default");

  const url =
    `https://management.azure.com/subscriptions/${args.subscriptionId}` +
    `/providers/Microsoft.CostManagement/query?api-version=2024-08-01`;

  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${armToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      type: "ActualCost",
      timeframe: "Custom",
      timePeriod: {
        from: new Date(args.from + "T00:00:00Z").toISOString(),
        to:   new Date(args.to   + "T23:59:59Z").toISOString(),
      },
      dataset: {
        granularity: "None",
        aggregation: { totalCost: { name: "Cost", function: "Sum" } },
        grouping: [{ type: "Dimension", name: "ServiceName" }],
      },
    }),
  });

  if (res.status === 403) {
    return { isError: true, content: [{ type: "text", text:
      `403: caller (${claims.preferred_username}) lacks Cost Management Reader on subscription ${args.subscriptionId}` }] };
  }
  if (!res.ok) throw new Error(`cost mgmt ${res.status}: ${await res.text()}`);

  const data = await res.json();
  const rows = (data.properties.rows as any[][])
    .map((r) => ({ service: String(r[1]), cost: Number(r[0]).toFixed(2) }))
    .sort((a, b) => Number(b.cost) - Number(a.cost));

  return {
    content: [
      {
        type: "text",
        text: [
          `**Spend by service**, ${args.from} to ${args.to}`,
          ``,
          `| # | Service | Cost |`,
          `| --- | --- | ---: |`,
          ...rows.slice(0, 20).map((r, i) => `| ${i + 1} | ${r.service} | ${r.cost} |`),
        ].join("\n"),
      },
    ],
  };
}

The 403 is the test case. When Bob calls this against a subscription he doesn't have RBAC on, the server doesn't get the data. The MCP server itself has no role at the subscription, so it can't accidentally leak.

The 403 message includes Bob's preferred_username. This is deliberate. When Bob calls the tool and gets the error, the message tells him whose token was used (his), which subscription was attempted, and why he was denied (he lacks the role). That triple is what makes the error actionable. Bare "403 forbidden" leaves him to guess.

Step 6: The HTTP transport that injects user context

server/src/transport/http.ts:

import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import type { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { validateBearer, AuthError } from "../auth/jwt.js";

export function startHttp(server: Server, port: number) {
  const app = express();
  const transports = new Map<string, SSEServerTransport>();

  app.get("/healthz", (_req, res) => res.status(200).send("ok"));

  app.get("/sse", async (req, res) => {
    try {
      const claims = await validateBearer(req.headers.authorization);
      const transport = new SSEServerTransport("/messages", res);
      (transport as any).session = { claims };
      transports.set(transport.sessionId, transport);

      const keepalive = setInterval(() => res.write(": keepalive\n\n"), 25_000);
      res.on("close", () => {
        clearInterval(keepalive);
        transports.delete(transport.sessionId);
      });

      await server.connect(transport);
    } catch (e) {
      if (e instanceof AuthError) return res.status(e.status).json({ error: e.message });
      throw e;
    }
  });

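  app.post("/messages", express.json(), async (req, res) => {
    const t = transports.get(req.query.sessionId as string);
    if (!t) return res.status(404).end();
    try {
      // Re-validate on every POST: knowing a session ID alone must not grant access.
      const claims = await validateBearer(req.headers.authorization);
      if (claims.oid !== (t as any).session.claims.oid) {
        throw new AuthError(403, "session_owner_mismatch");
      }
    } catch (e) {
      if (e instanceof AuthError) return res.status(e.status).json({ error: e.message });
      throw e;
    }
    // express.json() has already consumed the request stream, so pass the parsed body through.
    await t.handlePostMessage(req, res, req.body);
  });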

  app.listen(port, () => console.log(`MCP HTTP transport (with auth) on :${port}`));
}

The user's claims ride along with the SSE session, so each tools/call request can read them via (transport as any).session.claims. This is the bridge between the HTTP-level auth and the MCP-level handler.

The (transport as any).session casts an opaque session object onto the transport. The MCP SDK doesn't ship a request-context with claims out of the box; teams either patch it on (which is what we're doing) or maintain a parallel session map. Patching feels hacky; in practice it's the cleanest because the lifecycle of the session matches the lifecycle of the transport, and the SDK already manages that.
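For comparison, the parallel session map alternative looks roughly like this. It is a sketch only: SessionRegistry and SessionClaims are hypothetical names, and the claim fields mirror the ones the tools in this post read. The trade-off is that you get type safety back, but you now own the cleanup that the patched transport got for free.

```typescript
// The subset of validated claims the tools need, plus the raw token for OBO.
type SessionClaims = {
  oid: string;
  scp?: string;
  preferred_username?: string;
  _raw: string;
};

// sessionId -> validated claims, kept alongside the transports map.
class SessionRegistry {
  private bySession = new Map<string, SessionClaims>();

  register(sessionId: string, claims: SessionClaims): void {
    this.bySession.set(sessionId, claims);
  }

  // Throws instead of returning undefined so a handler can never run without an identity.
  claimsFor(sessionId: string): SessionClaims {
    const claims = this.bySession.get(sessionId);
    if (!claims) throw new Error(`no authenticated session: ${sessionId}`);
    return claims;
  }

  // Must be called from the same close handler that removes the transport.
  remove(sessionId: string): void {
    this.bySession.delete(sessionId);
  }
}
```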

Step 7: Mint two real user tokens for testing

This is the part most tutorials skip. To genuinely test OBO you need two real Entra users, each with a token they minted by signing in.

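# The app ID URI from the Bicep template (appIdUri parameter); set it in both terminals
APP_ID_URI="api://mcp-cost-server"

# In one terminal as Alice:
az login
az account get-access-token --resource "$APP_ID_URI" --query accessToken -o tsv > alice-token.txt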

# In another terminal as Bob:
az login
az account get-access-token --resource "$APP_ID_URI" --query accessToken -o tsv > bob-token.txt

For non-interactive automation in CI, the Resource Owner Password Credentials flow is one of the few ways to script user-token acquisition, but Microsoft strongly discourages it, and it fails for accounts that require MFA. For the lab, the interactive az login above is enough.

A practical observation: most teams I've seen do this end-to-end testing manually once and then never again. Don't be that team. Build a CI job that uses two dedicated test identities (each with the equivalent of "user A" and "user B" RBAC), runs the e2e suite, and verifies the authorization boundary. They have to be user accounts rather than app-only service principals, because OBO exchanges a user's delegated token. Authorization regressions are silent until they're catastrophic; periodic active testing is what catches them.

Put the tokens in env vars:

ALICE_TOKEN=$(cat alice-token.txt)
BOB_TOKEN=$(cat bob-token.txt)

Step 8: The two-user end-to-end test

server/tests/e2e-two-users.test.ts:

import { describe, it, expect } from "vitest";

const SERVER = process.env.MCP_SERVER_URL!;
const SUB = process.env.AZURE_SUB!;

describe("two-user e2e", () => {
  it("Alice can call cost_by_service", async () => {
    const res = await callTool(process.env.ALICE_TOKEN!, "cost_by_service", {
      subscriptionId: SUB,
      from: "2025-09-01",
      to:   "2025-09-30",
    });
    expect(res.isError).toBeFalsy();
    expect(res.content[0].text).toMatch(/Spend by service/);
  });

  it("Bob is denied (no RBAC at subscription)", async () => {
    const res = await callTool(process.env.BOB_TOKEN!, "cost_by_service", {
      subscriptionId: SUB,
      from: "2025-09-01",
      to:   "2025-09-30",
    });
    expect(res.isError).toBeTruthy();
    expect(res.content[0].text).toMatch(/lacks Cost Management Reader/);
  });

  it("Both Alice and Bob are denied cost_export without admin consent on cost.export", async () => {
    for (const tok of [process.env.ALICE_TOKEN!, process.env.BOB_TOKEN!]) {
      const res = await callTool(tok, "cost_export", { subscriptionId: SUB });
      expect(res.isError).toBeTruthy();
      expect(res.content[0].text).toMatch(/missing_scope:cost\.export/);
    }
  });
});

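// Imports are hoisted; move them to the top of the file if you prefer.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

// Opens an authenticated SSE session, calls one tool, and closes the session.
// POSTing to /messages directly would not work: the SSE transport needs a
// sessionId and delivers the JSON-RPC response over the event stream.
async function callTool(token: string, tool: string, args: Record<string, unknown>): Promise<any> {
  const headers = { Authorization: `Bearer ${token}` };
  const transport = new SSEClientTransport(new URL(`${SERVER}/sse`), {
    // Assumption: an SDK version whose eventSourceInit accepts a custom fetch;
    // this is how the bearer token reaches the GET /sse request.
    eventSourceInit: {
      fetch: (url, init) => fetch(url, { ...init, headers: { ...init?.headers, ...headers } }),
    },
    requestInit: { headers }, // applied to POST /messages
  });
  const client = new Client({ name: "e2e-two-users", version: "0.0.0" });
  await client.connect(transport);
  try {
    return await client.callTool({ name: tool, arguments: args });
  } finally {
    await client.close();
  }
}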

The third test is the important one. Even Alice can't call cost_export because nobody has consented to that scope yet. Authorization is two-stage: the user's token must include the scope (which requires consent), and the user must have the underlying RBAC (which requires the role). Both must be true. The test verifies the first stage works as expected.

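To grant the admin consent (--id is the client application that requests tokens and --api is the server's registration; the command below passes the server's app ID for both, so substitute your MCP client's app ID for --id if it has its own registration):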

az ad app permission grant --id "$APP_ID" --api "$APP_ID" \
  --scope "cost.export"

After admin consent, re-run the test. Alice now passes cost_export (consent + RBAC). Bob's cost_export still fails, but at the Azure layer not the scope layer. The error message changes accordingly, which is exactly the granular signal you want.

Production checklist

Cert in Key Vault, mounted via CSI driver. Never as an env var or a file in the image.
Conditional Access compatibility. Handle the interaction_required MSAL error and return a structured 403 with the claims field echoed back so clients can re-auth with the right CA challenge. Without this, users hit an opaque error instead of being prompted to satisfy MFA.
Audit log every authorization decision. Three states (accept, deny:scope, deny:rbac), all logged with the user's oid and the tool name. Six months later, when someone asks "did Bob ever try to access subscription X?", you have an answer.
App role for service callers. For tools that machines call (not users), declare an appRole on the app registration and check claims.roles instead of claims.scp. App roles ride in the roles claim and don't need explicit user consent; they're admin-assigned.
Token cache TTL ≤ 30 minutes. Long enough to be useful, short enough that revoked sessions stop working in a reasonable window.
Monitor OBO failure rate. Sudden spike usually means a Conditional Access policy changed. Alert on it.
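For the audit log item, one JSON line per decision is usually enough to answer the "did Bob ever try" question later. The sketch below is one possible record shape, not a prescribed schema: auditLine and the field names are hypothetical, and where the line is shipped (stdout, a log pipeline) is left out.

```typescript
type Decision = "accept" | "deny:scope" | "deny:rbac";

type AuditRecord = {
  ts: string; // ISO 8601 timestamp
  oid: string; // the caller's Entra object ID, from the validated token
  tool: string;
  decision: Decision;
  subscriptionId?: string;
};

// Builds one JSON line per authorization decision.
function auditLine(
  oid: string,
  tool: string,
  decision: Decision,
  subscriptionId?: string,
  now: Date = new Date(),
): string {
  const record: AuditRecord = {
    ts: now.toISOString(),
    oid,
    tool,
    decision,
    ...(subscriptionId ? { subscriptionId } : {}),
  };
  return JSON.stringify(record);
}

// Illustrative values only.
console.log(auditLine("hypothetical-oid", "cost_by_service", "deny:rbac", "hypothetical-sub"));
```

Logging the oid rather than preferred_username keeps the record stable when a user is renamed.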

Troubleshooting

AADSTS50158: External security challenge not satisfied. Conditional Access requires MFA or device compliance for the downstream resource. Catch the error, return 403 with the policy's claims requirement, and let the client re-auth.

AADSTS65001: The user has not consented. The cost.export scope (or another admin-consent scope) hasn't been granted. An admin must run az ad app permission grant.

Invalid audience on JWT validation. Either the caller requested the token for the wrong resource, or the validator expects the wrong form. With requestedAccessTokenVersion: 2 the aud claim is the app's client ID (a GUID); the app ID URI (api://mcp-cost-server) appears in aud only on v1.0 tokens, and the server's FQDN never does. Check the aud claim on the failing token and compare it to what the validator accepts.

Token contains scope but tool says missing_scope. Both v1 and v2 tokens carry delegated scopes in scp (space-separated), so the token version alone rarely explains this. Check whether the token is app-only, in which case permissions arrive in roles (array) and scp is absent, and confirm requestedAccessTokenVersion: 2 on the app registration.

OBO returns 401. The MCP server's app registration doesn't have Microsoft Azure Service Management / user_impersonation listed under requiredResourceAccess. Add it and re-grant admin consent.

OBO returns 500 with a confusing message. Cert and thumbprint don't match. Run openssl x509 -fingerprint -sha1 over the cert and compare the result to AZURE_CERT_THUMBPRINT. Mismatch is the most common cause.
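For the AADSTS50158 case, the translation from an MSAL failure to a structured 403 might look like the sketch below. It assumes the thrown error exposes errorCode, errorMessage, and claims the way MSAL's InteractionRequiredAuthError does, and it types them structurally so the example has no dependency on @azure/msal-node. The response body shape and the name toClaimsChallenge are hypothetical.

```typescript
// Structural subset of the error MSAL raises; only the fields this sketch reads.
type MsalLikeError = { errorCode?: string; errorMessage?: string; claims?: string };

type ClaimsChallenge = {
  status: 403;
  body: { error: "interaction_required"; claims: string | null; detail: string };
};

// Returns a structured 403 for Conditional Access challenges, or null for
// any other error so the caller can rethrow it.
function toClaimsChallenge(err: unknown): ClaimsChallenge | null {
  const e = (err ?? {}) as MsalLikeError;
  const isChallenge =
    e.errorCode === "interaction_required" ||
    (e.errorMessage ?? "").includes("AADSTS50158");
  if (!isChallenge) return null;
  return {
    status: 403,
    body: {
      error: "interaction_required",
      // Echoed back so the client can pass it to its next token request.
      claims: e.claims ?? null,
      detail: "Conditional Access requires additional sign-in steps for the downstream resource",
    },
  };
}
```

Wrapping the acquireTokenOnBehalfOf call in a try/catch that consults this function first keeps the Conditional Access path separate from real failures.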

What this gives you, beyond the obvious

The obvious win is the security model. Each user sees their RBAC, no more. That alone justifies the work.

The less obvious win is what happens to how the team thinks about the MCP server. Before this change, the server was an entity with permissions, and "what should this server be allowed to do" was a recurring conversation. After this change, the server has no permissions, and "what should Alice be allowed to do" is the conversation, which is the same conversation the team already has about every other corporate system. The MCP server stops being a special case in the security model.

The further-along win, which takes a quarter or two to mature, is that you can give the MCP server to anyone in the org without negotiating their access. Someone joins the FinOps team, they get Cost Management Reader through the normal access process (PIM, group assignment, whatever your tenant uses), they point their MCP client at the server, it works. Someone leaves the team, their access is revoked, the server stops returning data to them within 30 minutes. The server stays out of the conversation.

That's the operational property that turns an internal tool into a piece of infrastructure: it stops being a thing one team owns and starts being a service the whole org uses. The 600 lines of TypeScript are the means; the change in operational posture is what makes the work worthwhile.

There's a quieter cultural effect, too. In the audit conversation, the question "what data can the MCP server access" used to require an inventory of the server's role assignments and a discussion of risk. Now the answer is "none", and the follow-up is "the data each user can access is governed by their normal Azure RBAC, which is reviewed in the standard way". That answer ends the audit conversation in about two minutes. I have spent significantly longer in audit conversations than that, and the difference is most of the value of the work.
