The first version of our internal MCP server ran on a developer's laptop. It worked beautifully. Then they took a Friday off and the FinOps Slack channel filled with "MCP server unavailable" complaints by 11am, because the laptop had auto-locked, the SSE connections had dropped, and nobody else on the team had the env vars to reproduce it.
This is, broadly, the entire history of internal tooling. A single engineer builds a thing on their laptop. The team likes it. Suddenly the laptop is a piece of production infrastructure. The day the laptop closes is the day the team learns about single points of failure the hard way.
We moved to Azure Container Apps with a system-assigned managed identity. Same protocol, same code, no keys, scale to zero when no one's asking, real liveness and readiness probes, real diagnostic telemetry, and an SLA the team can think about. This post is the full hosting walkthrough. By the end you have an MCP server you can hand off to someone else and they can keep it running without owning the laptop it was built on.
If you've already got the application code from the cost-MCP tutorial, this is the platform engineering deep dive on the hosting half.
Why Container Apps and not AKS, Functions, or App Service
Quick detour because tooling choices have long tails.
App Service would work. It's the boring option. Why not it: the cheaper SKUs don't scale to zero, which means an idle MCP server is paying for a B1 instance forever. The plus side is that App Service has the most mature managed identity story; if you already use it for everything else, the consistency might outweigh the cost story.
Functions would also work. It scales to zero on the consumption tier, costs essentially nothing idle. Why not it: cold starts on consumption tier are 2 to 8 seconds, which is fine for a webhook but distinctly not fine for a query tool engineers use interactively. The premium tier solves cold starts but loses the scale-to-zero argument.
AKS would work and is what a few teams reach for. It's overkill. You'd pay for the node pool, manage Kubernetes upgrades, write a Helm chart, set up ingress, and at the end you'd have something behaviourally identical to a Container App. AKS is the right answer if you have other things on the cluster that share its lifecycle. A single MCP server on its own cluster is a maintenance burden.
Container Apps wins on three axes for this exact workload: scale-to-zero with HTTP-based scaling, managed identity baked in (identity: { type: 'SystemAssigned' } and you're done), and no node pool to babysit. Cold starts are around 4 to 6 seconds from zero, fast enough that "I'll wait" is the natural reaction. The right tool for the job, and "the right tool" matters more in 2026 than it used to, because the menu of platforms has gotten long enough that "use whatever we already have" is no longer a defensible default.
What you'll have at the end
~/mcp-on-aca/
├── infra/
│ ├── main.bicep
│ ├── modules/
│ │ ├── env.bicep
│ │ ├── app.bicep
│ │ ├── private-link.bicep
│ │ └── rbac.bicep
│ └── main.bicepparam
├── server/
│ └── Dockerfile
├── scripts/
│ ├── smoke.sh
│ └── tail-logs.sh
└── README.md
About 350 lines of Bicep, all parameterised, ready to drop into a real platform monorepo. The application code is whatever your MCP server happens to be; this lab is purely about the hosting story.
Prerequisites
az --version # 2.65 or newer
az bicep version # 0.30 or newer (run `az bicep upgrade` if older)
docker --version
jq --version # for the smoke script
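You'll need an Azure subscription where you can create resource groups, register resource providers, and assign roles at the subscription scope. Subscription Owner is more than you need; the practical minimum is Contributor on the target resource group plus Role Based Access Control Administrator at the subscription scope, because the Cost Management Reader assignment in Step 5 is made on the subscription itself.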
az login
az account set --subscription "<your-subscription-id>"
for p in Microsoft.App Microsoft.OperationalInsights Microsoft.Network \
Microsoft.ContainerRegistry Microsoft.ManagedIdentity; do
az provider register -n "$p" --wait
done
The provider registration loop is something most tutorials skip because they assume the providers are already on. New subscriptions don't have them, and you'll get an opaque "ResourceTypeNotFound" error if you skip this. The loop is idempotent, so it's safe to re-run.
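To confirm the loop did its job, a small check can print any namespace that is still not registered. This is a sketch: unregistered_providers is a hypothetical helper name, and it assumes az is already logged in to the right subscription.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every provider namespace passed as an argument
# that is not yet in the "Registered" state. Prints nothing when all are ready.
unregistered_providers() {
  local p state
  for p in "$@"; do
    state=$(az provider show -n "$p" --query registrationState -o tsv)
    if [ "$state" != "Registered" ]; then
      echo "$p ($state)"
    fi
  done
}

# Usage (same namespaces as the registration loop):
#   unregistered_providers Microsoft.App Microsoft.OperationalInsights \
#     Microsoft.Network Microsoft.ContainerRegistry Microsoft.ManagedIdentity
```

Empty output means every provider is ready and the deploy in Step 6 should not trip over registration.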
Step 1: The Container Apps environment + Log Analytics
infra/modules/env.bicep:
param location string = resourceGroup().location
param name string
param vnetSubnetId string
param logRetentionDays int = 30
resource law 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
name: 'law-${name}'
location: location
properties: {
sku: { name: 'PerGB2018' }
retentionInDays: logRetentionDays
}
}
resource env 'Microsoft.App/managedEnvironments@2024-03-01' = {
name: 'cae-${name}'
location: location
properties: {
appLogsConfiguration: {
destination: 'log-analytics'
logAnalyticsConfiguration: {
customerId: law.properties.customerId
sharedKey: listKeys(law.id, '2023-09-01').primarySharedKey
}
}
vnetConfiguration: {
infrastructureSubnetId: vnetSubnetId
internal: true
}
workloadProfiles: [
{ name: 'Consumption', workloadProfileType: 'Consumption' }
]
zoneRedundant: false
}
}
output environmentId string = env.id
output environmentName string = env.name
output staticIp string = env.properties.staticIp
output workspaceId string = law.id
Two non-obvious choices in this module, both worth understanding because they shape what the server can do.
internal: true gives the env a private IP only. No internet exposure, no Front Door wiring, reachable inside the corporate VNet once a private DNS zone is linked to it (Step 4 covers the DNS side). For an internal MCP server, this is the right default. The temptation to expose it externally with "we'll add auth later" is the same temptation that produces internal services on the public internet behind a basic-auth header that nobody rotates. Internal-by-default is the contract you want with your future self.
Zone redundancy is off because for a small internal MCP server it doubles cost without buying anything an MCP client cares about. If a zone goes down, the worst case is engineers can't query cost data for a few minutes; this isn't tier-1 production traffic. Turn on zone redundancy when the workload genuinely needs it. Don't do it because the documentation suggests it as a default.
PerGB2018 is the pay-as-you-go tier for Log Analytics and the right one until you have enough log volume to justify a capacity reservation. For an MCP server emitting structured logs at maybe 50KB per query, you'll stay inside the free allowance (5GB/month) for a long time.
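The "long time" claim is easy to sanity-check with integer arithmetic. The sketch below assumes the 50KB-per-query figure from the text and a 30-day month; both function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Estimated monthly ingestion in MB for a given query rate.
# $1 = queries per day, $2 = KB logged per query (default 50, from the text).
monthly_log_mb() {
  local queries_per_day=$1 kb_per_query=${2:-50}
  echo $(( queries_per_day * kb_per_query * 30 / 1024 ))
}

# Queries per day needed to use up an allowance of $1 GB in a 30-day month.
queries_per_day_for_gb() {
  local gb=$1 kb_per_query=${2:-50}
  echo $(( gb * 1024 * 1024 / (kb_per_query * 30) ))
}

# Usage:
#   monthly_log_mb 500          # MB per month at 500 queries/day
#   queries_per_day_for_gb 5    # daily query rate that reaches 5GB/month
```

At 50KB per query, a team would need several thousand queries every day before the 5GB allowance is in sight.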
Step 2: The Container App
infra/modules/app.bicep:
param location string = resourceGroup().location
param name string
param environmentId string
param image string
param targetPort int = 8080
param minReplicas int = 0
param maxReplicas int = 5
param workspaceId string
resource app 'Microsoft.App/containerApps@2024-03-01' = {
name: 'ca-${name}'
location: location
identity: { type: 'SystemAssigned' }
properties: {
managedEnvironmentId: environmentId
workloadProfileName: 'Consumption'
configuration: {
activeRevisionsMode: 'Single'
ingress: {
external: false
targetPort: targetPort
transport: 'auto'
allowInsecure: false
traffic: [{ latestRevision: true, weight: 100 }]
stickySessions: { affinity: 'sticky' }
}
}
template: {
containers: [{
name: 'mcp'
image: image
resources: { cpu: json('0.5'), memory: '1Gi' }
env: [
{ name: 'NODE_ENV', value: 'production' }
{ name: 'MCP_TRANSPORT', value: 'http' }
{ name: 'PORT', value: string(targetPort) }
]
probes: [
{
type: 'Liveness'
httpGet: { path: '/healthz', port: targetPort }
initialDelaySeconds: 5
periodSeconds: 30
failureThreshold: 3
}
{
type: 'Readiness'
httpGet: { path: '/healthz', port: targetPort }
initialDelaySeconds: 2
periodSeconds: 5
}
]
}]
scale: {
minReplicas: minReplicas
maxReplicas: maxReplicas
rules: [
{
name: 'http-rule'
http: { metadata: { concurrentRequests: '20' } }
}
]
}
}
}
}
resource diag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
scope: app
name: 'baseline'
properties: {
workspaceId: workspaceId
logs: [
{ category: 'ContainerAppConsoleLogs', enabled: true }
{ category: 'ContainerAppSystemLogs', enabled: true }
]
}
}
output principalId string = app.identity.principalId
output appName string = app.name
output fqdn string = app.properties.configuration.ingress.fqdn
Three choices that matter for production behaviour.
stickySessions.affinity: 'sticky' is the SSE survival flag. SSE clients hold a long-lived connection to one replica. If the load balancer routes the next message of the same session to a different replica, the new replica doesn't know about the session, the request 404s, the client experiences the server "dropping" them. Sticky affinity routes the same client back to the replica it started on. Without this, an MCP server scaled past one replica is a flaky MCP server.
activeRevisionsMode: 'Single' makes new revisions replace the old one wholesale. Multiple revisions enable blue/green at the cost of complexity (now you have two live versions, two log streams, two debugging contexts). Defer until you actually need it. For most internal tools, "Single" is right and "Multiple" is a footgun pretending to be a feature.
The HTTP scaler with concurrentRequests: '20' scales up when sustained concurrency exceeds 20 per replica. For most MCP workloads this lands at one or two replicas; bursts trigger more. The 20 is a tuning knob: lower it (say to 5) for verifying the scale rule fires, then raise it back up. The default scaler in older API versions was based on a different metric and behaved unpredictably; pinning to concurrentRequests makes the behaviour testable.
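As a rough mental model of the rule, the replica count is the concurrent request count divided by the threshold, rounded up, then clamped to minReplicas and maxReplicas. The sketch below encodes only that simplified model; the real scaler samples over time, so treat the output as an expectation to compare against, not a prediction. desired_replicas is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Simplified model of the HTTP scale rule (an assumption, not the scaler's code):
# replicas = ceil(concurrent / threshold), clamped to [min, max].
# $1 = concurrent requests, $2 = concurrentRequests threshold (default 20),
# $3 = minReplicas (default 0), $4 = maxReplicas (default 5).
desired_replicas() {
  local concurrent=$1 threshold=${2:-20} min=${3:-0} max=${4:-5}
  local want=$(( (concurrent + threshold - 1) / threshold ))
  if (( want < min )); then want=$min; fi
  if (( want > max )); then want=$max; fi
  echo "$want"
}

# Usage:
#   desired_replicas 45        # defaults from app.bicep
#   desired_replicas 8 5       # with the threshold lowered to 5 for testing
```

This also shows why lowering the threshold to 5 makes the rule easy to verify: eight concurrent requests are already enough to ask for a second replica.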
The diagnostic settings block at the bottom of the module is the one most teams forget because it lives in a different namespace (Microsoft.Insights) from the app. Without it, your logs go to a place you can't query. With it, console logs are queryable in Log Analytics within seconds and you can build the latency dashboard in step 11.
Step 3: RBAC for the system-assigned identity
infra/modules/rbac.bicep:
targetScope = 'subscription'
param principalId string
param roleAssignments array // [{ scope: '...', roleDefinitionId: '...' }]
// Bicep's scope property takes a resource or scope reference, not a string, so it is omitted here.
// Every assignment lands on the subscription this module is deployed to (see targetScope above).
// ra.scope only feeds the deterministic assignment name.
resource assignments 'Microsoft.Authorization/roleAssignments@2022-04-01' = [
for (ra, i) in roleAssignments: {
name: guid(principalId, ra.scope, ra.roleDefinitionId, string(i))
properties: {
principalId: principalId
principalType: 'ServicePrincipal'
roleDefinitionId: ra.roleDefinitionId
}
}
]
The module accepts a list of (scope, roleDefinitionId) pairs. Don't hardcode roles inside this module; each consumer module declares the roles its app needs. That keeps the blast radius for changes small and lets the security review proceed one file at a time.
A small cultural observation: this is the kind of module that, in a six-month-old codebase, has accumulated forty hardcoded roles for fifteen apps because someone copy-pasted it once and nobody refactored. The array parameter is what prevents that. If you can't articulate, on the consumer side, "this app needs Cost Management Reader at subscription scope and AcrPull on the registry", you don't yet understand what the app does. The Bicep is doing some of the security review for you.
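After a deploy, it is worth checking that what the identity actually holds matches what the Bicep declares. A minimal sketch, assuming az is logged in and the app uses a system-assigned identity; list_identity_roles is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "role<TAB>scope" for every role assignment held by
# a Container App's system-assigned identity.
# $1 = resource group, $2 = container app name.
list_identity_roles() {
  local rg=$1 app=$2 pid
  pid=$(az containerapp show -g "$rg" -n "$app" \
    --query identity.principalId -o tsv)
  az role assignment list --assignee "$pid" --all \
    --query '[].[roleDefinitionName, scope]' -o tsv
}

# Usage:
#   list_identity_roles rg-mcp-prod ca-mcp-prod
```

If the output contains a role that no consumer module declares, someone assigned it by hand and the Bicep is no longer the full story.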
Step 4: Private endpoint into a peered VNet
infra/modules/private-link.bicep:
param location string = resourceGroup().location
param environmentName string
param subnetId string
param privateDnsZoneId string
resource env 'Microsoft.App/managedEnvironments@2024-03-01' existing = {
name: environmentName
}
resource pe 'Microsoft.Network/privateEndpoints@2024-05-01' = {
name: 'pe-${environmentName}'
location: location
properties: {
subnet: { id: subnetId }
privateLinkServiceConnections: [
{
name: 'plsc-${environmentName}'
properties: {
privateLinkServiceId: env.id
groupIds: ['managedEnvironments']
}
}
]
}
}
resource zoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2024-05-01' = {
parent: pe
name: 'default'
properties: {
privateDnsZoneConfigs: [
{
name: 'aca-zone'
properties: { privateDnsZoneId: privateDnsZoneId }
}
]
}
}
The private endpoint needs a private DNS zone for *.privatelink.<region>.azurecontainerapps.io. Either reuse one your platform team already owns, or create one in this stack. If you create it, link it to every VNet whose workloads will resolve the MCP server's FQDN.
The DNS story is where most "internal-only Container Apps" projects get stuck for a day. The private endpoint resolves to a private IP only via the private DNS zone; if the zone isn't linked to the workload's VNet, the FQDN won't resolve from there and the workload silently fails to reach the server. Test resolution from inside the workload's VNet, not from your laptop, before declaring the deploy successful. nslookup ca-mcp-prod.internal... from a VM in the right VNet is the unambiguous test.
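The resolution test can be scripted so it fails loudly instead of relying on someone reading nslookup output. The sketch below assumes a Linux VM inside the workload's VNet with getent available; both function names are hypothetical, and only the RFC 1918 IPv4 ranges are checked.

```bash
#!/usr/bin/env bash
# True when the IPv4 address falls in an RFC 1918 private range.
is_private_ipv4() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  [[ "$a" =~ ^[0-9]+$ && "$b" =~ ^[0-9]+$ ]] || return 1
  if (( a == 10 )); then return 0; fi
  if (( a == 172 && b >= 16 && b <= 31 )); then return 0; fi
  if (( a == 192 && b == 168 )); then return 0; fi
  return 1
}

# True when the hostname resolves at all AND the first answer is private.
resolves_privately() {
  local ip
  ip=$(getent hosts "$1" | awk '{print $1; exit}')
  [ -n "$ip" ] && is_private_ipv4 "$ip"
}

# Usage, from a VM in the workload's VNet:
#   resolves_privately "$FQDN" && echo "OK" || echo "FAIL: check the DNS zone link"
```

Run from a VNet without the zone link, the lookup returns nothing and the check fails, which is exactly the silent failure described above made visible.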
Step 5: Wire the modules together
infra/main.bicep:
targetScope = 'resourceGroup'
param location string = resourceGroup().location
param name string
param image string
param vnetSubnetId string
param peSubnetId string
param privateDnsZoneId string
module envModule 'modules/env.bicep' = {
name: 'env'
params: {
location: location
name: name
vnetSubnetId: vnetSubnetId
}
}
module appModule 'modules/app.bicep' = {
name: 'app'
params: {
location: location
name: name
environmentId: envModule.outputs.environmentId
image: image
workspaceId: envModule.outputs.workspaceId
}
}
module peModule 'modules/private-link.bicep' = {
name: 'pe'
params: {
location: location
environmentName: envModule.outputs.environmentName
subnetId: peSubnetId
privateDnsZoneId: privateDnsZoneId
}
}
module rbacModule 'modules/rbac.bicep' = {
name: 'rbac'
scope: subscription()
params: {
principalId: appModule.outputs.principalId
roleAssignments: [
{
scope: subscription().id
roleDefinitionId: subscriptionResourceId(
'Microsoft.Authorization/roleDefinitions',
'72fafb9e-0641-4937-9268-a91bfd8191a3') // Cost Management Reader
}
]
}
}
output appFqdn string = appModule.outputs.fqdn
output appName string = appModule.outputs.appName
infra/main.bicepparam:
using 'main.bicep'
param name = 'mcp-prod'
param image = '<your-registry>.azurecr.io/mcp-server:latest'
param vnetSubnetId = '/subscriptions/.../subnets/aca-infra-001'
param peSubnetId = '/subscriptions/.../subnets/private-endpoints'
param privateDnsZoneId = '/subscriptions/.../privateDnsZones/privatelink.eastus.azurecontainerapps.io'
The four-module composition is deliberate. env gets the Log Analytics workspace and the Container Apps environment in one place because they share a lifecycle. app is the workload itself plus its diagnostics. private-link is the network plumbing. rbac is the security layer.
Each can be reused or replaced independently. If you decide to switch the workload to AKS later, app is the only module that changes; env (well, the LAW), private-link, and rbac come along. If you decide to add a second MCP server in the same env, you reuse env and add a second app. The boundaries match the rate of change, not the namespaces of Bicep.
Step 6: Deploy and verify
RG=rg-mcp-prod
az group create -n $RG -l eastus
az deployment group create \
-g $RG \
--template-file infra/main.bicep \
--parameters infra/main.bicepparam \
--query 'properties.outputs'
Once it finishes (roughly 3 to 4 minutes; Container Apps env creation is the slow step):
FQDN=$(az containerapp show -g $RG -n ca-mcp-prod \
--query properties.configuration.ingress.fqdn -o tsv)
echo "FQDN=$FQDN"
The FQDN looks like ca-mcp-prod.internal.<random>.<region>.azurecontainerapps.io. From inside the VNet, this resolves via the private DNS zone to the env's static IP. From outside, nslookup returns NXDOMAIN, which is the right answer.
Step 7: The smoke test
scripts/smoke.sh:
#!/usr/bin/env bash
set -euo pipefail
FQDN="${1:-${FQDN:?missing FQDN}}"
echo "=== /healthz ==="
curl -fsS "https://$FQDN/healthz"
echo
echo "=== open SSE channel ==="
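TMP=$(mktemp)
# Clean up the capture file even when the script exits early.
trap 'rm -f "$TMP"' EXIT
# --max-time bounds the stream so curl itself stops after 3 seconds;
# the timeout exit code is expected here, hence the || true.
curl -N -sS --max-time 3 "https://$FQDN/sse" > "$TMP" 2>&1 || true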
if grep -q "endpoint" "$TMP"; then
echo "OK: SSE channel emitted endpoint event"
head -5 "$TMP"
else
echo "FAIL: no SSE endpoint event"
cat "$TMP"
exit 1
fi
rm -f "$TMP"
echo "=== smoke test passed ==="
chmod +x scripts/smoke.sh
./scripts/smoke.sh "$FQDN"
The first run from a peered VNet typically takes 4 to 6 seconds (cold start, image pull, Node.js startup, MCP handshake). Subsequent runs should be sub-second. If you see consistent 4+ second latencies on subsequent runs, the workload is scaling to zero between every call; either the traffic is naturally that bursty (acceptable) or your scaler's cooldown is too aggressive (worth tuning).
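To see the cold-versus-warm difference as numbers, curl can report total request time itself. A minimal sketch, assuming the /healthz endpoint from Step 2; probe_latency is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hit a URL N times in a row and print curl's total time
# for each request. $1 = URL, $2 = number of runs (default 5).
probe_latency() {
  local url=$1 n=${2:-5} i t
  for i in $(seq 1 "$n"); do
    if t=$(curl -o /dev/null -sS -w '%{time_total}' "$url"); then
      echo "run $i: ${t}s"
    else
      echo "run $i: failed"
    fi
  done
}

# Usage:
#   probe_latency "https://$FQDN/healthz" 5
```

A healthy pattern is one slow first line followed by fast ones. Slow lines on every run point at the scale-to-zero behaviour described above.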
Step 8: One-line live log tail
scripts/tail-logs.sh:
#!/usr/bin/env bash
set -euo pipefail
RG="${RG:?missing RG}"
APP="${APP:-ca-mcp-prod}"
az containerapp logs show \
-g "$RG" -n "$APP" \
--container mcp \
--tail 50 --follow \
--format text
Useful in a second terminal while the smoke test runs. If you see MCP HTTP transport on :8080 (or whatever your server logs at startup), the container booted and everything after is your application speaking. If you see nothing for 30+ seconds and then Container 'mcp' was terminated with exit code 1, the container started, hit something fatal, and crashed. Logs from that startup window are the only diagnostic.
Step 9: Auto-scale verification
Generate a tiny load:
for i in $(seq 1 100); do
curl -fsS "https://$FQDN/healthz" >/dev/null &
[ $((i % 10)) -eq 0 ] && wait
done
wait
az containerapp replica list -g $RG -n ca-mcp-prod --query 'length([])' -o tsv
You should see 1 to 2 replicas if the burst was sustained, then back to the configured minReplicas after a couple of minutes. Adjust concurrentRequests in the scale rule if you want the rule more sensitive (lower) or more relaxed (higher).
A subtle point: this test verifies the scaler. It does not verify what happens to in-flight SSE sessions when scale-up arrives. To test that, hold an SSE connection open in one terminal, run a load burst from another, and verify the session keeps working. This is what stickySessions.affinity: 'sticky' from Step 2 is doing. Without it, the session's follow-up requests can land on the new replica, which has no record of the session, and the client sees the failure described in Step 2.
Step 10: Hook it up to a VS Code workspace
.vscode/mcp.json:
{
"servers": {
"internal-mcp": {
"type": "sse",
"url": "https://${input:fqdn}/sse"
}
},
"inputs": [
{ "id": "fqdn", "type": "promptString", "description": "Container App FQDN" }
]
}
When VS Code prompts for the FQDN, paste the one from Step 6. Reload Copilot and type @internal-mcp in chat; you should see the available tools.
Step 11: Useful KQL queries
Once diagnostic settings are streaming, build the small set of queries that answer "is the server healthy" without opening the portal.
A tool-call latency view, useful when wiring up an Application Insights workbook:
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(1h)
| where Log_s contains "tool="
| extend tool = extract(@'tool=(\S+)', 1, Log_s)
| extend ms = toint(extract(@'duration_ms=(\d+)', 1, Log_s))
| summarize p50=percentile(ms, 50), p95=percentile(ms, 95), n=count() by tool
| order by p95 desc
This works only if your application logs lines like tool=cost_by_service duration_ms=312. Build that into your server's logging from the start; structured logs that you can extract() from KQL are the difference between "I have logs" and "I have observability".
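The line shape is language-agnostic, so it can be pinned down and linted before the server ever reaches Azure. The sketch below is shown in shell to match scripts/; the function names are hypothetical, and the status_code field is included because the 4xx query in this step expects it.

```bash
#!/usr/bin/env bash
# Emit one log line in the key=value shape the KQL queries extract from.
# $1 = tool name, $2 = duration in ms, $3 = HTTP status (default 200).
log_tool_call() {
  printf 'tool=%s duration_ms=%d status_code=%d\n' "$1" "$2" "${3:-200}"
}

# Succeed only if a line carries both fields the latency query needs.
is_parseable_log_line() {
  local tool_re='tool=[^[:space:]]+' ms_re='duration_ms=[0-9]+'
  [[ "$1" =~ $tool_re ]] && [[ "$1" =~ $ms_re ]]
}

# Usage:
#   log_tool_call cost_by_service 312
#   is_parseable_log_line "$(log_tool_call cost_by_service 312)" && echo "ok"
```

Running the same check over a sample of real server output in CI catches the day someone reformats the log line and quietly empties the dashboard.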
A rate of 4xx-class responses, the alarm worth firing on:
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(15m)
| where Log_s contains "status_code="
| extend code = toint(extract(@'status_code=(\d+)', 1, Log_s))
| where code >= 400 and code < 500
| summarize n=count() by bin(TimeGenerated, 1m)
A spike here means clients are sending the wrong shape. That's either a client bug, a server-version drift, or someone running an old MCP client against the new server. All three are worth knowing about.
Production checklist
Before pointing real users at this:
- Pin the image SHA, not the :latest tag. Predictable rollouts, predictable rollbacks. The day someone pushes a broken image with the same tag is the day you regret using :latest in production.
- Set a CPU+memory floor that matches your cold-start budget. 0.5 CPU is fine for a typical MCP server; bump to 1.0 if your tools are CPU-bound (parsing big diffs, running KQL against gigabyte-scale workspaces).
- Add a budget alarm on the resource group at $50/month. Container Apps scaled to zero is nearly free; anything above is unexpected. The alarm catches misconfiguration that left replicas pinned to one when you thought they'd scale to zero.
- Confirm internal: true on the env. nslookup from outside the VNet should NOT resolve. If it does, your env is public and you're back to the start of this tutorial.
- Document the role assignments. That's the audit answer for "who can this server impersonate?". The rbacModule.params.roleAssignments array is the list to paste into the audit report.
- Pin ManagedIdentityCredential directly in production code paths. DefaultAzureCredential's walk-the-chain logic adds 1 to 2 seconds to the first request, which is unnecessary tax for a server that knows it's running with a managed identity.
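The first checklist item is easy to enforce mechanically: a deploy script can refuse any image reference that is not pinned by digest. A minimal sketch; is_digest_pinned is a hypothetical helper name, and the registry name in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
# Succeed only if the image reference ends in a sha256 digest
# (name@sha256:<64 hex chars>) rather than a mutable tag.
is_digest_pinned() {
  local re='@sha256:[0-9a-f]{64}$'
  [[ "$1" =~ $re ]]
}

# Usage in a deploy script (IMAGE is whatever you pass as the image parameter):
#   is_digest_pinned "$IMAGE" || { echo "refusing tag-based image: $IMAGE" >&2; exit 1; }
#
# One way to look up the digest of an image you have pushed or pulled locally:
#   docker inspect --format '{{index .RepoDigests 0}}' <your-registry>.azurecr.io/mcp-server:latest
```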
Troubleshooting
Container 'mcp' was terminated with exit code 1: Container started, hit an unhandled exception, exited. Check logs (scripts/tail-logs.sh); usually a missing env var or network reach problem. Bump up the verbosity for the first few minutes after deploy until startup looks clean.
The image '<...>' could not be pulled: The Container App's identity doesn't have AcrPull on the registry, or the image tag doesn't exist. Verify with az acr repository show. If the registry is private, also check the firewall rules; the env's outbound IP needs to be allowed.
SSE drops every 4 minutes: Keepalive isn't being emitted. Check the SSE transport implementation (: keepalive\n\n every 25s). Container Apps closes idle HTTP at 240 seconds. Long-lived SSE sessions look idle to the load balancer between events.
502 Bad Gateway from the FQDN: App's listening port doesn't match ingress targetPort, or the app isn't listening on 0.0.0.0. Confirm the container logs listening on 0.0.0.0:8080, not 127.0.0.1:8080. The 0.0.0.0 binding is what makes the port reachable from the env's network namespace.
Private endpoint doesn't resolve from a peered VNet: Private DNS zone isn't linked to the peered VNet. Run az network private-dns link vnet create for each peered VNet. Easy to forget when you add a new VNet later; build it into your VNet provisioning Bicep so it can't drift.
Cold start exceeds 10 seconds: Image is too large, or DefaultAzureCredential is being used. Slim the image (multi-stage build), pin to ManagedIdentityCredential directly. Setting minReplicas: 1 eliminates the cold start at the cost of always-on compute (around $7/month). Worth it for tools engineers use interactively.
Auto-scale never kicks in: The scale rule's concurrentRequests is too high for your traffic. Lower to 5 to verify the rule fires, then tune up. The scaler logs decisions to the env's diagnostic stream; query ContainerAppSystemLogs_CL for ScalerName and look for the rule firing.
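For the SSE-drop case, the keepalive can be checked from the outside. In the SSE wire format a line starting with a colon is a comment, which is what a keepalive frame is, so counting those lines in a captured stream shows whether the server emits them. A minimal sketch; count_keepalives is a hypothetical helper name, and the 25-second interval is the one quoted above.

```bash
#!/usr/bin/env bash
# Count SSE comment lines (lines starting with ':') read from stdin.
# grep -c exits non-zero when the count is 0, so the status is normalised.
count_keepalives() {
  grep -c '^:' || true
}

# Usage: capture 60 seconds of an otherwise idle stream. With a 25s keepalive
# interval, the count should be at least 2.
#   curl -N -sS --max-time 60 "https://$FQDN/sse" | count_keepalives
```

A count of 0 on an idle stream means nothing is keeping the connection warm, and the 240-second idle close will keep dropping it.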
What you have at this point
A private-ingress MCP server that:
- Survives the laptop closing. The original failure mode is gone.
- Is reachable from any peered VNet without internet exposure. The private DNS zone makes the FQDN feel like a normal corporate hostname.
- Scales to zero when nobody's asking, scales out when they are. Idle cost rounds to the price of a Log Analytics workspace, which is to say roughly five dollars a month.
- Has structured logs flowing to Log Analytics with two pre-built KQL queries to answer the questions that come up first.
- Can be handed to anyone with Contributor on the resource group; the operational story is fully captured in Bicep.
The shift in posture is the part that's hard to articulate in a step list. Before, the MCP server was a private artifact owned by one person. After, it's part of the platform. Other engineers can read the Bicep, understand what it does, propose changes, deploy them, and the MCP server's evolution becomes a normal team activity rather than a "go ask whoever runs that server" question.
That shift is the actual deliverable. The 350 lines of Bicep are the means; the cultural change is the end. You now have an internal tool you can hand off, scale up, deprecate, or replace, and none of those operations require a single specific human to be available. That's the property that makes a thing a piece of platform rather than a piece of someone's laptop.
The same module set works for the next MCP server, and the next one. Across our org we now have four internal MCP servers in the same shape, deployed by the same pipeline, sharing the same Log Analytics workspace, with their permissions visible in one Bicep file each. The operational cost of running four servers is barely higher than running one, because the differentiated work was the application logic, and the rest is template.
