damionas
No. 47 | DevOps | Feb 2, 2026 | 30 min read

Self-Hosted Azure DevOps Agents on AKS With KEDA Autoscaling

The platform team I joined had thirty Microsoft-hosted Azure DevOps agent minutes left in the month and it was the eighth. They'd hit the included quota on day five, bought the parallel-job add-on for $40 per agent per month for ten extra agents, hit that ceiling on day seven, and were now triaging which pipelines to cancel so the production deploy on day twenty-eight could still run.

The fix is self-hosting agents on AKS with KEDA-driven autoscaling. The team's Microsoft-hosted bill went from $400/month plus quota frustration to roughly $60/month of AKS compute that scales up only when there are pending jobs and back to zero between waves. The CI throughput went from "eight engineers fight for ten parallel slots" to "thirty engineers each see their PR start in under twenty seconds, and the pool sleeps overnight."

This post is the entire build. By the end you have a small AKS cluster running self-hosted Azure DevOps agents in containers, KEDA monitoring the agent pool's pending-jobs queue and scaling from zero agent pods to thirty as work appears, ephemeral one-job-per-pod runners that disappear after each job, and a workload-identity story that means the agents have no PATs sitting around. About 250 lines of Helm chart + Bicep, and the operational habit of treating CI compute like any other autoscaling service.

This pattern is well-established in the Microsoft community and described in the KEDA project's official Azure Pipelines scaler post and in Lippert's writeup. What this article adds is the workload-identity-instead-of-PAT pattern, the AKS-side networking and budget hygiene, and a real walkthrough end-to-end including the Bicep.

Why this exact pattern, and not the alternatives

Brief context, because the choices have long tails.

Why self-hosted, not Microsoft-hosted. Microsoft-hosted agents are wonderful at small scale and frustrating at medium scale. The free quota covers light teams, the paid tier grows linearly in cost with each parallel job, and the agents start cold every run, which adds 30 to 60 seconds of overhead per job. At a team of thirty engineers running ten pipelines per day, self-hosting on AKS is roughly half the cost and substantially faster (warm agents, baked-in tools, in-VNet ACR pulls).

Why AKS, not VMSS or Container Apps. VMSS works (it's the original Microsoft pattern), but you manage VM images yourself and the autoscaling story is rougher. Container Apps does not yet have a clean way to run a daemon-shaped workload that polls for jobs, and the per-revision autoscale model fits HTTP traffic, not job queues. AKS plus KEDA is the well-trodden path; the KEDA azure-pipelines trigger has been GA since 2021 and the operational story is mature.

Why workload identity, not a PAT. Most published examples of this pattern store an Azure DevOps Personal Access Token as a Kubernetes secret. PATs work, but they expire and need rotation, and a leaked PAT gives the attacker every permission the token was scoped to for its remaining lifetime. Workload identity federation between the agent pod's Kubernetes service account and an Entra ID service principal that has been added to Azure DevOps removes the stored secret entirely. The pattern is supported as of mid-2025 and the migration is straightforward.

Why ScaledJob, not ScaledObject. KEDA exposes two scaling resources: ScaledObject (scales a Deployment up and down) and ScaledJob (creates a Kubernetes Job per work item). For Azure DevOps agents specifically, ScaledJob is the right choice. Each ADO job runs in a fresh pod that exits when the job finishes; the next job gets a new pod. This is the ephemeral-runner pattern, which prevents bleed-over between jobs and means a compromised job pod has at most one job's worth of state.

What you'll have at the end

~/aks-azdo-agents/
├── infra/
│   ├── aks.bicep                           # the AKS cluster
│   ├── identity.bicep                      # MI + workload identity federation
│   └── acr-attach.bicep                    # AKS pull from ACR
├── images/
│   ├── Dockerfile                          # ADO agent image
│   └── start.sh                            # entrypoint script
├── helm/
│   └── azdo-agents/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── serviceaccount.yaml
│           ├── scaledjob.yaml
│           └── trigger-auth.yaml
├── pipelines/
│   └── self-hosted-test.yml                # demo pipeline
└── README.md

Prerequisites

az --version            # 2.65 or newer
kubectl version --client
helm version
az bicep version        # 0.30 or newer

You'll also need Project Collection Administrator on the Azure DevOps org for the one-time agent-pool service-principal grant in step 4. After that, the platform team's day-to-day RBAC can be lower.
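
The version minimums above are easy to check by hand, but if you want a preflight script to enforce them, a minimal sketch could look like the following. It assumes a sort that supports -V (GNU coreutils and recent BSD/macOS do), and version_ge is a hypothetical helper name, not part of any of the CLIs.

```shell
#!/usr/bin/env bash
# Sketch: compare dotted version strings using sort -V.
# version_ge is a hypothetical helper, not part of az, kubectl, or helm.
set -euo pipefail

# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
version_ge() {
  local have="$1" want="$2"
  # sort -V puts the lower version first; if that is the wanted minimum
  # (or both are equal), the installed version is new enough.
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n1)" = "$want" ]
}

# Example with literal values; in practice, pass whatever version
# string your installed tool reports.
if version_ge "2.65.0" "2.65"; then
  echo "az is new enough"
fi
```

How you extract the version string from each tool is left out on purpose, because the output formats differ between tools and releases.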

Step 1: The agent container image

The official Microsoft Azure Pipelines agent image is at mcr.microsoft.com/azure-pipelines/vsts-agent. We extend it slightly to bake in tools your jobs use frequently (az, kubectl, helm, and bicep here; add terraform or anything else your pipelines call the same way) so cold starts don't include a multi-minute install.

images/Dockerfile:

FROM mcr.microsoft.com/azure-pipelines/vsts-agent:ubuntu-22.04

USER root

# Standard CI tools we don't want to install per-job
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl jq git docker.io \
  && rm -rf /var/lib/apt/lists/*

# Azure CLI
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash

# kubectl
RUN curl -L "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" -o /usr/local/bin/kubectl \
  && chmod +x /usr/local/bin/kubectl

# Helm
RUN curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Bicep
RUN curl -Lo bicep https://github.com/Azure/bicep/releases/latest/download/bicep-linux-x64 \
  && chmod +x ./bicep && mv ./bicep /usr/local/bin/bicep

# Run agent as a non-root user (security default Defender for Cloud will flag otherwise)
RUN useradd -m -u 1000 agent
USER agent
WORKDIR /home/agent

COPY --chown=agent:agent start.sh /home/agent/start.sh
RUN chmod +x /home/agent/start.sh

ENTRYPOINT ["/home/agent/start.sh"]

A few comments on what each block is doing:

  • FROM mcr.microsoft.com/azure-pipelines/vsts-agent is Microsoft's officially supported base. Stay on this rather than rolling your own; updates flow from Microsoft's CI.
  • The apt-get install block adds docker.io because some jobs do docker build. If your team doesn't run docker builds inside the agent, drop it; the image gets ~150MB smaller.
  • useradd -m -u 1000 agent plus USER agent runs the agent as a non-root user, which Defender for Cloud will flag as a finding if you skip. Cheap to fix once.
  • start.sh is where the auth and registration logic lives, which we'll cover in step 2.

Build and push:

ACR=myacr
az acr login --name $ACR
docker build -t $ACR.azurecr.io/azdo-agent:1.0.0 ./images
docker push $ACR.azurecr.io/azdo-agent:1.0.0

Pin a real version tag, not latest. Pipeline reproducibility depends on knowing which agent image ran which job.
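
If you want the build or deploy script to enforce that rule rather than rely on habit, a small guard is enough. This is a sketch under my own assumptions: require_pinned_tag is a hypothetical helper, and it only inspects the image reference string, it does not talk to the registry.

```shell
#!/usr/bin/env bash
# Sketch: refuse image references that are untagged or tagged "latest".
# require_pinned_tag is a hypothetical helper name.
set -euo pipefail

# Prints the tag (or digest) on success, returns 1 for unpinned references.
require_pinned_tag() {
  local image="$1" last tag=""
  # A digest reference is already immutable, so accept it as pinned.
  if [[ "$image" == *@sha256:* ]]; then
    echo "${image##*@}"
    return 0
  fi
  # Look only at the final path segment so a registry port such as
  # localhost:5000/repo is not mistaken for a tag.
  last="${image##*/}"
  if [[ "$last" == *:* ]]; then
    tag="${last##*:}"
  fi
  if [[ -z "$tag" || "$tag" == "latest" ]]; then
    echo "refusing unpinned image reference: $image" >&2
    return 1
  fi
  echo "$tag"
}

require_pinned_tag "myacr.azurecr.io/azdo-agent:1.0.0"
```

Calling it before docker push or helm install means a stray latest fails the script early instead of showing up later as an unreproducible pipeline run.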

Step 2: The agent registration script

images/start.sh:

#!/usr/bin/env bash
# Register this pod as an Azure DevOps agent, run a single job, exit.
# When called by KEDA's ScaledJob, each pod runs exactly one job.

set -euo pipefail

AZP_URL="${AZP_URL:?required: Azure DevOps org URL, e.g. https://dev.azure.com/myorg}"
AZP_POOL="${AZP_POOL:?required: agent pool name, e.g. aks-pool}"
AZP_AGENT_NAME="${AZP_AGENT_NAME:-$(hostname)}"

# Auth: use a federated workload-identity token, exchanged for an ADO access token.
# The agent's managed identity is mapped to a service principal via Workload Identity
# Federation, which has access to the Azure DevOps org. No PAT needed.

# Acquire a federated token from the AKS workload identity webhook
echo "Acquiring federated token from workload identity webhook..."
FED_TOKEN=$(cat /var/run/secrets/azure/tokens/azure-identity-token)

# Exchange for an Entra ID token scoped to Azure DevOps
ENTRA_TOKEN=$(curl -sS -X POST \
  "https://login.microsoftonline.com/${AZURE_TENANT_ID}/oauth2/v2.0/token" \
  -d "client_id=${AZURE_CLIENT_ID}" \
  -d "scope=499b84ac-1321-427f-aa17-267ca6975798/.default" \
  -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
  -d "client_assertion=${FED_TOKEN}" \
  -d "grant_type=client_credentials" \
  | jq -r '.access_token')

if [ -z "$ENTRA_TOKEN" ] || [ "$ENTRA_TOKEN" = "null" ]; then
  echo "Failed to acquire Entra token. Check AZURE_TENANT_ID, AZURE_CLIENT_ID, federated credential."
  exit 1
fi

# Configure the agent against the pool.
# We pass the Entra access token through --token as if it were a PAT,
# which the ADO agent accepts.
./config.sh \
  --unattended \
  --url "$AZP_URL" \
  --pool "$AZP_POOL" \
  --agent "$AZP_AGENT_NAME" \
  --auth pat \
  --token "$ENTRA_TOKEN" \
  --replace \
  --acceptTeeEula

# Run one job and exit. The --once flag makes the agent disappear after one job,
# which is the ephemeral-runner pattern we want.
trap "./config.sh remove --unattended --auth pat --token \"$ENTRA_TOKEN\" || true" EXIT
./run.sh --once

Three things this script does that the published examples typically don't:

  1. Reads the federated token from the workload identity webhook, then exchanges it for an Entra ID token scoped to Azure DevOps (499b84ac-1321-427f-aa17-267ca6975798 is the canonical ADO resource GUID). No PAT, no rotation, no secret in the cluster.
  2. Uses --once so the agent runs exactly one job, then run.sh exits with code 0. Combined with KEDA's ScaledJob, every job runs in a brand-new pod that doesn't carry state from previous jobs.
  3. Uses an EXIT trap to deregister the agent before the pod terminates, which keeps the ADO pool clean. Without this, the pool fills up with offline agents over time and queries against agent count get noisy.
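
When the token exchange fails, the fastest diagnosis is usually to look at the claims inside the federated token itself. The helper below is a debugging sketch, not part of start.sh: jwt_payload is a hypothetical name, it assumes a base64 that accepts -d, and it only decodes the payload segment without verifying the signature.

```shell
#!/usr/bin/env bash
# Sketch: print the JSON payload (claims) of a JWT for debugging.
# jwt_payload is a hypothetical helper; it does NOT validate the token.
set -euo pipefail

jwt_payload() {
  local token="$1" payload pad
  # A JWT is header.payload.signature; take the middle segment and map
  # the base64url alphabet back to standard base64.
  payload="$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the '=' padding; restore it to a multiple of four.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}
```

Inside a running agent pod you could call jwt_payload "$(cat /var/run/secrets/azure/tokens/azure-identity-token)" and pipe the result through jq to compare iss, sub, and aud against the federated credential.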

Step 3: AKS cluster + workload identity

infra/aks.bicep:

param location string = resourceGroup().location
param clusterName string = 'aks-azdo-agents'
param nodeCount int = 1
param nodeVmSize string = 'Standard_D4s_v5'

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    dnsPrefix: clusterName
    kubernetesVersion: '1.30'

    // Workload Identity + OIDC issuer are the federation prerequisites
    oidcIssuerProfile: { enabled: true }
    securityProfile: {
      workloadIdentity: { enabled: true }
    }

    agentPoolProfiles: [
      {
        name: 'system'
        count: nodeCount
        vmSize: nodeVmSize
        mode: 'System'
        osType: 'Linux'
        enableAutoScaling: true
        minCount: 1
        maxCount: 5
      }
    ]

    networkProfile: {
      networkPlugin: 'azure'
      loadBalancerSku: 'standard'
    }

    // Add KEDA as an AKS-managed addon (since 2024 it's first-party)
    workloadAutoScalerProfile: {
      keda: { enabled: true }
    }
  }
}

output oidcIssuerUrl string = aks.properties.oidcIssuerProfile.issuerURL
output clusterName string = aks.name

The oidcIssuerProfile, workloadIdentity, and keda blocks are the three new-since-2023 features that turn this from a hand-rolled mess into a one-Bicep deploy. Without them, you'd be installing KEDA via Helm yourself and figuring out OIDC by hand. With them, AKS owns those concerns and you just consume the result.

Standard_D4s_v5 is opinion. 4 vCPU/16GB nodes are right for typical CI workloads (docker build, terraform plan, dotnet test). If your jobs are mostly lightweight (lint, format, type-check), drop to D2s_v5 and save half the cost. If your jobs run integration tests with Postgres in-pod, go to D8s_v5. The Bicep parameter makes this a single-character change.

Step 4: The federated identity

infra/identity.bicep:

extension microsoftGraphV1
param appDisplayName string = 'sp-azdo-agents'
param oidcIssuerUrl string
param namespace string = 'azdo-agents'
param serviceAccountName string = 'azdo-agent-sa'

resource app 'Microsoft.Graph/applications@v1.0' = {
  uniqueName: appDisplayName
  displayName: appDisplayName
}

resource sp 'Microsoft.Graph/servicePrincipals@v1.0' = {
  appId: app.appId
}

// Federated credential: when the K8s service account presents an OIDC token,
// trust it as if it were this Entra app.
// Like the app and service principal above, Graph resources take their fields
// at the top level (no `properties` wrapper), and the credential name is
// prefixed with the parent app's uniqueName.
resource fedCred 'Microsoft.Graph/applications/federatedIdentityCredentials@v1.0' = {
  name: '${app.uniqueName}/k8s-sa-azdo-agents'
  issuer: oidcIssuerUrl
  subject: 'system:serviceaccount:${namespace}:${serviceAccountName}'
  audiences: [ 'api://AzureADTokenExchange' ]
}

output appId string = app.appId
output servicePrincipalObjectId string = sp.id

The subject claim is the Kubernetes service account in system:serviceaccount:<namespace>:<sa-name> form. When the pod authenticates, the OIDC token's subject must match this string exactly, and the federation lets the SP through.
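
Because the match is an exact string comparison, a typo in the namespace or service account name produces a federation failure that is hard to spot by eye. A tiny sketch for building and checking the expected value (both function names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: build the subject string a federated credential expects for a
# Kubernetes service account, and compare it with an observed "sub" claim.
# expected_subject and subject_matches are hypothetical helper names.
set -euo pipefail

expected_subject() {
  local namespace="$1" service_account="$2"
  printf 'system:serviceaccount:%s:%s' "$namespace" "$service_account"
}

# Succeeds only on an exact, case-sensitive match.
subject_matches() {
  local namespace="$1" service_account="$2" observed="$3"
  [ "$(expected_subject "$namespace" "$service_account")" = "$observed" ]
}

# Using this article's defaults:
expected_subject "azdo-agents" "azdo-agent-sa"
echo
```

The observed value would come from the sub claim of the token projected into the pod.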

After the federated identity is provisioned, grant it access to your Azure DevOps organisation. This is the one-time Project Collection Administrator step:

  1. In Azure DevOps → Organization Settings → Users → Add user.
  2. Search for the service principal by its display name (sp-azdo-agents) and select it.
  3. Set access level to "Basic" and add it to the project whose pipelines use the agent pool.
  4. In Agent Pools → aks-pool → Security → Add the SP with Administrator role.

Without that last step the agent registration call returns a 401 with a confusing message. The SP must have explicit pool admin rights to register agents.

Step 5: The Helm chart

helm/azdo-agents/templates/serviceaccount.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Values.serviceAccountName }}
  namespace: {{ .Values.namespace }}
  annotations:
    azure.workload.identity/client-id: {{ .Values.azureClientId | quote }}
  labels:
    azure.workload.identity/use: "true"

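The client-id annotation is what binds the Kubernetes service account to the Entra ID app's federated credential. Note that current versions of the workload identity webhook decide whether to mutate a pod from the azure.workload.identity/use: "true" label on the pod itself, so the agent pod template needs that label too; the label on the service account alone is not enough. If the webhook does not mutate the pod, no federated token is projected into it and start.sh fails when it tries to read the token file.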

helm/azdo-agents/templates/trigger-auth.yaml:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azdo-trigger-auth
  namespace: {{ .Values.namespace }}
spec:
  podIdentity:
    provider: azure-workload
    identityId: {{ .Values.azureClientId | quote }}

KEDA needs to authenticate to Azure DevOps to query pool length. The TriggerAuthentication resource tells KEDA to use workload identity (the same identity the agent pods use). No PAT in the trigger config either.

helm/azdo-agents/templates/scaledjob.yaml:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azdo-agent-scaledjob
  namespace: {{ .Values.namespace }}
spec:
  jobTargetRef:
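    template:
      metadata:
        labels:
          azure.workload.identity/use: "true"   # the webhook only mutates pods carrying this label
      spec: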
        serviceAccountName: {{ .Values.serviceAccountName }}
        containers:
          - name: agent
            image: {{ .Values.image }}
            env:
              - name: AZP_URL
                value: {{ .Values.azpUrl | quote }}
              - name: AZP_POOL
                value: {{ .Values.azpPool | quote }}
              - name: AZURE_TENANT_ID
                value: {{ .Values.azureTenantId | quote }}
              - name: AZURE_CLIENT_ID
                value: {{ .Values.azureClientId | quote }}
            resources:
              requests:
                cpu: "1"
                memory: "2Gi"
              limits:
                cpu: "4"
                memory: "8Gi"
        restartPolicy: Never
    backoffLimit: 0     # Don't retry agent runs; KEDA will spawn another job

  pollingInterval: 30   # Check pool every 30 seconds
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  maxReplicaCount: 30   # Hard ceiling on simultaneous agents
  minReplicaCount: 0    # Scale to zero when idle, this is the cost win

  triggers:
    - type: azure-pipelines
      metadata:
        organizationURLFromEnv: AZP_URL
        poolName: {{ .Values.azpPool | quote }}
      authenticationRef:
        name: azdo-trigger-auth

The shape of ScaledJob matters. Each pending job in the pool causes KEDA to create a fresh Kubernetes Job, which spawns one pod, which registers as an agent, runs the queued job, deregisters, and exits. restartPolicy: Never plus backoffLimit: 0 ensures a failed run doesn't retry; if a job fails, the pipeline retries at the ADO level by re-queueing.

The maxReplicaCount: 30 is a deliberate ceiling. Without it, a flood of pull requests can spawn far more agents than you planned for and spike costs. Pick a number that matches your AKS node-pool capacity plus a margin. If 30 agents won't fit on the current nodes, the AKS cluster autoscaler adds nodes, which is fine as long as the node pool's maxCount from step 3 is high enough to hold them.

helm/azdo-agents/values.yaml:

namespace: azdo-agents
serviceAccountName: azdo-agent-sa
image: myacr.azurecr.io/azdo-agent:1.0.0
azpUrl: https://dev.azure.com/myorg
azpPool: aks-pool
azureTenantId: <your-tenant-id>
azureClientId: <federated-app-clientid>

Step 6: Deploy

Step-by-step:

# 1. Deploy AKS + identity (one-time; idempotent on re-run)
RG=rg-azdo-agents
az group create -n $RG -l eastus
az deployment group create -g $RG \
  --template-file infra/aks.bicep

OIDC=$(az aks show -g $RG -n aks-azdo-agents --query oidcIssuerProfile.issuerURL -o tsv)
echo "OIDC issuer: $OIDC"

az deployment group create -g $RG \
  --template-file infra/identity.bicep \
  --parameters oidcIssuerUrl="$OIDC"

APP_ID=$(az ad app list --display-name sp-azdo-agents --query '[0].appId' -o tsv)

# 2. Create the namespace and grab kubectl creds
az aks get-credentials -g $RG -n aks-azdo-agents
kubectl create namespace azdo-agents

# 3. Install the chart
helm install azdo-agents ./helm/azdo-agents \
  --namespace azdo-agents \
  --set azureClientId="$APP_ID" \
  --set azureTenantId="$(az account show --query tenantId -o tsv)" \
  --set azpUrl="https://dev.azure.com/myorg" \
  --set image="myacr.azurecr.io/azdo-agent:1.0.0"

Verify the chart deployed cleanly:

kubectl -n azdo-agents get scaledjob,triggerauthentication,sa
# Should show one ScaledJob, one TriggerAuthentication, one ServiceAccount

# The AKS-managed KEDA add-on runs in kube-system
kubectl -n kube-system logs -l app=keda-operator | grep azdo
# Should see "trigger.azure-pipelines" registered for your ScaledJob

Step 7: Trigger a job and watch it scale

pipelines/self-hosted-test.yml:

trigger: none
pool:
  name: aks-pool

steps:
  - bash: |
      echo "Hello from $(hostname)"
      echo "Agent OS: $(uname -a)"
      echo "Tools available:"
      which az kubectl helm bicep jq
      echo "Sleeping 60s to keep the agent alive long enough to observe."
      sleep 60

Save it in your repo, run it from the Azure DevOps Pipelines UI, and watch the cluster from a terminal:

# Watch the cluster respond
kubectl -n azdo-agents get pods -w

Within ~30 seconds (KEDA's polling interval) a new pod appears, registers as an agent, runs the 60-second sleep, deregisters, and exits. The pod cleans itself up; ScaledJob retains 5 historical successes and 5 failures for debugging.

To verify the autoscaling works, queue 10 instances of the test pipeline simultaneously. KEDA will spawn 10 pods, AKS may need to scale node count up if 10 pods exceed cluster capacity, all 10 jobs run in parallel, and the cluster scales back to zero agent pods (the system node pool keeps its one node) within 5 minutes of the last job finishing.
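
Clicking "Run pipeline" ten times gets old quickly. A sketch of the same thing from a terminal is below. It assumes the azure-devops CLI extension is installed, that default organization and project are already set with az devops configure --defaults, and that the pipeline is named self-hosted-test (my assumption; use whatever name you gave it). queue_runs is a hypothetical helper, and it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: queue N runs of a pipeline to exercise the autoscaler.
# queue_runs is a hypothetical helper. DRY_RUN defaults to 1, which only
# prints the az commands; set DRY_RUN=0 to actually queue the runs.
set -euo pipefail

queue_runs() {
  local count="$1" pipeline="$2" i
  for i in $(seq 1 "$count"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "az pipelines run --name $pipeline"
    else
      az pipelines run --name "$pipeline" --output none
    fi
  done
}

# Dry run: print the three commands that would be executed.
queue_runs 3 self-hosted-test
```

Run it with DRY_RUN=0 in one terminal and keep kubectl -n azdo-agents get pods -w open in another to see the pods appear.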

Production checklist

  1. Pin a node-pool autoscaler maxCount to match your maxReplicaCount. A KEDA ceiling of 30 with a node-pool max of 5 means jobs will queue beyond capacity. Compute: 30 pods * 1 vCPU req = 30 vCPU; on Standard_D4s_v5 (4 vCPU each) that's 8 nodes minimum. Set maxCount: 10 on the user node pool and the math works out.

  2. Use a separate node pool for agents. A "system" pool for KEDA + cluster controllers, a "user" pool for agents. Taint the user pool with workload=ci:NoSchedule and matching toleration on the agent pod spec. Prevents agent pods from competing with platform pods for resources.

  3. Egress through a NAT gateway. Agents pulling code, packages, container layers, and pushing to ACR generate substantial outbound traffic. Without NAT gateway, this leaves through the AKS load balancer and incurs SNAT exhaustion at moderate scale. NAT gateway is the right answer; it's about $35/month plus egress.

  4. Image-update cadence. Microsoft updates the base agent image roughly monthly. Pin a version, update on a deliberate cadence, run the test pipeline before promoting. Don't track latest.

  5. Secret-scan the agent. A self-hosted agent that runs untrusted code is a possible exfiltration path. Wire a CI step that runs gitleaks against every job, or restrict the agent's outbound network to an allowlist of known destinations.

  6. Set Azure budget alarms on the cluster's resource group. A misbehaving scaler that doesn't scale down (KEDA bug, ADO pool-API outage) can silently rack up compute. The budget tells you within hours; ad-hoc cost review tells you when the bill arrives.
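
The node-count arithmetic from item 1 is worth keeping as a one-liner so you can rerun it when requests or VM sizes change. This sketch uses only CPU requests and the nominal vCPU count per node; real allocatable CPU is a little lower once system reservations are subtracted, so treat the result as a lower bound. nodes_needed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: minimum node count for a given agent ceiling, by CPU request only.
# nodes_needed is a hypothetical helper; inputs are whole vCPUs.
set -euo pipefail

nodes_needed() {
  local pods="$1" cpu_per_pod="$2" vcpu_per_node="$3"
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (pods * cpu_per_pod + vcpu_per_node - 1) / vcpu_per_node ))
}

# maxReplicaCount 30, 1 vCPU requested per pod, Standard_D4s_v5 (4 vCPU)
nodes_needed 30 1 4   # prints 8
```

That matches the eight-node minimum above, and the suggested maxCount: 10 leaves two nodes of headroom.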

Troubleshooting

Pods come up but stay in Pending. This is almost always missing image-pull credentials. Confirm with kubectl describe pod: the events will show ErrImagePull or ImagePullBackOff. The fix is az aks update -g <rg> -n <cluster> --attach-acr <acr-name> to grant the kubelet identity AcrPull.

Agent registers, runs the job, but the next pod fails to register. This means the agent name collided. The start.sh script uses hostname as the agent name, which is the pod name (unique per pod), so this should not happen unless you've overridden AZP_AGENT_NAME to something static.

KEDA scaler returns "TF400813: Resource not available for anonymous access". This means the federated SP doesn't have access to the agent pool. Re-check the pool Administrator assignment from step 4. The error is misleading; it implies anonymous access when it really means missing pool rights.

ScaledJob creates pods but they fail immediately. This is usually start.sh failing to acquire the federated token (with restartPolicy: Never the pods end up in Error rather than CrashLoopBackOff). Confirm the workload identity webhook is installed (kubectl get mutatingwebhookconfiguration | grep azure-wi), that the service account has the client-id annotation, and that the pod carries the azure.workload.identity/use: "true" label.

KEDA polling interval is 30s but jobs feel slow to start. The delay is the KEDA → pool-API → KEDA cycle plus the node provisioning time. To get faster cold starts, set minReplicaCount: 1 on the ScaledJob; the first agent is always warm at the cost of one always-on pod (~$15/month).

Agent stays online in the pool after pod exits. This means the EXIT trap in start.sh failed, usually because the Entra token captured at startup expired during a long job. Refresh strategy: re-acquire the token inside the trap before calling config.sh remove.
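
A sketch of that refresh strategy is below. It is meant to replace the one-line trap in start.sh, with the same token request as the main script wrapped in a function. The names acquire_ado_token and cleanup are mine, and the sketch assumes the webhook has injected AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_FEDERATED_TOKEN_FILE (falling back to the path start.sh already reads).

```shell
#!/usr/bin/env bash
# Sketch: deregister the agent with a freshly acquired token.
# acquire_ado_token and cleanup are hypothetical names for this article.
set -euo pipefail

# Exchange the projected federated token for an Entra token scoped to ADO.
acquire_ado_token() {
  local token_file="${AZURE_FEDERATED_TOKEN_FILE:-/var/run/secrets/azure/tokens/azure-identity-token}"
  curl -sS -X POST \
    "https://login.microsoftonline.com/${AZURE_TENANT_ID}/oauth2/v2.0/token" \
    -d "client_id=${AZURE_CLIENT_ID}" \
    -d "scope=499b84ac-1321-427f-aa17-267ca6975798/.default" \
    -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
    -d "client_assertion=$(cat "$token_file")" \
    -d "grant_type=client_credentials" \
    | jq -r '.access_token'
}

# EXIT handler: fetch a new token at exit time instead of reusing the one
# captured at startup, then remove the agent from the pool.
cleanup() {
  local token
  token="$(acquire_ado_token)" || token=""
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "cleanup: could not get a fresh token; agent may linger offline in the pool" >&2
    return 0
  fi
  ./config.sh remove --unattended --auth pat --token "$token" || true
}

# In start.sh, register the handler before ./run.sh --once:
#   trap cleanup EXIT
```

The handler never fails the pod on its own. A lingering offline agent is an annoyance, whereas a non-zero exit from cleanup would mark an otherwise successful job pod as failed.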

Real-world references

The Lippert and Xebia posts will look familiar; this article's structure follows the same shape but swaps PAT auth for workload identity, which the older posts predate.

What this gives you, beyond the bill cut

The obvious win is the cost: roughly half the monthly Azure DevOps spend on a 30-engineer team. That alone is the business case.

The less obvious wins compound over time. Cold start drops from 30-60 seconds (Microsoft-hosted) to under 10 seconds (warm AKS agent), which means PR checks complete faster, which means engineers iterate faster. Tools are baked into the image, so a job that used to spend 90 seconds running apt-get install now spends none, because everything is already there. Network paths are intra-VNet, so ACR pulls are gigabit-fast instead of internet-egress-throttled.

The cultural shift is the part most platform teams underestimate. CI used to be a thing the team complained about: slow, flaky, queued. After self-hosting on AKS, CI becomes a thing the team uses without thinking about. The pipeline runs, it finishes, the next one starts. You stop hearing about it, which is the highest praise infrastructure ever gets.

Six months in, a team I shipped this for is running about 3,400 ADO jobs per month on AKS. The cluster averages 1.2 agent replicas with bursts to ~12 during peak hours, scales to zero overnight. Cost is consistent at $58/month. The Microsoft-hosted bill that this replaced was $420/month. The platform team's CI-related Slack messages went from "anyone know why my PR check is queued" to nothing — which is exactly the goal.
