Azure Container Registry (ACR) is the default registry for AKS workloads, and for most teams it's the right call — managed, integrated with Entra ID, geo-replicated. We ran on ACR for two years before deliberately migrating to a self-hosted Harbor instance.
I want to be honest: we didn't migrate because Harbor was technically superior. We migrated because we wanted vulnerability scanning across multiple registries (ACR, GHCR, Docker Hub) with a single policy engine, and Harbor's project-scoped vulnerability gates fit that need better than Microsoft Defender for Cloud's container scanning at the time.
If you're considering a similar migration — or just curious whether the ACR-to-Harbor path is worth it for your team — here's the honest report on what changed.
The migration shape
Our setup before:
- ACR `crprodeus.azurecr.io` with ~140 images
- AKS pulls via the AKS-attached ACR integration (no PAT, no secret)
- GitHub Actions builds and pushes via federated credentials
- Microsoft Defender for Cloud scans on push
Our setup after:
- Harbor running on AKS in its own namespace, behind an internal load balancer
- AKS pulls from `harbor.platform.internal/<project>/<image>`
- GitHub Actions builds and pushes via OIDC-issued robot account tokens
- Trivy scanner running in Harbor with project-level vulnerability gates
The migration ran for six weeks. Three of those weeks were the actual move. The other three were the things nobody warns you about.
Week 1-2: standing up Harbor properly
Harbor on AKS has a Helm chart that gets you 80% of the way there. The other 20% is everything that breaks when Harbor itself goes down.
Specifically: Harbor stores its registry data in object storage. We pointed it at an Azure Storage account. The storage account needed to be private-endpoint-only (we don't expose internet-facing ports on anything that stores image layers). The private endpoint, in turn, needed VNet line-of-sight from the cluster running Harbor. Then Harbor's PostgreSQL database needed the same treatment.
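For reference, the storage wiring is a handful of Helm values. A minimal sketch, assuming the goharbor/harbor-helm chart — the account and container names below are placeholders, and the key names are worth verifying against your chart version:

```shell
# values-storage.yaml: point Harbor's registry at an Azure Blob container.
# Key names per the goharbor/harbor-helm chart; names here are placeholders.
cat > /tmp/values-storage.yaml <<'EOF'
persistence:
  imageChartStorage:
    type: azure
    azure:
      accountname: stharborprod      # the private-endpoint-only storage account
      accountkey: "<from-key-vault>" # injected at deploy time, never committed
      container: registry
EOF
# Applied with something like:
#   helm upgrade --install harbor harbor/harbor -n harbor -f values-storage.yaml
```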
The Harbor HA story is also more your problem than the ACR HA story. If your AKS cluster goes down, Harbor goes down with it. We mitigated by running Harbor on a separate node pool with explicit pod anti-affinity, plus geo-replication of the storage account. Not as robust as ACR's geo-replication. Adequate for our needs.
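The anti-affinity piece is the stock Kubernetes stanza, set per Harbor component. A sketch of the shape (the exact chart keys vary by component and chart version, so treat this as illustrative):

```shell
# values-ha.yaml fragment: keep each component's replicas off the same node.
# "core" shown as one example; the same stanza applies to the other components.
cat > /tmp/values-ha.yaml <<'EOF'
core:
  replicas: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: harbor
          topologyKey: kubernetes.io/hostname
EOF
```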
Week 3: the registry migration itself
Migrating images from ACR to Harbor is a crane (or regctl) copy loop:

```shell
# Copy every repo:tag from ACR into Harbor's "migrated" project.
for repo in $(az acr repository list -n crprodeus -o tsv); do
  for tag in $(az acr repository show-tags -n crprodeus --repository "$repo" -o tsv); do
    crane copy "crprodeus.azurecr.io/${repo}:${tag}" \
               "harbor.platform.internal/migrated/${repo}:${tag}"
  done
done
```
About 6 hours of runtime for our 140 images and ~2,400 tags. We ran this in a CronJob in AKS so the bandwidth stayed in-region.
Validation step: we kept ACR live for two weeks after migration. Both registries had identical images. Pulls went to Harbor; if anything was missing from Harbor, the image pull would fail loudly rather than silently fall back to ACR. (We did NOT configure registry mirrors. Mirrors hide problems.)
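If you want the validation to be active rather than passive, a second pass can diff digests instead of re-copying. A sketch using our registry names — `crane digest` resolves the manifest digest without pulling layers, so this is cheap to run:

```shell
#!/usr/bin/env bash
# Compare manifest digests between ACR and Harbor; misses and drift are loud.
set -uo pipefail
for repo in $(az acr repository list -n crprodeus -o tsv); do
  for tag in $(az acr repository show-tags -n crprodeus --repository "$repo" -o tsv); do
    src=$(crane digest "crprodeus.azurecr.io/${repo}:${tag}")
    dst=$(crane digest "harbor.platform.internal/migrated/${repo}:${tag}" 2>/dev/null) \
      || { echo "MISSING  ${repo}:${tag}"; continue; }
    [ "$src" = "$dst" ] || echo "MISMATCH ${repo}:${tag}"
  done
done
```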
Week 4: workload cutover
Updating image references in YAML across all our Helm charts and ArgoCD applications was the boring part — about 80 files touched, mostly find-and-replace. The interesting part was authentication.
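The find-and-replace was mechanical enough to script. A self-contained demo of the rewrite — the file path and image name are illustrative, but the substitution is the real one:

```shell
# Demo of the reference rewrite on one sample manifest; the real run hit ~80 files.
mkdir -p /tmp/cutover-demo
cat > /tmp/cutover-demo/deployment.yaml <<'EOF'
    image: crprodeus.azurecr.io/payments/api:1.4.2
EOF
# One substitution: ACR host -> Harbor host plus the "migrated" project prefix.
sed -i 's#crprodeus\.azurecr\.io/#harbor.platform.internal/migrated/#g' \
  /tmp/cutover-demo/deployment.yaml
cat /tmp/cutover-demo/deployment.yaml
# -> image: harbor.platform.internal/migrated/payments/api:1.4.2
```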
ACR's AKS integration meant the kubelet handled auth invisibly. Harbor needed an imagePullSecret per namespace. We standardized on a single secret per namespace with a robot account that had read-only pull permissions to the relevant Harbor projects. Created and managed by ASO + a custom controller.
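The per-namespace setup is two kubectl commands. A sketch — the robot account name and namespace are made up, and the `robot$project+name` username format is Harbor's convention for project-scoped robots:

```shell
# Create the pull secret from a read-only Harbor robot account (names illustrative).
kubectl create secret docker-registry harbor-pull \
  --docker-server=harbor.platform.internal \
  --docker-username='robot$payments+pull' \
  --docker-password="${ROBOT_TOKEN}" \
  --namespace payments
# Attach it to the default service account so pod specs don't need per-app changes.
kubectl patch serviceaccount default --namespace payments \
  --patch '{"imagePullSecrets":[{"name":"harbor-pull"}]}'
```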
This is where most teams stumble. The secret rotation story for image-pull secrets is unglamorous and easy to get wrong. We rotate ours every 90 days via a CronJob that mints a new robot token, updates the secret, and re-rolls the affected workloads. About 50 lines of bash + kubectl. Boring, works.
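A trimmed sketch of what that rotation script looks like. The Harbor v2 robots endpoint is real, but the request body fields and response shape should be checked against your Harbor version, and the namespace is a placeholder:

```shell
#!/usr/bin/env bash
# Rotation sketch: mint a fresh project-scoped robot token, swap the secret, re-roll.
# Requires curl + jq; endpoint/body per the Harbor v2 API, verify for your version.
set -euo pipefail
NS=payments
NEW=$(curl -fsS -u "admin:${HARBOR_ADMIN_PASS}" \
  -H 'Content-Type: application/json' \
  -X POST https://harbor.platform.internal/api/v2.0/robots \
  -d '{"name":"pull-'"$(date +%Y%m%d)"'","duration":90,"level":"project",
       "permissions":[{"kind":"project","namespace":"'"$NS"'",
         "access":[{"resource":"repository","action":"pull"}]}]}')
kubectl create secret docker-registry harbor-pull \
  --docker-server=harbor.platform.internal \
  --docker-username="$(echo "$NEW" | jq -r .name)" \
  --docker-password="$(echo "$NEW" | jq -r .secret)" \
  --namespace "$NS" --dry-run=client -o yaml | kubectl apply -f -
# Re-roll workloads in the namespace so pods pick up the new token.
kubectl rollout restart deployment -n "$NS"
```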
Week 5-6: the things nobody warned us about
Pull-through latency under load. Harbor running on the same cluster as the workloads pulling from it: when the cluster gets busy, Harbor gets slow, which makes pulls slow, which slows down auto-scaling. We saw a 4-minute new-pod-startup time during a traffic spike that should have been 30 seconds. The workloads couldn't scale fast enough because they couldn't pull images fast enough.
The fix was running Harbor on a dedicated node pool with reserved capacity. Not glamorous. Necessary.
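The node pool itself is one az command plus a taint so nothing else lands there. Resource group and cluster names below are placeholders:

```shell
# Dedicated pool so Harbor never competes with workload autoscaling for CPU/IO.
az aks nodepool add \
  --resource-group rg-platform \
  --cluster-name aks-prod \
  --name harbor \
  --node-count 3 \
  --node-taints dedicated=harbor:NoSchedule \
  --labels pool=harbor
# Harbor's pods then get a matching toleration plus nodeSelector (pool=harbor)
# via the chart's tolerations/nodeSelector values.
```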
Build-time pushes choking on the network policy. Our GitHub Actions self-hosted runners (also on AKS) push to Harbor. Initially the network policy between the runners namespace and Harbor's namespace was too tight, and pushes timed out about 5% of the time on large images. Increased the connection timeout, opened the right ports, problem solved.
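The "right ports" fix boils down to a NetworkPolicy admitting the runners namespace into Harbor's. A sketch with our namespace names assumed (`gha-runners`, `harbor`):

```shell
# Ingress from the runners namespace to Harbor over TLS; names are ours.
cat > /tmp/allow-runner-push.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-runner-push
  namespace: harbor
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: gha-runners
      ports:
        - protocol: TCP
          port: 443
EOF
# kubectl apply -f /tmp/allow-runner-push.yaml
```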
Defender for Cloud's container scanning still ran on every push to ACR. Because we kept ACR active for two weeks during the cutover, Defender kept scanning. We were paying for two scanners. Trivial cost in our case (~$30 over two weeks), but worth knowing.
What got better
Multi-registry vulnerability gates. Harbor's project-level "block on critical vulnerabilities" policy works the same way regardless of where the image came from — Harbor scans on push and on pull. We now enforce the same security gate for ACR-built, GHCR-built, and Docker Hub mirror images. Couldn't do that before.
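The gate itself is project metadata, settable through the UI or the API. A sketch of the API form — the metadata field names (`auto_scan`, `prevent_vul`, `severity`) are from the Harbor v2 API as we used it, so verify them against your version:

```shell
# Project-level gate: scan on push, block pulls of images with critical vulns.
curl -fsS -u "admin:${HARBOR_ADMIN_PASS}" \
  -X PUT https://harbor.platform.internal/api/v2.0/projects/prod \
  -H 'Content-Type: application/json' \
  -d '{"metadata":{"auto_scan":"true","prevent_vul":"true","severity":"critical"}}'
```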
Image signing with cosign was easier. Harbor 2.x has native cosign integration via its replication policies. We turned on signature requirement for the prod project. Unsigned images don't get pulled. Setting the same up across ACR + a separate signing infrastructure would have been more complex.
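The CI-side signing step is small. A sketch with keyed signing — the image name is illustrative, and recent cosign releases want an explicit `--yes` for non-interactive runs:

```shell
# Sign after push; the private key comes from CI secrets via cosign's env:// ref.
IMAGE=harbor.platform.internal/prod/payments-api:1.4.2
cosign sign --yes --key env://COSIGN_PRIVATE_KEY "${IMAGE}"
# Verification on the consuming side (or in an admission hook):
cosign verify --key cosign.pub "${IMAGE}"
```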
Cost visibility per-team. Harbor projects map cleanly to teams. We can see exactly how much storage each team's images consume. ACR has this via tags but it's noisier.
What got worse
Operational burden. ACR was a managed service. Harbor is our problem. Patches, certs, storage capacity planning, HA — all on us. We added about 4-6 hours/month of platform-team toil for Harbor that didn't exist for ACR.
First-time pull latency from new regions. ACR with geo-replication serves pulls from the closest replica. Harbor in a single region serves all pulls. For a workload running in West Europe pulling from a Harbor in East US, that's measurable latency on first pull (subsequent pulls are cached at the node level). We mitigated by putting Harbor in our most-used region and accepting the trade-off.
The honest verdict
For most teams, ACR is the right choice. Don't migrate to Harbor unless you have a specific need that ACR can't meet — multi-registry vulnerability scanning, complex replication topologies, or a hard requirement to self-host the registry layer.
Our specific need was the first one. Harbor solved it. We pay the operational cost. Net positive for us; would not have been net positive for the team next door who didn't have the multi-registry need.
What I'd do differently
Provision Harbor on its own AKS cluster, separate from the workload cluster. The "Harbor and the workloads compete for resources" problem would have disappeared entirely with that separation. We co-located to save money, and the savings were eaten by the dedicated node pool we had to add anyway.
I would NOT skip the dual-registry validation period. Two weeks of paying for both was worth the peace of mind that the migration was clean.
