damionas
Topic

DevOps

Pipelines, runners, IaC, release engineering on Azure.

30 articles
No. 49 · DevOps

Migrating Classic Release Pipelines to YAML, the Six-Week Phased Plan

The Azure DevOps organisation I was asked to modernise had eighty-three Classic Release pipelines, the oldest dating to 2017.

Feb 23, 2026 · 26 min
No. 48 · DevOps

Service Connection Vending With Workload Identity Federation, at Org Scale

The platform team I joined had 137 Azure DevOps Service Connections in their main organisation. Of those, 48 used long-lived Service Principal client secrets that had been rotated zero times.

Feb 16, 2026 · 28 min
No. 47 · DevOps

Self-Hosted Azure DevOps Agents on AKS With KEDA Autoscaling

The platform team I joined had thirty Microsoft-hosted Azure DevOps agent minutes left in the month and it was the eighth.

Feb 2, 2026 · 30 min
No. 46 · DevOps

Multi-Stage Azure Pipelines With Bicep What-If Gates and Canary Promotion

The deploy pipeline I inherited at one customer was a single 400-line YAML that ran `az deployment group create` against production every time someone merged to main. There was no preview, no manual gate, no canary.

Jan 26, 2026 · 32 min
No. 44 · DevOps

Building an Azure Subscription Vending Machine With ARM Template Specs and Azure DevOps

The first time someone at our org needed a new Azure subscription, it took two weeks. A ticket got filed, a senior architect figured out the management-group placement, a Cloud Center of Excellence Owner manually created the subsc…

Jan 12, 2026 · 28 min
No. 37 · DevOps

Build a Bicep-Aware PR Reviewer Bot With GitHub Actions and Azure OpenAI

I review Bicep PRs. A lot of them. The team had grown from four engineers writing infrastructure to fourteen, and the rate of "did you check that this resource has diagnostic settings wired up" comments I was leaving had passed th…

Dec 4, 2025 · 27 min
No. 36 · DevOps

Build and Ship an Azure Cost MCP Server From Empty Folder to Container Apps in 60 Minutes

For ten months our FinOps team published a beautifully formatted daily cost email. Subscription totals, top-five movers, tag breakdowns. It linked to two dashboards. It went to forty-seven engineers.

Dec 3, 2025 · 24 min
No. 24 · DevOps

GitHub Actions → Azure With OIDC Federated Identity: The Setup That Survived Our SOC 2 Audit

I used to rotate Azure service principal secrets in fourteen GitHub repos every quarter. Manually. Because a teammate had been burned by an automated rotation that desynced halfway through and took production down at 3am.

Nov 26, 2025 · 9 min
No. 21 · DevOps

Five Gotchas When Wiring Azure DevOps MCP Server Into VS Code Copilot

The Azure DevOps MCP Server's setup docs make it look like a five-minute task. It is, if everything goes right. Most teams hit one or more of these five issues, lose an afternoon, and conclude the tool is "buggy" when really it's …

Nov 22, 2025 · 6 min
No. 34 · DevOps

GitHub Actions Composite Actions vs Reusable Workflows: When To Use Which (And When To Use Neither)

A team asks me this question every month: *should we put this CI logic in a composite action or a reusable workflow?* And every month I give the same five-minute answer that none of the official docs put in one place.

Nov 18, 2025 · 7 min
No. 33 · DevOps

Karpenter on AKS vs Cluster Autoscaler vs Node Auto-Provisioning: The Workload Where Each Wins

The "what scales nodes on AKS" question used to have one answer: Cluster Autoscaler. Now there are three: Cluster Autoscaler (CA), AKS **Node Auto-Provisioning** (NAP, which is Karpenter underneath), and self-managed **Karpenter o…

Nov 11, 2025 · 9 min
No. 18 · DevOps

Time-Slicing vs MIG for Bursty LLM Inference Traffic on AKS GPU Node Pools

NVIDIA gives you two ways to share a single GPU across multiple workloads on Kubernetes: time-slicing and MIG (Multi-Instance GPU). The first is software-based and flexible. The second is hardware-partitioned and rigid.

Nov 1, 2025 · 10 min
No. 31 · DevOps

Wiring GitHub Copilot Agent Mode + MCP Into Our Incident-Response Runbooks

Our P2 incident playbook used to be a 14-step Confluence page. It pointed at four Azure portals, three KQL queries, two PowerShell scripts, and a Slack channel.

Oct 14, 2025 · 9 min
No. 40 · DevOps

Replace Every Service Principal Secret With OIDC Federation: A Multi-Environment Walkthrough

I once got paged at 4am because a service principal secret expired in the middle of a release. The deploy succeeded for staging, then the production stage tried to authenticate, the SP credential had hit its 30-day TTL three minut…

Oct 8, 2025 · 24 min
No. 30 · DevOps

Contract-Testing an MCP Server: Fixtures, Golden Files, and the Harness That Catches Most Regressions

The MCP server we run for cost queries had a regression last quarter that nobody caught for nine days. The Cost Management API changed the shape of the `properties.rows` array (a fourth column appeared), our parser silently mapped…

Sep 30, 2025 · 8 min
No. 29 · DevOps

An MCP Server That Runs Bicep What-If and Detects Drift, From Inside My Editor

I review Bicep PRs. A lot of them. Half my comments before this tool were variants of "did you run what-if?", because the answer was usually no, and the diff would have caught it.

Sep 16, 2025 · 8 min
No. 39 · DevOps

Add Per-User OAuth and On-Behalf-Of to an Internal MCP Server

The day after we widened the audience for our internal MCP server to the broader engineering org, one of the first new users asked it to fetch cost data for a subscription they shouldn't have been able to see.

Sep 9, 2025 · 30 min
No. 28 · DevOps

Securing an Internal MCP Server Behind Entra ID With Per-Tool OAuth Scopes

The day after we shipped our MCP server to a wider engineering audience, somebody used it to query cost data for a subscription they shouldn't have been able to see. Not maliciously: they had MCP wired up before their RBAC was.

Sep 2, 2025 · 8 min
No. 09 · DevOps

Swapping ACR for Harbor in an AKS GitOps Pipeline: What Broke, What Didn't

Azure Container Registry (ACR) is the default registry for AKS workloads, and for most teams it's the right call: managed, integrated with Entra ID, geo-replicated.

Aug 30, 2025 · 9 min
No. 38 · DevOps

Stand Up a Production-Ready Internal MCP Server on Azure Container Apps With Workload Identity

The first version of our internal MCP server ran on a developer's laptop. It worked beautifully. Then they took a Friday off and the FinOps Slack channel filled with "MCP server unavailable" complaints by 11am, because the laptop …

Aug 26, 2025 · 28 min
No. 08 · DevOps

Backstage on AKS With CAPZ + ASO Instead of Crossplane: When the Tooling Choice Matters

Most Backstage-on-AKS internal-platform tutorials reach for Crossplane to do the resource provisioning. We started there too.

Aug 23, 2025 · 10 min
No. 27 · DevOps

Hosting MCP Servers on Azure Container Apps With Workload Identity (No Keys, No Sidecars)

The first version of our internal MCP server ran on a developer's laptop. It worked beautifully, until they took a Friday off and the FinOps Slack channel filled with "MCP server unavailable" complaints by 11am.

Aug 19, 2025 · 8 min
No. 07 · DevOps

Day 1 vs Day 90 on an AKS Internal Platform: What I'd Wire Differently

Three months ago I stood up a new internal developer platform on AKS for a 30-engineer team. Backstage as the portal, ArgoCD for delivery, Crossplane for resource provisioning, the usual stack.

Aug 16, 2025 · 9 min
No. 06 · DevOps

Migrating an AKS Cluster Off Flux v2 to the New ArgoCD Extension Without Dropping Reconciliation

When the ArgoCD extension for AKS hit GA at KubeCon Europe 2026, we had four production AKS clusters running Flux v2 GitOps and a long-standing internal preference for ArgoCD's UI for application-team developers.

Aug 9, 2025 · 11 min
No. 26 · DevOps

Building a Custom MCP Server for Azure Cost Insights: The 200-LOC Tool That Replaced Our Daily FinOps Email

We ran a daily FinOps email for ten months. It had cost-by-subscription, cost-by-tag, the top-five-movers list, everything finance asked for. Eight people opened it. Two of them were the FinOps team.

Aug 5, 2025 · 9 min
No. 05 · DevOps

Letting Copilot Agent Mode Own Our Monthly AKS Maintenance Run: Five Failure Modes I Hit

Once a month I do the same boring AKS chore: rotate certificates, prune unused resources, check node pool versions against the support matrix, and update the Helm releases for our common platform services.

Aug 2, 2025 · 8 min
No. 04 · DevOps

Building a Free Bicep-Aware PR Reviewer With GitHub Actions and Azure OpenAI

We had a tool gap. Our application code got AI review on every PR. Our infrastructure code (Bicep templates, Terraform modules, Helm charts) went through whatever the human on rotation was willing to look at, which was usually "th…

Jul 26, 2025 · 10 min
No. 03 · DevOps

What an SRE Agent Caught Last Quarter (and What It Missed)

The Azure SRE Agent has been running against our production AKS cluster for one quarter. Three months. About 90 incidents.

Jul 19, 2025 · 9 min
No. 02 · DevOps

30 Days With the Azure DevOps MCP Server: What Actually Changed in My Backlog Triage

I track tickets like most people: poorly. The backlog has 240 open work items in it, the average age is 71 days, and roughly a third are duplicates of each other under slightly different wording.

Jul 12, 2025 · 7 min
No. 01 · DevOps

Plugging Azure OpenAI Into Azure Pipelines for PR Review: A Real-World Setup

The first time we tried this, the bot left a comment on every PR that just said "Looks good!", including on a PR that introduced a hard-coded SAS token.

Jul 5, 2025 · 8 min