Cross-Repository Workflow Coordination Skill
Overview
GitHub Actions workflows in the project-beta ecosystem use self-hosted runners. When runner configurations change, ALL repositories using those runners need coordinated updates.
Architecture
matchpoint-github-runners-helm
├── Defines runnerScaleSetName: "arc-beta-runners"
└── ArgoCD deploys runners with this label
project-beta-frontend ┐
project-beta-api      ├─ Must use: runs-on: arc-beta-runners
project-beta          ┘
Critical Rule: The runs-on: value in every workflow MUST EXACTLY match the Helm runnerScaleSetName.
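The rule can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes the Helm values file contains a line of the form `runnerScaleSetName: "label"` and that workflows use the plain scalar form `runs-on: label`. The function names `scale_set_name` and `find_mismatches` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare the Helm runnerScaleSetName with the runs-on labels in workflows.
# Assumptions: the values file has a line like  runnerScaleSetName: "label"
# and workflows use the scalar form  runs-on: label  (no matrix or array form).

# Print the first runnerScaleSetName found in a Helm values file.
scale_set_name() {
  sed -n -E 's/^[[:space:]]*runnerScaleSetName:[[:space:]]*"?([^"#[:space:]]+)"?.*$/\1/p' "$1" | head -n 1
}

# Print every runs-on line whose label is neither the expected label nor a
# GitHub-hosted ubuntu-* label. Returns 0 when nothing mismatches, 1 otherwise.
find_mismatches() {
  local expected="$1" workflows_dir="$2" mismatches
  mismatches=$(grep -rnE '^[[:space:]]*runs-on:' "$workflows_dir" \
    | grep -vE "runs-on:[[:space:]]*(${expected}|ubuntu-[a-z0-9.]+)[[:space:]]*(#.*)?\$" || true)
  if [ -n "$mismatches" ]; then
    printf '%s\n' "$mismatches"
    return 1
  fi
}
```

A possible invocation, with a hypothetical values file path: `find_mismatches "$(scale_set_name values.yaml)" ../project-beta-api/.github/workflows`.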
The Coordination Problem
Issue #121 Example
Change: Update runnerScaleSetName from arc-runners to arc-beta-runners
Impact:
matchpoint-github-runners-helm
✅ runnerScaleSetName: "arc-beta-runners"
project-beta-frontend (15 workflows)
❌ runs-on: arc-runners # OLD label - jobs stuck!
project-beta-api (13 workflows)
❌ runs-on: arc-runners # OLD label - jobs stuck!
project-beta (3 workflows)
❌ runs-on: arc-runners # OLD label - jobs stuck!
Result: All CI jobs stay stuck in the "queued" state until the workflows are updated.
Affected Repositories
| Repository | Workflows | Runner Labels | Priority |
|---|---|---|---|
| project-beta-frontend | 15 files | arc-beta-runners | P0 - Blocks deploys |
| project-beta-api | 13 files | arc-beta-runners | P0 - Blocks deploys |
| project-beta | 3 files | arc-beta-runners | P0 - Blocks infra |
Coordination Workflow
Phase 1: Planning
Before changing runnerScaleSetName, audit all repositories:
bash
# Search for current runner label usage
for repo in project-beta-frontend project-beta-api project-beta; do
  echo "=== $repo ==="
  # Run in a subshell so the working directory is restored after each repo
  (cd "/path/to/$repo" && grep -r "runs-on:" .github/workflows/ | grep -v "ubuntu-latest" | sort -u)
done
Output example:
=== project-beta-frontend ===
.github/workflows/ci.yaml: runs-on: arc-runners
.github/workflows/deploy.yaml: runs-on: arc-runners
...
=== project-beta-api ===
.github/workflows/test.yaml: runs-on: arc-runners
...
Document the changes needed:
- Count of files per repository
- Specific workflow files affected
- Any workflows using different labels
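The three items above can be collected with a couple of small helpers. This is a sketch under the assumption that every workflow uses the plain `runs-on: label` form; `summarize_labels` and `files_using_label` are hypothetical names, not existing project scripts.

```bash
#!/usr/bin/env bash
# Sketch: gather the audit data for one repository checkout.

# Print "<count> <label>" for each distinct runs-on label in the repository.
summarize_labels() {
  local repo_dir="$1"
  grep -rhE '^[[:space:]]*runs-on:' "$repo_dir/.github/workflows" 2>/dev/null \
    | sed -E 's/^[[:space:]]*runs-on:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' \
    | sort | uniq -c | awk '{print $1, $2}'
}

# Print the workflow files that reference the given label.
files_using_label() {
  local repo_dir="$1" label="$2"
  grep -rlE "runs-on:[[:space:]]*${label}([[:space:]]|\$)" "$repo_dir/.github/workflows" 2>/dev/null | sort
}
```

For example, `summarize_labels /path/to/project-beta-api` lists every label in use, which also surfaces workflows on unexpected labels, and `files_using_label /path/to/project-beta-api arc-runners | wc -l` gives the file count for the migration plan.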
Phase 2: Create Migration Plan
Option A: Dual Runner Pools (Zero Downtime)
Deploy BOTH old and new runner pools during transition:
yaml
# matchpoint-github-runners-helm/argocd/applicationset-runners.yaml
generators:
- list:
    elements:
    - name: arc-runners # OLD - for existing workflows
      valuesFile: examples/runners-values-old.yaml
    - name: arc-beta-runners # NEW - for updated workflows
      valuesFile: examples/runners-values-new.yaml
Timeline:
- Deploy both runner pools
- Update workflows in all repos (can be done gradually)
- Remove old runner pool after all workflows migrated
Pros:
- Zero downtime
- Safe rollback (revert workflow changes)
- Can update repos independently
Cons:
- 2x runner costs during migration
- Need to track which repos migrated
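Tracking which repositories have migrated can be scripted against local checkouts. The sketch below is one possible approach, assuming plain `runs-on: label` lines; the function name `migration_status` and its four status words are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: classify a repository checkout by which runner labels it still uses.
# Prints one of: migrated, partial, pending, unused.
migration_status() {
  local repo_dir="$1" old="$2" new="$3" has_old=no has_new=no
  grep -rqE "runs-on:[[:space:]]*${old}([[:space:]]|\$)" "$repo_dir/.github/workflows" 2>/dev/null && has_old=yes
  grep -rqE "runs-on:[[:space:]]*${new}([[:space:]]|\$)" "$repo_dir/.github/workflows" 2>/dev/null && has_new=yes
  case "$has_old/$has_new" in
    no/yes)  echo "migrated" ;;  # only the new label is used
    yes/yes) echo "partial" ;;   # both labels are still referenced
    yes/no)  echo "pending" ;;   # only the old label is used
    *)       echo "unused" ;;    # neither label appears
  esac
}

# Example loop (paths are placeholders):
# for repo in project-beta-frontend project-beta-api project-beta; do
#   echo "$repo: $(migration_status "/path/to/$repo" arc-runners arc-beta-runners)"
# done
```

The old runner pool is only safe to remove once every repository reports `migrated`.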
Option B: Coordinated Single Cutover
Update runner AND all workflows simultaneously:
- Prepare PRs in ALL repositories (don't merge)
- Merge runner config change
- Wait for ArgoCD sync (~3 min)
- Merge ALL workflow PRs quickly
- Monitor for stuck jobs
Pros:
- No extra runner costs
- Clean cutover
Cons:
- ~3-5 minute CI outage
- Requires coordination across repos
- Risky if issues arise
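The cutover order matters, so it can help to script it and rehearse in dry-run mode first. The following sketch assumes the PRs are already open, that the runner change and the workflow PRs can be addressed by branch name, and that squash merges are acceptable; the branch names, the `run` helper, and the `cutover` function are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the Option B cutover order. Commands are only printed unless DRY_RUN=0.
# Assumptions: PRs already exist for the given branches; squash merge is allowed.

ORG="Matchpoint-AI"
WORKFLOW_REPOS=(project-beta-frontend project-beta-api project-beta)

# Print the command in dry-run mode (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# cutover <runner-pr-branch> <workflow-pr-branch> [sync-wait-seconds]
cutover() {
  local runner_branch="$1" workflow_branch="$2" sync_wait="${3:-180}" repo

  # 1. Merge the runner config change
  run gh pr merge "$runner_branch" --repo "$ORG/matchpoint-github-runners-helm" --squash

  # 2. Wait for ArgoCD to sync the new scale set (~3 min by default)
  run sleep "$sync_wait"

  # 3. Merge every prepared workflow PR back to back
  for repo in "${WORKFLOW_REPOS[@]}"; do
    run gh pr merge "$workflow_branch" --repo "$ORG/$repo" --squash
  done

  # 4. List runs that are still queued
  for repo in "${WORKFLOW_REPOS[@]}"; do
    run gh run list --repo "$ORG/$repo" --status queued
  done
}
```

Running `cutover <runner-branch> <workflow-branch>` prints the planned commands; setting `DRY_RUN=0` executes them.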
Recommended: Option A for production, Option B for dev/test
Phase 3: Update Workflows
For each repository, create a PR that updates ALL workflow files:
bash
#!/bin/bash
# Script: update-runner-labels.sh
# Usage: ./update-runner-labels.sh <repo-name>
set -euo pipefail

OLD_LABEL="arc-runners"
NEW_LABEL="arc-beta-runners"
REPO="${1:?Usage: $0 <repo-name>}"
BRANCH="fix/update-runner-label-to-$NEW_LABEL"

cd "/path/to/$REPO"

# Find all workflow files (.yml and .yaml) and update those using the old label.
# Note: "sed -i" without a backup suffix is GNU sed syntax.
find .github/workflows -type f \( -name "*.yml" -o -name "*.yaml" \) -print0 |
  while IFS= read -r -d '' workflow; do
    if grep -q "runs-on: $OLD_LABEL" "$workflow"; then
      echo "Updating: $workflow"
      sed -i "s/runs-on: $OLD_LABEL/runs-on: $NEW_LABEL/g" "$workflow"
    fi
  done

# Create PR
git checkout -b "$BRANCH"
git add .github/workflows/
git commit -m "ci: Update runner label from $OLD_LABEL to $NEW_LABEL

Aligns with runner configuration change in matchpoint-github-runners-helm.

Refs: matchpoint-ai/matchpoint-github-runners-helm#121"

git push -u origin "$BRANCH"

gh pr create \
  --title "ci: Update runner label from $OLD_LABEL to $NEW_LABEL" \
  --body "Updates all workflows to use the new runner label \`$NEW_LABEL\`.

## Context
matchpoint-github-runners-helm changed \`runnerScaleSetName\` to \`$NEW_LABEL\`.

## Changes
- Updates all workflow files under \`.github/workflows/\`
- Changes \`runs-on: $OLD_LABEL\` → \`runs-on: $NEW_LABEL\`

## Testing
- [ ] Verify workflows use correct runner label
- [ ] Confirm CI jobs execute (not stuck in queue)

Related: matchpoint-ai/matchpoint-github-runners-helm#121"
Usage:
bash
./update-runner-labels.sh project-beta-frontend
./update-runner-labels.sh project-beta-api
./update-runner-labels.sh project-beta
Phase 4: Verification
After merging workflow updates:
bash
# Check that runners are picking up jobs
gh run list --repo Matchpoint-AI/project-beta-frontend --limit 5

# Verify no jobs stuck in queue
gh run list --repo Matchpoint-AI/project-beta-frontend --status queued

# Check runner status
gh api /orgs/Matchpoint-AI/actions/runners --jq '.runners[] | {name, status, busy, labels: [.labels[].name]}'
Success criteria:
- ✅ No jobs stuck in "queued" for > 2 minutes
- ✅ Jobs transition to "in_progress" quickly
- ✅ Runners show "busy: true" when jobs running
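The 2-minute criterion can be checked against the creation timestamps of queued runs. This is a minimal sketch that assumes GNU `date` (for `-d` parsing of ISO 8601 timestamps) and input lines of the form `<run-id> <createdAt>`; `age_seconds` and `stuck_runs` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: flag queued runs older than a threshold. Assumes GNU date.

# Print the age in seconds of an ISO 8601 UTC timestamp.
# The optional second argument overrides "now" (epoch seconds).
age_seconds() {
  local created="$1" now="${2:-$(date -u +%s)}"
  echo $(( now - $(date -u -d "$created" +%s) ))
}

# Read "<run-id> <createdAt>" lines on stdin and print the ids of runs
# older than the limit (default 120 seconds).
stuck_runs() {
  local limit="${1:-120}" now="${2:-$(date -u +%s)}" id created
  while read -r id created; do
    [ -n "$id" ] || continue
    if [ "$(age_seconds "$created" "$now")" -gt "$limit" ]; then
      echo "$id"
    fi
  done
}
```

The input lines can be produced with `gh run list --repo Matchpoint-AI/project-beta-frontend --status queued --json databaseId,createdAt --jq '.[] | "\(.databaseId) \(.createdAt)"'` and piped into `stuck_runs 120`. Note that `createdAt` is when the run was created, which is only an approximation of how long it has been queued.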
Common Scenarios
Scenario 1: Adding New Runner Pool
Example: Add dedicated runners for frontend with GPU support
Steps:
- Add runner pool in matchpoint-github-runners-helm:
yaml
# argocd/applicationset-runners.yaml
- name: arc-frontend-gpu
  valuesFile: examples/frontend-gpu-values.yaml
- Update ONLY affected workflows in project-beta-frontend:
yaml
# .github/workflows/e2e-visual-tests.yaml
jobs:
  visual-tests:
    runs-on: arc-frontend-gpu # NEW pool
- Keep other workflows on existing pool:
yaml
# .github/workflows/ci.yaml
jobs:
  test:
    runs-on: arc-beta-runners # Existing pool
Impact: Only workflows explicitly updated use new pool
Scenario 2: Removing Runner Pool
Example: Deprecate arc-runners in favor of arc-beta-runners
Steps:
- Ensure NO workflows reference old label:
bash
for repo in project-beta-frontend project-beta-api project-beta; do
  # Run in a subshell so the working directory is restored after each repo
  (cd "/path/to/$repo" && grep -r "runs-on: arc-runners" .github/workflows/ && echo "❌ Found old label in $repo")
done
- Remove runner pool from matchpoint-github-runners-helm:
yaml
# argocd/applicationset-runners.yaml
# Remove the arc-runners entry
- Verify no queued jobs after removal:
bash
gh run list --status queued --limit 20
Scenario 3: Emergency Runner Failover
Example: Primary runner pool down, need to switch to backup
Steps:
- Deploy backup runner pool (if not already deployed):
bash
# Quick deploy via ArgoCD
kubectl apply -f argocd/applications/arc-backup-runners.yaml
- Bulk update workflows in critical repo:
bash
# Emergency script (covers both .yml and .yaml workflow files)
find .github/workflows \( -name "*.yml" -o -name "*.yaml" \) -exec sed -i 's/runs-on: arc-beta-runners/runs-on: arc-backup-runners/g' {} \;
git add .github/workflows/
git commit -m "EMERGENCY: Switch to backup runners"
git push
- Monitor job execution:
bash
watch -n 5 'gh run list --limit 10'
Validation Scripts
Pre-Merge Validation
Run before merging runner configuration changes:
bash
#!/bin/bash
# scripts/validate-runner-labels.sh

set -euo pipefail

RUNNER_LABEL="${1:?Usage: $0 <runner-label>}"
REPOS=("project-beta-frontend" "project-beta-api" "project-beta")

echo "🔍 Checking if workflows use runner label: $RUNNER_LABEL"

for repo in "${REPOS[@]}"; do
  echo ""
  echo "=== $repo ==="

  if [ ! -d "../$repo" ]; then
    echo "⚠️ Repository not found: ../$repo"
    continue
  fi

  cd "../$repo"

  # "|| true" keeps set -e/pipefail from aborting the script when grep finds no match
  MATCHES=$(grep -r "runs-on: $RUNNER_LABEL" .github/workflows/ 2>/dev/null | wc -l || true)

  if [ "$MATCHES" -gt 0 ]; then
    echo "✅ Found $MATCHES workflow jobs using $RUNNER_LABEL"
    grep -r "runs-on: $RUNNER_LABEL" .github/workflows/ | head -5 || true
  else
    echo "❌ No workflows use $RUNNER_LABEL"
  fi

  cd - > /dev/null
done
Usage:
bash
cd matchpoint-github-runners-helm
./scripts/validate-runner-labels.sh arc-beta-runners
Post-Merge Validation
Run after merging workflow updates:
bash
#!/bin/bash
# scripts/verify-ci-not-stuck.sh

set -euo pipefail

REPOS=("Matchpoint-AI/project-beta-frontend" "Matchpoint-AI/project-beta-api" "Matchpoint-AI/project-beta")

echo "🔍 Checking for stuck CI jobs..."

for repo in "${REPOS[@]}"; do
  echo ""
  echo "=== $repo ==="

  # Keep the raw JSON so it can be filtered twice with jq
  RUNS=$(gh run list --repo "$repo" --status queued --limit 50 --json databaseId,createdAt,status)
  QUEUED=$(jq -r '.[] | select(.status == "queued") | "\(.databaseId) - queued since \(.createdAt)"' <<<"$RUNS")

  if [ -z "$QUEUED" ]; then
    echo "✅ No queued jobs"
  else
    echo "⚠️ Found queued jobs:"
    echo "$QUEUED"

    # Check if any queued run was created more than 5 minutes ago
    STUCK=$(jq -r '.[] | select(now - (.createdAt | fromdateiso8601) > 300) | .databaseId' <<<"$RUNS")
    if [ -n "$STUCK" ]; then
      echo "❌ Jobs stuck for > 5 minutes!"
    fi
  fi
done
Usage:
bash
./scripts/verify-ci-not-stuck.sh
Troubleshooting
Error: Jobs Stuck After Runner Change
Symptom: CI jobs stuck in "queued" after runner label change
Diagnosis:
bash
# Check what label runners have
kubectl get autoscalingrunnerset -A -o jsonpath='{.items[*].spec.runnerScaleSetName}'

# Check what label workflows use
for repo in project-beta-frontend project-beta-api project-beta; do
  echo "=== $repo ==="
  # Run in a subshell so the working directory is restored after each repo
  (cd "../$repo" && grep -h "runs-on:" .github/workflows/* | sort -u)
done
Fix:
bash
# If mismatch found, update workflows (replace OLD_LABEL and NEW_LABEL with the actual labels)
cd ../project-beta-frontend
find .github/workflows \( -name "*.yml" -o -name "*.yaml" \) -exec sed -i 's/runs-on: OLD_LABEL/runs-on: NEW_LABEL/g' {} \;
git commit -am "fix: Update runner label to match deployed runners"
git push
Error: Some Repos Updated, Others Not
Symptom: CI works in some repos but not others
Diagnosis:
bash
# Check each repo's workflows
for repo in project-beta-frontend project-beta-api project-beta; do
  echo "=== $repo ==="
  cd ../$repo
  grep -h "runs-on:" .github/workflows/* | sort -u
  cd -
done
Fix: Update remaining repos using update script
Error: Runners Deployed But Not Registering
Symptom: Runners deployed but GitHub doesn't show them
Diagnosis:
bash
# Check GitHub runners
gh api /orgs/Matchpoint-AI/actions/runners --jq '.runners[] | {name, labels: [.labels[].name]}'

# Check Kubernetes runners
kubectl get pods -n arc-beta-runners -l app.kubernetes.io/component=runner
Fix: See arc-runner-troubleshooting
Best Practices
- Plan multi-repo changes in advance - Don't surprise developers with stuck CI
- Use dual runner pools during migration - Eliminates downtime
- Communicate changes - Post in team chat before merging
- Verify in dev first - Test runner changes in development repo
- Monitor after deployment - Watch for queued jobs for 30 minutes post-change
- Document runner labels - Keep README updated with current label names
- Automate validation - Run validation scripts in CI for runner config changes
Coordination Checklist
Before changing runnerScaleSetName, complete the Phase 1 audit, choose a migration option from Phase 2, and run the pre-merge validation script.
References
- #121 - releaseName/runnerScaleSetName mismatch causing empty labels
- #123 - Cross-repo label update coordination
- #112 - CI jobs stuck investigation
- project-beta-api#798 - Workflow label update
- project-beta-frontend#886 - CI blocked by label mismatch