KH.
CI/CD overhaul for a Series-A European fintech
2025·Financial services·6 weeks

A 12-person engineering team was shipping 30+ microservices through a Jenkins monolith that took 47 minutes per run and needed a senior engineer to babysit every production deploy. We rebuilt the entire CI/CD pipeline in 6 weeks: GitHub Actions for CI, ArgoCD for GitOps CD, automated smoke tests, and Slack-gated production deploys. Deploy time dropped to 4 minutes, and the team went from 2–3 deploys a week to 10–15.

Deploy time: 47 min → 4 min

Weekly deploys: 2–3 → 10–15

Production incidents from deploys: 3 in 6 months → 0 in 3 months

Pipeline coverage: 30 services migrated

The challenge

The team's Jenkins setup had accumulated five years of technical debt. Build agents were shared and contended. Docker layers weren't cached between runs. Tests ran serially. The pipeline had no concept of environments: whatever was on main went straight to production at the next deploy, with no intermediate validation.

Production deploys required a senior engineer to manually SSH into the deployment server, pull the Docker image, and restart services in the right order. This happened only two or three times a week, on a schedule, which meant features sat in main for days waiting for the deploy window.

Three production incidents in the previous six months were caused directly by deploy process failures — wrong image tags, missed service restarts, and one deploy that partially completed before the engineer's connection dropped.

Engineers had started batching their work to reduce the number of deploys they had to request. The feedback loop had stretched so long that context-switching during a deploy cycle was a real productivity drain.

The approach

01. Audit and baseline

I spent two days mapping the existing setup: all 30 services, their dependency graph, their current build times, and the failure modes that had caused the three production incidents. I measured baseline CI time (47 minutes average), deploy frequency (2.4/week), and lead time from merge to production (3.2 days average). These became the targets to beat.
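The baseline figures above can be derived from two timestamps per change: when it merged and when it reached production. Here is a minimal sketch, assuming a hypothetical list of `(merged_at, deployed_at)` pairs. The function name and the sample data are illustrative only and are not the client's records.

```python
from datetime import datetime, timedelta
from statistics import mean


def baseline_metrics(changes, window_weeks):
    """Return (deploys_per_week, mean_lead_time_days).

    changes: list of (merged_at, deployed_at) datetime pairs.
    Changes that share a deployed_at timestamp shipped in the same deploy.
    """
    # Count distinct deploys, not changes: batched merges ship together.
    deploys = {deployed_at for _, deployed_at in changes}
    deploys_per_week = len(deploys) / window_weeks
    # Lead time is measured per change, from merge to production.
    lead_days = mean(
        (deployed_at - merged_at) / timedelta(days=1)
        for merged_at, deployed_at in changes
    )
    return deploys_per_week, lead_days


# Illustrative data only: three merges batched into two deploys in one week.
tuesday = datetime(2025, 1, 7, 10, 0)
friday = datetime(2025, 1, 10, 10, 0)
sample = [
    (datetime(2025, 1, 6, 10, 0), tuesday),  # waited 1 day
    (datetime(2025, 1, 7, 10, 0), friday),   # waited 3 days
    (datetime(2025, 1, 8, 10, 0), friday),   # waited 2 days
]
per_week, lead = baseline_metrics(sample, window_weeks=1)
print(f"{per_week:.1f} deploys/week, {lead:.1f} days lead time")
```

Counting distinct deploy timestamps rather than merged changes keeps batching visible: the more work waits for a window, the lower the deploy frequency and the higher the lead time.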

02. GitHub Actions CI pipeline

The Jenkins pipeline ran tests serially. With GitHub Actions, I reorganised each service's pipeline into a matrix job so that lint, unit tests, and integration tests run in parallel. Docker builds were restructured to maximise layer cache hits: base images are pulled from ECR, and only the application layer is rebuilt when code changes. For the services with the slowest test suites, I added pytest-xdist or Jest workers to parallelise within the test stage. Average CI time across all 30 services landed at 3m 50s.
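The gain from running stages in parallel comes down to simple arithmetic: a serial pipeline costs the sum of its stage durations, while a parallel one costs roughly the slowest stage. The sketch below illustrates this with hypothetical stage durations. They are not measurements from this engagement, and runner start-up overhead is ignored.

```python
def serial_minutes(stages):
    """Wall-clock time when stages run one after another."""
    return sum(stages.values())


def parallel_minutes(stages):
    """Wall-clock time when stages run side by side (idealised:
    no queueing or runner start-up overhead)."""
    return max(stages.values())


# Hypothetical durations in minutes, for illustration only.
stages = {"lint": 1.0, "unit": 3.0, "integration": 3.5}

print(f"serial:   {serial_minutes(stages):.1f} min")
print(f"parallel: {parallel_minutes(stages):.1f} min")
```

Once stages run side by side, the slowest one sets the floor, which is why splitting the slowest test suites across workers matters more than trimming the fast stages.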

03. ArgoCD GitOps CD

For deployment, I implemented an app-of-apps ArgoCD setup with three environments: dev (auto-sync on every merge to main), staging (manual sync with a PR-style approval), and production (manual sync with a Slack approval gate using argocd-notifications). Helm chart templates were standardised across services so deploying a new service meant filling in a 20-line values.yaml, not writing a custom pipeline.
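To make the per-environment sync rules concrete, here is a rough sketch that generates one ArgoCD `Application` manifest per service and environment, with automated sync enabled for dev only. The repository URL, chart path, values file names, namespace scheme, and the `payments` service name are all hypothetical. The approval steps for staging and production sit outside the manifest and are not modelled here.

```python
import json

# Hypothetical config repository; not taken from the engagement.
REPO_URL = "https://github.com/example-org/deploy-config.git"

# Only dev syncs automatically; staging and production wait for a manual sync.
ENVIRONMENTS = {
    "dev": {"auto_sync": True},
    "staging": {"auto_sync": False},
    "production": {"auto_sync": False},
}


def application(service, env):
    """Build an ArgoCD Application manifest (as a dict) for one service/env."""
    spec = {
        "project": "default",
        "source": {
            "repoURL": REPO_URL,
            "path": f"charts/{service}",           # assumed chart layout
            "targetRevision": "main",
            "helm": {"valueFiles": [f"values-{env}.yaml"]},
        },
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": f"{service}-{env}",       # assumed namespace scheme
        },
    }
    if ENVIRONMENTS[env]["auto_sync"]:
        spec["syncPolicy"] = {"automated": {"prune": True, "selfHeal": True}}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{service}-{env}", "namespace": "argocd"},
        "spec": spec,
    }


apps = [application("payments", env) for env in ENVIRONMENTS]
print(json.dumps(apps[0], indent=2))
```

Leaving `syncPolicy.automated` out is what makes an environment manual: ArgoCD reports the application as out of sync and waits for someone to trigger the sync.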

04. Automated smoke tests and rollback

Each ArgoCD sync triggered a post-deploy smoke test job through a sync hook: HTTP health checks, database connectivity checks, and a synthetic transaction through the core payment flow. If any check failed within 5 minutes of the deploy, ArgoCD rolled back to the previous revision automatically. The first time this fired in staging, it caught a misconfigured environment variable before it reached production.
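The smoke test job reduces to a small pass/fail routine: run each check in order, stop at the first failure, and give up if the time budget is exceeded. The sketch below is one possible shape for that logic, not the job that was deployed. The check names and the stand-in lambdas are hypothetical; real checks would call the service's health endpoint, open a database connection, and submit a test payment.

```python
import time


def run_smoke_tests(checks, budget_s=300.0, clock=time.monotonic):
    """Run (name, check) pairs in order and return (ok, reason).

    A check passes when it returns a truthy value. A falsy result,
    an exception, or an exhausted time budget fails the whole run.
    """
    start = clock()
    for name, check in checks:
        try:
            passed = check()
        except Exception as exc:
            return False, f"{name} raised {exc!r}"
        if not passed:
            return False, f"{name} failed"
        if clock() - start > budget_s:
            return False, f"{name} exceeded the {budget_s:.0f}s budget"
    return True, "all checks passed"


# Stand-in checks for illustration; the second simulates a broken database.
checks = [
    ("http-health", lambda: True),
    ("db-connectivity", lambda: False),
    ("synthetic-payment", lambda: True),
]
ok, reason = run_smoke_tests(checks)
print(ok, reason)
```

In a hook job, the script would exit non-zero when `ok` is false so that the sync is marked as failed. How that failure triggers the rollback depends on the surrounding setup and is outside this sketch.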

05. Migration

We migrated services in three waves over three weeks: internal tooling first, then non-customer-facing services, then the services carrying production traffic. The team continued deploying via Jenkins during the transition. The cutover for each service was a 10-minute window in which we ran both pipelines in parallel, confirmed the ArgoCD sync was healthy, then disabled the Jenkins job.

Results

Faster deploys (47 min → 4 min): 11×

More deploys per week: 2–3 → 10–15

Deploy-caused incidents in 3 months post-launch: 0

Services migrated in 3 weeks: 30

Automatic rollback on smoke test failure: < 5 min

We went from dreading deploys to treating them as a non-event. The team ships multiple times a day now and nobody thinks twice about it. That's the goal.

CTO, fintech client

Need something similar?

Every engagement starts with a 30-minute call to understand your specific situation. No pitch — just an honest conversation about what you need and whether I'm the right fit.