Case study
DevOps at scale on critical infrastructure — GitLab CI forge for 450+ apps & Chaos Engineering GameDays at Enedis (via Klanik)
- GitLab CI
- Kubernetes
- HashiCorp Vault
- Sonatype Nexus
- SonarQube
- Terraform
- EKS
- Grafana
- Chaos Engineering
- Discord
TL;DR
At Enedis, France’s primary electricity distribution operator, I helped build from scratch a Kubernetes-based GitLab CI forge serving 450+ applications, then co-designed large-scale Chaos Engineering GameDays simulating DDoS, database corruption and secret leaks. The common goal: make teams anticipate failure instead of reacting to it.
Context
Enedis operates ~95% of France’s electricity distribution grid. This is critical national infrastructure: reliability, traceability, and security are not “best practices” — they are operational requirements.
When I joined (on assignment via Klanik, January 2022), there was no centralized CI/CD forge. Teams had heterogeneous pipelines, scattered tooling, and uneven security practices. Some projects had decent automation; others were closer to handcrafted scripts. There was no single entry point, no standard templates, no unified way to handle secrets, artifacts, scans, or runners at scale.
At the same time, hundreds of applications (≈450 GitLab projects) and a very large developer population needed CI/CD every day.
Our DevOps team (5 people) was tasked with an ambitious objective:
Build a single, industrialized CI/CD forge for the entire company, on Kubernetes, that could scale up and down automatically, and make “good DevOps” the default for everyone.
Later in the mission, I also co-built a Chaos Engineering platform used for large GameDays involving 100+ participants, designed not to “break things for fun” but to educate teams on observability, monitoring, and anticipating production failures.
These two storylines share the same philosophy: platforms and exercises as tools to change engineering behavior at scale.
Constraints
- Critical infrastructure context → we could not afford instability in the forge itself
- High security requirements → strict secrets handling, auditability, traceability
- 450+ applications, many teams, many languages and stacks
- Need for auto-scaling runners → cost control + performance
- Need to integrate existing tools rather than replace everything
- For GameDays: 100+ participants, multi-team coordination, realistic scenarios on real clusters
Decision — Industrializing the GitLab CI forge
We decided to make GitLab the single CI/CD entry point for the company, backed by a Kubernetes platform running:
- Auto-scaled GitLab runners (HPA)
- Centralized secrets with HashiCorp Vault
- Artifact repository with Sonatype Nexus
- Code quality & security scans via SonarQube
- Infrastructure as Code deployments with Terraform
The key was not just hosting runners. The key was templates.
We built 40+ modular CI/CD templates covering:
- Multiple languages and frameworks
- Build, test, Docker build, push, release
- Security scans by default
- Terraform plan/apply pipelines
- Standardized deployment patterns
Any new project at Enedis could onboard and instantly get a full DevOps lifecycle by including a few lines in .gitlab-ci.yml.
This turned CI/CD from a per-team responsibility into a shared platform capability.
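To make that concrete, here is a minimal sketch of what a consuming project's .gitlab-ci.yml could look like. The forge project path, template file names, application name, and Vault path are hypothetical placeholders, not the actual Enedis templates:

```yaml
# Illustrative consumer pipeline; project path, template names and Vault path are assumptions.
include:
  - project: 'devops/ci-templates'       # hypothetical central forge project
    ref: 'v2.3.0'                        # templates are versioned, so upgrades are explicit
    file:
      - '/blocks/build-java.yml'
      - '/blocks/docker.yml'
      - '/blocks/security-scans.yml'
      - '/blocks/terraform.yml'

variables:
  APP_NAME: "billing-api"                # hypothetical application name

deploy:
  stage: deploy
  # Secrets pulled from HashiCorp Vault at job runtime via GitLab's secrets:vault
  # integration, instead of living in CI/CD variables.
  secrets:
    DB_PASSWORD:
      vault: billing-api/db/password@kv  # hypothetical path: <path>/<field>@<engine mount>
  script:
    - ./deploy.sh "$APP_NAME"
```

A handful of include lines gives a new repository build, scan, packaging, Terraform and deployment jobs without writing any pipeline logic of its own.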
Template architecture (core idea)
- Heavy use of include
- Versioned templates
- Composable blocks (build, test, scan, docker, deploy)
- Opinionated defaults, overridable when needed
Teams didn’t have to reinvent pipelines. They consumed building blocks.
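On the provider side, each block is an ordinary GitLab CI file with opinionated defaults that consumers can override. Here is a minimal sketch of what a scan block could look like (job names, image tag, and variables are illustrative, not the real Enedis templates):

```yaml
# blocks/security-scans.yml (illustrative sketch, not the actual template)
variables:
  SONAR_HOST_URL: "https://sonarqube.internal.example"  # hypothetical default, overridable per project

.sonar-base:                       # hidden job: shared configuration, never runs on its own
  stage: test
  image: sonarsource/sonar-scanner-cli:5.0
  variables:
    SONAR_USER_HOME: "${CI_PROJECT_DIR}/.sonar"
  cache:
    key: "${CI_JOB_NAME}"
    paths:
      - .sonar/cache

sonarqube-scan:
  extends: .sonar-base             # opinionated default: the scan runs as soon as the block is included
  script:
    - sonar-scanner -Dsonar.projectKey="${CI_PROJECT_NAME}"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
```

A team that needs different behavior overrides the variables or redefines the job with extends, rather than rewriting the scan logic.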
Decision — Chaos Engineering as pedagogy
Separately, an internal entity (NIDC) wanted to run large GameDays.
With another DevOps engineer, I built a fully automated, reproducible chaos platform:
- Real Kubernetes clusters created for the event
- Entire infra provisioned with Terraform
- Automated disaster scenarios:
  - DDoS
  - Database corruption
  - Secret leaks
- Per-team observability spaces with Grafana dashboards
- Discord automation: one channel per team, event orchestration, instructions, monitoring
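The write-up doesn't show how the Discord side was scripted; purely as an illustration of the kind of automation involved, a job like the following could push scenario instructions into a team's channel through a Discord incoming webhook (the webhook variable, message, and URLs are made up, and showing it as a CI job is just a convenient sketch):

```yaml
# Illustrative Discord orchestration job; webhook variable, message and URLs are assumptions.
announce-scenario:
  image: curlimages/curl:8.8.0
  script:
    - |
      curl -sS -X POST "$TEAM_DISCORD_WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -d '{"content": "Scenario 2 is live: your database just disappeared. Dashboards: https://grafana.example/team-01"}'
  rules:
    - if: '$GAMEDAY == "true"'     # only runs when the GameDay pipeline is triggered
```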
The goal was competitive and educational: teams had to detect, understand, and fix issues as fast as possible.
This wasn’t about breaking infra. It was about teaching teams:
Logs, metrics, monitoring, and tests in production are not optional.
What I built
The forge platform (high level)
Developers → GitLab CI → K8s runners (HPA)
↓
Vault / Nexus / Sonar
↓
Docker registry / Terraform / Deploy
- Runners scaled automatically depending on load (see the HPA sketch below)
- Secrets injected dynamically via Vault
- Artifacts stored in Nexus
- Security scans everywhere by default
- Forge itself designed to scale down when idle
We even designed an automated disaster recovery plan (DRP) for the forge itself.
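The autoscaling details aren't spelled out here; as a rough sketch of CPU-driven scaling on a runner Deployment (namespace, names, and thresholds are assumptions):

```yaml
# Illustrative HorizontalPodAutoscaler for the runner fleet; names and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-runner
  namespace: ci-forge
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-runner
  minReplicas: 2            # keep a small warm pool so pipelines start quickly
  maxReplicas: 30           # cap the fleet to control cost at peak
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out before runners saturate
```

The same mechanism lets the forge scale back down to the minimum replica count when pipelines go quiet.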
The GameDay platform
Terraform → EKS clusters → Injected failures
↓
Grafana dashboards
↓
Discord per team
Everything was ready before the event. On the day, we only had to “press start”.
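The write-up doesn't name the tool used to inject failures; as one illustrative way such a scenario could be automated on a Kubernetes cluster, a Chaos Mesh experiment (Chaos Mesh is an assumption here, not something the original mentions) can kill a pod of a team's workload so the team has to spot it through their dashboards:

```yaml
# Illustrative Chaos Mesh experiment; Chaos Mesh, namespaces and labels are assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: gameday-kill-db
  namespace: team-01          # hypothetical per-team namespace
spec:
  action: pod-kill
  mode: one                   # affect a single matching pod
  selector:
    namespaces:
      - team-01
    labelSelectors:
      app: orders-db          # hypothetical target workload
```

Whatever the actual injection mechanism, the point is that each failure is declarative, repeatable, and scoped to a single team's environment.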
Outcome
Forge
- 450+ applications onboarded
- Standardized CI/CD across the company
- Security scans by default everywhere
- Massive improvement in developer onboarding
- Reliable, auto-scaled runner platform
- A forge that teams trusted and adopted
Chaos GameDays
- 100+ participants
- Teams discovering gaps in monitoring and alerting they didn’t know they had
- Clear behavioral shift between the first and later GameDays:
  - Teams came better prepared
  - Monitoring was taken seriously afterward
- Educational impact far beyond the event itself
What I’d do differently
- Invest earlier in documentation for templates (adoption would have been even faster)
- Productize the GameDay platform as a reusable internal product
- Measure more formal metrics on developer experience improvements