Case study
Bedrock load-testing platform — 10× capacity at 90% lower cost
Rebuilt Bedrock's load-testing platform on a hybrid Amazon EC2 + ECS architecture integrated with Gatling Enterprise Cloud. Result: ~10× more capacity at ~90% lower cost — by treating FinOps as an architecture problem, not a finance ticket.
- Amazon EC2
- Amazon ECS
- Gatling Enterprise Cloud
- AWS
- FinOps
- performance engineering
- load testing
TL;DR
I rebuilt Bedrock’s load-testing platform on a hybrid Amazon EC2 + Amazon ECS architecture integrated with Gatling Enterprise Cloud. Result: ~10× more load capacity at ~90% lower cost — by treating FinOps as an architecture problem, not a finance ticket.
Context
At Bedrock Streaming, load testing is not optional. With 20M+ weekly viewers and dozens of product teams shipping continuously, performance validation is part of the delivery culture. Every team needs to simulate real traffic patterns against real services, often at very high concurrency.
Before I touched the platform, load testing lived in two worlds:
- Individual teams running scripts (sometimes Postman collections) on their own
- A centralized setup based on Gatling Cloud, introduced by the DevOps team before I arrived
The intention was good: provide a managed, scalable load-testing solution. The problem was the hosting model.
Gatling Cloud provisions ephemeral machines billed by the minute, with relatively small instance templates. There was no real distinction between “number of users simulated” and “number of machines required.” If you wanted to simulate more users, you had to start more machines.
Two technical limits amplified the problem:
- Performance limits of small instances
- Linux's ephemeral port limit per source IP (≈64k), which becomes a hard ceiling when you try to simulate large numbers of concurrent users from a single machine
To simulate realistic user traffic at Bedrock’s scale, teams had to spawn many machines per test. Costs scaled linearly with ambition.
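To make that ceiling concrete, here is the rough arithmetic. The figures are illustrative assumptions, not our exact numbers:

```python
# Back-of-the-envelope math for the old model (illustrative figures only).
EPHEMERAL_PORTS_PER_IP = 64_000      # ~64k usable source ports per IP on Linux
TARGET_CONCURRENT_USERS = 500_000    # a large test at Bedrock's scale

# Worst case: one long-lived connection per virtual user, so each user
# holds one ephemeral port on the generator's single IP.
machines_needed = -(-TARGET_CONCURRENT_USERS // EPHEMERAL_PORTS_PER_IP)  # ceiling division
print(machines_needed)  # 8 machines just for ports; CPU/memory limits push it higher

# Cost then grows linearly with that count: double the users, double the
# machines, double the bill.
```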
As more teams started doing serious performance testing, the bill started exploding.
Constraints
Several constraints shaped the redesign:
- AWS-native infrastructure was the norm at Bedrock
- 50+ teams needed self-service load testing
- The solution had to remain compatible with Gatling Cloud (tool choice predated me)
- Tests had to be cheap enough that teams would not hesitate to run them often
- Very high concurrency targets (hundreds of thousands to millions of virtual users)
- No appetite for a long migration or tool change — this had to be an architectural optimization, not a cultural one
Trigger
There was no incident.
No management mandate.
This was a proactive initiative I pushed because the trajectory was obvious: more teams → more tests → runaway cost growth.
I framed it as: if we don’t change the architecture now, we will soon have to tell teams to test less. That is the opposite of what you want in a streaming platform.
Decision — Why hybrid EC2 + ECS
Because Gatling Cloud already relied on ECS + EC2 under the hood, the solution had to stay compatible with that model.
The key shift was conceptual:
Stop renting many small machines from Gatling Cloud. Start running a few very large machines inside our own AWS account.
I designed a hybrid model:
- A control plane running on Amazon ECS
- Dedicated, large EC2 instances acting as load generators, registered into Gatling Cloud
Instead of Gatling provisioning ephemeral workers for us, we provided our own workers: much bigger, far fewer, much cheaper.
This solved both problems:
- Port/IP limits: large instances with multiple ENIs and IPs
- Performance limits: far more CPU and memory per worker
And most importantly: cost.
What I built
High-level architecture
CI / Manual trigger
│
▼
ECS Control Plane
│
▼
Large EC2 Load Generators (self-managed)
│
▼
Gatling Cloud Orchestration
│
▼
Targets inside Bedrock VPC
The idea
Gatling Cloud allows external load generators to connect to it. Instead of letting Gatling spin up small instances billed by the minute, I created a pool of very large EC2 instances in our infrastructure that would attach to Gatling as workers.
Fewer machines, but far more powerful.
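As a minimal sketch, this is roughly what building that pool could look like with boto3. The AMI, subnet, instance type and region below are placeholders, and the actual attachment of each generator to Gatling Cloud is handled by Gatling's own tooling, not shown here:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-3")  # region is an assumption

# Launch a small pool of large, self-managed load generators.
# AMI, subnet and instance type are placeholders, not the real values.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",       # image with the Gatling load-generator agent baked in
    InstanceType="c5n.9xlarge",            # few machines, lots of CPU and network headroom
    MinCount=4,
    MaxCount=4,
    SubnetId="subnet-0123456789abcdef0",   # inside the VPC, close to the systems under test
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "role", "Value": "gatling-load-generator"}],
    }],
)
generator_ids = [i["InstanceId"] for i in response["Instances"]]
print("Load generator pool:", generator_ids)
```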
Why this changes everything
Previously:
- To simulate more users → add more Gatling machines → linear cost growth
Now:
- To simulate more users → use the headroom of already-running large machines → near-zero marginal cost
We moved from a per-test provisioning model to a capacity pool model.
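The difference is easiest to see as a tiny cost model. All capacities and rates below are placeholder assumptions chosen to show the shape of the two models, not our actual figures:

```python
import math

SMALL_MACHINE_CAPACITY = 60_000   # users one small ephemeral worker can realistically hold
SMALL_MACHINE_HOURLY = 1.0        # per-minute vendor billing, expressed per hour
LARGE_POOL_CAPACITY = 2_000_000   # headroom of the always-on pool of large instances

def per_test_provisioning_cost(users: int, hours: float) -> float:
    """Old model: every test rents its own fleet of small machines."""
    machines = math.ceil(users / SMALL_MACHINE_CAPACITY)
    return machines * SMALL_MACHINE_HOURLY * hours

def capacity_pool_marginal_cost(users: int) -> float:
    """New model: the pool is already paid for; a test only consumes headroom."""
    if users > LARGE_POOL_CAPACITY:
        raise ValueError("test exceeds pool headroom; scale the pool first")
    return 0.0

# Doubling the simulated users doubles the old bill, not the new one.
print(per_test_provisioning_cost(500_000, 1.0))   # grows linearly with ambition
print(capacity_pool_marginal_cost(1_000_000))     # ~0: headroom, not new machines
```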
ECS control plane
ECS hosted the orchestration layer that:
- Registered/deregistered EC2 load generators
- Managed lifecycle and connectivity with Gatling Cloud
- Allowed teams to trigger tests the same way as before (no workflow change)
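A rough sketch of the kind of lifecycle logic that control plane could run, reusing the pool-tag convention from the earlier sketch. The registration handshake with Gatling Cloud itself goes through Gatling's own mechanism and is not shown:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-3")   # region is an assumption
POOL_TAG_KEY, POOL_TAG_VALUE = "role", "gatling-load-generator"

def running_generators() -> list[str]:
    """List the EC2 load generators currently in the pool."""
    resp = ec2.describe_instances(Filters=[
        {"Name": f"tag:{POOL_TAG_KEY}", "Values": [POOL_TAG_VALUE]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    return [i["InstanceId"]
            for r in resp["Reservations"] for i in r["Instances"]]

def scale_pool(desired: int) -> None:
    """Keep the pool at the desired size without changing how teams trigger tests."""
    current = running_generators()
    if len(current) > desired:
        ec2.terminate_instances(InstanceIds=current[desired:])
    elif len(current) < desired:
        # launching more generators would wrap run_instances() as in the earlier sketch
        ...
```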
From the teams’ perspective, nothing changed.
From the bill’s perspective, everything changed.
Networking and realism
These EC2 instances lived in Bedrock’s AWS network, close to the systems under test. They could simulate traffic with many IPs and realistic concurrency without hitting the per-machine port ceiling that plagued the old model.
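A minimal example of how an instance can be given extra source IPs (the ENI ID and count are placeholders); the load generator then spreads its outgoing connections across those addresses:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-3")   # region is an assumption

# Give a load generator extra private IPs so each one brings its own
# ephemeral port range.
ec2.assign_private_ip_addresses(
    NetworkInterfaceId="eni-0123456789abcdef0",
    SecondaryPrivateIpAddressCount=7,     # 1 primary + 7 secondary IPs on this interface
)
```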
Capacity jump
Before: we were effectively capped at roughly 500,000 concurrent users, because going further meant spawning an unreasonable number of small machines.
After: we could exceed 2,000,000 concurrent users using a handful of very large instances.
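The rough capacity math, under the same illustrative assumptions as above (the IP count per instance and the pool size are assumptions, not measured values):

```python
ips_per_instance = 8        # primary + secondary private IPs on one large generator
ports_per_ip = 64_000       # Linux ephemeral port ceiling per source IP
instances = 5               # "a handful" of very large machines

max_concurrent_users = instances * ips_per_instance * ports_per_ip
print(f"{max_concurrent_users:,}")   # 2,560,000 -> comfortably above the 2M mark
```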
Not because Gatling changed. Because the infrastructure did.
Implementation time
The whole platform redesign took about one month.
No migration plan was needed. Teams kept using Gatling the same way. They just benefited from a different backend.
Outcome
The headline numbers
- ~10× more load capacity
- ~90% cost reduction
This was not measured with synthetic benchmarks. It was visible directly on the AWS and Gatling invoices.
Instead of paying for dozens of small ephemeral instances per test, we paid for a few large instances running continuously in our account.
Cultural effect
The most important effect was psychological:
Teams stopped worrying about how expensive their tests were.
They could test more, test bigger, test longer.
That is exactly what you want in performance engineering.
FinOps as architecture
This project is the best example of something I strongly believe:
Latency, errors, and euros per user belong on the same dashboard.
The cost problem was not solved by negotiation with a vendor or by asking teams to be careful. It was solved by changing the architecture.
What I’d do differently
Very little.
The solution was intentionally simple and pragmatic because the tool choice (Gatling) and workflow were fixed constraints.
If I had more time, I would probably add richer observability and historical reporting around test runs and resource consumption. But the core architectural decision is something I would reuse as-is.