For six months our CI pipeline had a failure rate of roughly 15%. Not always the same test. Not always the same error. Sometimes a 429 from an external API. Sometimes a 401 because a test token expired. Sometimes data that a previous test run had modified and not cleaned up.

We'd gotten so used to it that re-running the pipeline was just part of our workflow. Red build? Hit re-run. Wait 8 minutes. Usually green the second time.

That's 15% of our pipeline runs needing a re-run. For a team of six running about 40 pipelines a day, that works out to around six re-runs at 8 minutes each, or roughly 45 minutes of collective waiting time, every single day. Six months of that (about 180 days) is around 135 hours lost.

Diagnosing the Problem

The issue was simple once we looked at it directly: our integration tests were hitting real services. Specifically:

A staging instance of our payment service (Stripe test mode).
A third-party CRM API with aggressive rate limits (100 requests/minute).
Our own staging database, which wasn't reset between test runs.

Each of these introduced non-determinism. A test that hits Stripe depends on Stripe's test-mode infrastructure being available. A test that hits the CRM depends on the last pipeline not having consumed the rate limit budget. A test that reads from the staging DB depends on the previous test's writes being rolled back, which they weren't.
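To make the rate-limit case concrete, here is a small simulation rather than anything from the real test suite. `createRateLimitedApi` is a hypothetical stand-in for the CRM with a fixed budget of 100 requests per window, and the figure of 70 requests per pipeline is invented for illustration.

```javascript
// Hypothetical stand-in for a rate-limited API: a fixed budget per window.
function createRateLimitedApi(limitPerWindow) {
  let used = 0;
  return {
    request() {
      if (used >= limitPerWindow) return { status: 429 };
      used += 1;
      return { status: 200 };
    },
  };
}

// A "pipeline" here is just N requests; it passes only if none are rejected.
function runPipeline(api, requestCount) {
  for (let i = 0; i < requestCount; i += 1) {
    if (api.request().status !== 200) return "failed";
  }
  return "passed";
}

// Two pipelines share one budget inside the same window.
const sharedCrm = createRateLimitedApi(100);
const firstRun = runPipeline(sharedCrm, 70); // uses 70 of the 100 requests
const secondRun = runPipeline(sharedCrm, 70); // rejected on its 31st request

console.log(firstRun, secondRun);
```

The second run executes identical code and fails only because of what ran before it in the same window. That is the sense in which a shared external dependency makes a test non-deterministic.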

We knew this in theory. We'd talked about "fixing the tests properly" every sprint retro for months. It never happened because it felt like too big a project.

The Fix That Took One Day

It actually took about 6 hours across two engineers.

Phase 1: Mock the external APIs (3 hours)

We had OpenAPI specs for both the payment service and the CRM (the CRM published their spec publicly on their developer portal). We imported both into moqapi.dev and got live mock endpoints for both services immediately.

Then in our test config, we overrode the API base URLs:

// jest.config.js (or vitest.config.ts)
// In CI, point both clients at the hosted mocks. Outside CI, leave the
// variables untouched: assigning an undefined value to process.env would
// store the string "undefined".
if (process.env.CI) {
  process.env.PAYMENT_API_URL = "https://moqapi.dev/api/invoke/mock/[payment-mock-id]";
  process.env.CRM_API_URL = "https://moqapi.dev/api/invoke/mock/[crm-mock-id]";
}

In CI, tests hit the mocks. In local development, engineers could still hit real services or the mock — their choice.
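One way to keep that choice explicit is to resolve the URLs in a single function. This is only a sketch: `resolveApiUrls` and the `USE_MOCKS` opt-in flag are hypothetical names, not part of the setup described here, and the bracketed mock IDs are placeholders.

```javascript
// Placeholder mock endpoints; the bracketed IDs are not real values.
const MOCK_URLS = {
  PAYMENT_API_URL: "https://moqapi.dev/api/invoke/mock/[payment-mock-id]",
  CRM_API_URL: "https://moqapi.dev/api/invoke/mock/[crm-mock-id]",
};

// Returns the base URLs a test run should use.
// - In CI (env.CI set): always the mocks.
// - Locally: whatever the engineer configured, unless they opt in to the
//   mocks with the hypothetical USE_MOCKS flag.
function resolveApiUrls(env) {
  const useMocks = Boolean(env.CI) || env.USE_MOCKS === "1";
  if (useMocks) return { ...MOCK_URLS };
  return {
    PAYMENT_API_URL: env.PAYMENT_API_URL,
    CRM_API_URL: env.CRM_API_URL,
  };
}

console.log(resolveApiUrls({ CI: "true" }).CRM_API_URL);
```

Passing the environment in as an argument, instead of reading `process.env` inside the function, keeps the helper easy to test on its own.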

Phase 2: Fix the database isolation (2 hours)

This was the other piece. We moved integration tests to use a transaction-wrapped test database (each test runs in a transaction that's rolled back after the test). This is a well-documented pattern for Knex, Prisma, and most other query builders and ORMs.
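The shape of the pattern can be sketched with an in-memory stand-in instead of a real database. `createFakeDb`, `begin`, `rollback`, and `withRollback` are hypothetical names for illustration. A real driver or ORM is asynchronous and exposes its own transaction API, so treat this as the idea only.

```javascript
// In-memory stand-in for a database that supports one open transaction.
function createFakeDb(initialRows) {
  let rows = [...initialRows];
  let snapshot = null;
  return {
    // Remember the state as it was before the test started.
    begin() { snapshot = [...rows]; },
    // Discard everything written since begin().
    rollback() { rows = snapshot; snapshot = null; },
    insert(row) { rows.push(row); },
    count() { return rows.length; },
  };
}

// Wraps a test body so its writes never outlive the test.
function withRollback(db, testBody) {
  db.begin();
  try {
    return testBody(db);
  } finally {
    db.rollback(); // runs whether the test body returned or threw
  }
}

const db = createFakeDb([{ id: 1 }]);
const seenInsideTest = withRollback(db, (tx) => {
  tx.insert({ id: 2 });
  return tx.count();
});

// The test saw its own write; the database afterwards does not.
console.log(seenInsideTest, db.count());
```

In a Jest or Vitest suite, the same begin and rollback calls would typically live in `beforeEach` and `afterEach` hooks so individual tests do not have to repeat them.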

Phase 3: Update CI environment variables (30 minutes)

We added the two mock URLs to our GitHub Actions secrets and CI config. Nothing else in the pipelines changed.
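A cheap guard worth considering alongside this step is to fail fast when a CI run is missing one of the variables, instead of letting tests fall through to a real service. `assertMockEnv` is a hypothetical helper, and checking for the moqapi.dev mock prefix is an assumption based on the URLs shown earlier.

```javascript
const REQUIRED_VARS = ["PAYMENT_API_URL", "CRM_API_URL"];
const MOCK_PREFIX = "https://moqapi.dev/api/invoke/mock/";

// Throws if a CI run is not pointed at the mock endpoints.
// Outside CI it does nothing, so local runs can still use real services.
function assertMockEnv(env) {
  if (!env.CI) return;
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (!value || !value.startsWith(MOCK_PREFIX)) {
      throw new Error(`${name} must point at a mock endpoint in CI`);
    }
  }
}

// Example with a literal environment; a test setup file would pass
// process.env instead. The bracketed IDs are placeholders.
assertMockEnv({
  CI: "true",
  PAYMENT_API_URL: MOCK_PREFIX + "[payment-mock-id]",
  CRM_API_URL: MOCK_PREFIX + "[crm-mock-id]",
});
```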

Results

The first week after the change: zero flaky failures. Not one. The pipeline that had a 15% failure rate for six months ran 280 times without a random failure.

Pipeline time also dropped by about 90 seconds per run because the mock responds faster than real external APIs (no network roundtrip to Stripe's servers, no rate limit waits).

The Pattern to Remember

If your test depends on an external service being available, being fast, and having the right data — it will fail randomly forever. The fix is to make those dependencies deterministic. Mock APIs are the right tool for HTTP dependencies. Transactional rollback is the right tool for database dependencies.

Neither is a shortcut. You're not testing less — you're testing your code's behaviour in isolation from infrastructure problems. That's what a unit or integration test is supposed to do.

For more on the testing strategy hierarchy, the API testing strategies guide on this blog has the full breakdown of unit → integration → contract → chaos layers.

Our CI Tests Were Randomly Failing for 6 Months. Mock APIs Fixed It in a Day.
