Civilizational · CDN / Internet Infrastructure · June 8, 2021 · Draft

Fastly: Single customer config change triggers global Fastly CDN outage

A valid customer configuration change exposed a latent software bug in Fastly's edge servers, causing 85% of the network to return errors. The outage simultaneously took down major sites including Amazon, Reddit, Twitch, NYT, the UK's gov.uk, and Stack Overflow.

Velocity Governance perspective · Sentinel-10 overlap: 2 of 4

2 of the 4 practices that failed in this incident are part of the Sentinel-10 — the engineering protocols Concordance flags as most degraded under AI-accelerated development.

This incident pre-dates today's AI-velocity surge. The thesis is that the same practices that failed here will fail faster under AI velocity if not actively governed.

Impact

Downtime: 1h

Approximately 85% of Fastly's global network returned errors, affecting services that handle billions of requests. The outage highlighted the internet's dependence on a small number of CDN providers.

Root cause (from published RCA)

A valid customer configuration change triggered a bug in software that had been deployed on May 12; the bug had been latent for nearly a month (27 days) before the configuration exposed it on June 8. When the configuration was applied, ~85% of the network began returning errors. The bug was in code that processed VCL configuration changes; the affected POPs were all running the same software version.

Concordance protocols that map to this root cause


Protocol 4.3 · Test Coverage · Testing
Test coverage: the latent bug was not exercised by Fastly's test suite despite affecting common customer configuration patterns.
Protocol 5.8 · Feature Flagging · Release · SENTINEL
Feature flagging: software was deployed to 100% of POPs with no feature flag that could have isolated the bug to a subset.
Protocol 5.7 · Rollback Capability · Release · SENTINEL
Rollback capability: rollback took ~49 minutes, far longer than typical CDN incident-response targets.
Protocol 4.2 · CI Gating · Testing
CI gating: integration tests did not include the customer-configuration class that triggered the bug.
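
The feature-flagging and rollback findings above can be illustrated with a minimal sketch of a percentage-gated rollout that halts when enabled POPs look unhealthy. This is not Fastly's tooling: the function names, the stage percentages, the 1% error threshold, and the `error_rate` health callback are all assumptions made for illustration. A gate like this only limits the blast radius of a latent bug if the flag is still partially enabled when the triggering configuration arrives.

```python
import hashlib
from dataclasses import dataclass


def in_rollout(pop_id: str, flag: str, percent: int) -> bool:
    """Deterministically place a POP in one of 100 buckets for this flag.

    A POP is enabled when its bucket is below `percent`, so widening the
    percentage only ever adds POPs and never reshuffles them.
    """
    digest = hashlib.sha256(f"{flag}:{pop_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


@dataclass
class RolloutResult:
    reached_percent: int  # last stage that passed the health check
    rolled_back: bool     # True if a stage failed and the flag was turned off


def staged_rollout(pops, flag, error_rate,
                   stages=(1, 10, 50, 100), max_error_rate=0.01):
    """Widen the flag stage by stage (hypothetical stages and threshold).

    `error_rate` is a caller-supplied function returning the observed
    error rate for a POP; any enabled POP above the threshold stops the
    rollout before the next stage is reached.
    """
    reached = 0
    for percent in stages:
        enabled = [p for p in pops if in_rollout(p, flag, percent)]
        if any(error_rate(p) > max_error_rate for p in enabled):
            return RolloutResult(reached_percent=reached, rolled_back=True)
        reached = percent
    return RolloutResult(reached_percent=reached, rolled_back=False)
```

With a healthy `error_rate` callback the rollout reaches 100%; with a callback that reports failures it stops at the last healthy stage, so only the POPs enabled at the failing stage were ever exposed.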

Primary sources

Summary of June 8 outage
Fastly · June 8, 2021

Related incidents

Other incidents that failed at least one of the same protocols.

CrowdStrike · Jul 2024
Falcon sensor channel-file update crashes 8.5M Windows hosts
Knight Capital Group · Aug 2012
Knight Capital loses $440M in 45 minutes from incomplete software depl…
Codecov · Apr 2021
Codecov bash-uploader supply-chain compromise exfiltrates customer sec…
OpenSSL Project · Apr 2014
CVE-2014-0160: Heartbleed buffer over-read in OpenSSL TLS heartbeat
#cdn #latent-bug #config-trigger #global-outage #rollback-time
Concordance Incident Index entry · CC BY 4.0 · Methodology · Errata: hello@concordancelabs.com