Deleted data caused last week's Netflix outage


The way it was, December 24th
The data was deleted inadvertently by a maintenance process that one of the few developers with access to this area ran by accident, and the mistake was not noticed at first. When the problems began, Netflix focused on the API errors, and it took some deep digging to find the root cause.
"It was when the ELB technical team started digging deeply into these degraded load balancers that the team identified the missing ELB state data as the root cause of the service disruption. At this point, the focus shifted to preventing additional service impact and recovering the missing ELB state data." - Amazon Web Services
It wasn't until 3:05 p.m. EST the next day, Christmas Day, that Netflix reported the service was up and running. To keep something like this from happening again, changes have been made to prevent accidental modification without approval. Netflix also says it has learned how to restore the service significantly faster in the unlikely event that the same failure recurs.
source: AmazonWebServices via Electronista