The way it was, December 24th
So what caused the Netflix outage on Christmas Eve, just as you were set to watch "It's a Wonderful Life" streamed to your mobile device? As it turned out, the problem was caused by an accidental deletion of data on Amazon Web Services. Amazon posted a summary of the events that led up to the outage, which started at 3:24pm EST. According to Amazon, the affected customers were limited to those in its US-East region who relied on the Amazon Elastic Load Balancing (ELB) service. Even then, only a small portion of ELB users were affected.
The data was deleted by a maintenance process that one of the few developers with access to this production environment ran by mistake, and the error went unnoticed at first. When the problem started, Amazon's team initially focused on the API errors, and it took some deep digging to find the root cause.
"It was when the ELB technical team started digging deeply into these degraded load balancers that the team identified the missing ELB state data as the root cause of the service disruption. At this point, the focus shifted to preventing additional service impact and recovering the missing ELB state data." - Amazon
It wasn't until 3:05pm EST the next day, Christmas Day, that Amazon reported the service had fully recovered. To keep something like this from happening again, changes have been made so that this data cannot be modified without approval. Amazon also says it has learned how to restore service significantly faster in the unlikely event that the same thing happens again.