Amazon Web Services says overwhelmed network devices triggered outage
Illustration by Alex Castro / The Verge
Amazon Web Services (AWS) has explained what caused the outage that downed parts of its own services, as well as the third-party websites and online platforms that rely on AWS. In a post on the AWS website, the company says that an automated process caused the outage, which began around 10:30AM ET in the Northern Virginia (US-EAST-1) region.
“An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network,” Amazon’s report says. “This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks.”
According to the report, this issue even impacted Amazon’s ability to see what exactly was going wrong with the system. It prevented the company’s operations team from using the real-time monitoring system and internal controls that it typically relies on, which explains why the outage took so long to fix. Amazon notes that service didn’t start improving until 4:34PM ET, and the issue was fully resolved at 5:22PM ET.
Since Amazon’s Support Contact Center also runs on the AWS network, customers weren’t able to create support cases for seven hours during the outage. Amazon’s Service Health Dashboard, which the platform uses to provide status updates, was also impacted, resulting in Amazon’s delayed acknowledgment of the issue. The company says that it’s working on a way to improve its response to outages, and plans on releasing a revamped version of the Service Health Dashboard that should help customers receive timely updates if an outage occurs.
In addition to knocking out popular services like Venmo, Tinder, Disney Plus, and even Roomba, the December 7th outage also put some Amazon deliveries on hold. Amazon experienced its last major outage around this time last year, which caused a number of sites and apps to go down for hours.