The Amazon Web Services outage that disrupted a slew of industries last week was caused by an automated process that overwhelmed its networking devices, according to Amazon.
The interruption of service in the middle of the holiday shopping season caused problems inside the company, where Amazon delivery workers reported difficulty doing their jobs, and knocked prominent websites offline, including streaming services Disney+ and Netflix.
Apologizing for the impact the shutdown had on customers, Amazon said in a blog post the problem started at 10:30 a.m. ET on Tuesday in its Northern Virginia operations where “an automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network.”
“This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks,” the company said in the blog post, which was published Friday. “These delays increased latency and errors for services communicating between these networks, resulting in even more connection attempts and retries.”
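The feedback loop Amazon describes, in which errors prompt retries that add still more load, is commonly damped on the client side by spacing retries out with exponential backoff and random jitter. The sketch below is a generic illustration of that technique, not Amazon's code or its stated fix; the function name `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with jittered backoff.

    Hypothetical helper for illustration only. The wait before each retry
    is drawn at random between 0 and min(cap, base * 2**attempt) seconds,
    so clients that failed together do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            # Give up after the final attempt instead of waiting again.
            if attempt == max_attempts - 1:
                raise
            ceiling = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Passing `sleep` as a parameter lets a caller substitute a recorder in tests instead of actually waiting. The cap keeps the longest wait bounded no matter how many attempts are allowed.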
The timing of the outage in the peak holiday shopping season may cause headaches for Amazon down the road, but the company said it has taken steps to prevent a repeat of last week’s problems. Amazon said it disabled the automated processes that caused the overwhelming network activity and will not resume them until it fixes the error.
Amazon’s tech troubles extended to its monitoring systems, which the company said slowed its ability to understand the extent of the problem. The Seattle-based company said it was working to release a new version of its service health dashboard to help people better understand what is happening in similar situations.
Amazon’s competitors are already looking to take advantage of the major outage, which hit the eastern U.S. On a quarterly earnings conference call, Oracle co-founder Larry Ellison took a veiled shot at Amazon, saying that Oracle’s cloud service “never goes down,” unlike those of Amazon, Google and Microsoft, according to CNBC.
Last week’s outage is not the only tech challenge Amazon is facing. A cybersecurity vulnerability in the widely used open-source logging platform Apache Log4j made for a busy weekend for Amazon’s employees, according to Amazon Web Services’ David Nalley.
“The vulnerability is severe and due to the widespread adoption of Apache Log4j, its impact is large,” wrote Mr. Nalley on the company’s blog on Sunday. “We highly encourage you to review, patch or mitigate this vulnerability. This tool may help you mitigate the risk when updating is not immediately possible.”
Many companies are working to overcome potential problems associated with the Log4j vulnerability. Cybersecurity and Infrastructure Security Agency director Jen Easterly said Saturday that the Biden administration was working closely with the private sector to address the problems.
Ms. Easterly touted the work of the Joint Cyber Defense Collaborative, a partnership between tech companies such as Amazon and the U.S. intelligence and national security community, to combat the Log4j vulnerability.
“We have established a JCDC senior leadership group to coordinate collective action and ensure shared visibility into both the prevalence of this vulnerability and threat activity,” Ms. Easterly said in a statement Saturday. “By bringing together key government and private sector partners via the JCDC, including our partners at the FBI and [National Security Agency], we will ensure that our country’s strongest capabilities are brought to bear in an integrated manner against this risk.”
Ms. Easterly said CISA would hold a phone call with critical infrastructure entities on Monday afternoon to provide additional answers about the vulnerability from the federal government’s perspective.
• Ryan Lovelace can be reached at rlovelace@washingtontimes.com.