Amazon calls engineers together after AI-related outages

Amazon is calling engineers together after a series of outages involving AI coding tools. From now on, junior and mid-level engineers must ask senior engineers for permission before making AI-assisted code changes.

The Financial Times reported the news. Dave Treadwell, senior vice president at Amazon, sent employees an email with a clear message: “Folks, as you likely know, the availability of the site and related infrastructure has not been good recently.” The weekly “This Week in Stores Tech” (TWiST) meeting, normally optional, was made mandatory this time.

Earlier this month, Amazon’s website and app were down for nearly six hours because of an error in software code. Customers were unable to complete transactions and had no access to account information or product prices.

The briefing mentions a “trend of incidents” characterized by a “high blast radius” and “Gen-AI assisted changes.” Amazon cites “novel GenAI usage for which best practices and safeguards are not yet fully established” as a contributing factor. In December, AWS also experienced a 13-hour outage after engineers allowed their Kiro AI tool to make certain changes. Techzine previously reported extensively on this incident and a second similar case, which also involved Amazon Q Developer.

Stricter control of AI changes

Junior and mid-level engineers must now seek approval from senior engineers before implementing AI-assisted changes. Treadwell also announced short-term initiatives to limit future outages. Amazon described the availability investigation as “part of normal business” and says it is continuously striving for improvement.

Staff shortages also play a role in the background. Amazon cut 16,000 corporate jobs in January, the latest in a series of layoffs. Several engineers previously told the FT that, as a result, they had to deal with more “Sev2s” each day. These are incidents that require a quick response to prevent outages. Amazon disputes that the staff reduction is responsible for the increase in disruptions.