Amazon Web Services experienced two outages caused by its own AI tools. The incidents involving the Kiro AI development tool and Amazon Q Developer have raised doubts within the organization about the rollout of autonomous AI assistants.
In December, AWS experienced a 13-hour outage after engineers allowed their Kiro AI tool to make certain changes. According to four sources cited by the Financial Times, the autonomous tool decided that the best solution was to “delete and recreate the environment.” The outage affected a system that allows customers to explore the costs of AWS services.
It was the second time in a short period that an Amazon AI tool was at the center of a service outage. “We’ve already seen at least two production outages [in the past few months],” a senior AWS employee told the FT anonymously. “The engineers let the AI [agent] resolve an issue without intervention. The outages were small but entirely foreseeable.”
Amazon emphasizes that it was a “coincidence that AI tools were involved,” and that “the same issue could occur with any developer tool or manual action.” The company claims that both cases were user errors and not AI errors. There is no evidence that errors occur more frequently with AI tools.
According to Amazon, the December malfunction was an “extremely limited event” that only affected a single service in parts of China. The second incident had no impact on a “customer facing AWS service.”
Access control and safeguards
The employees state that Amazon’s AI tools are treated as an extension of an operator and are given the same permissions. In these two cases, the engineers involved did not require approval from a second person before making changes, which would normally be the case.
Amazon states that the Kiro tool “requests authorization before taking action” by default. However, in the December incident, the engineer had “broader permissions than expected—a user access control issue, not an AI autonomy issue.”
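The two safeguards described here, authorization before any action and approval from a second person, can be illustrated with a minimal sketch. It is not how Kiro or any AWS tool works: the action names, the `ProposedChange` type, and the `may_execute` function are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass, field

# Hypothetical action names for illustration; not taken from Kiro or any AWS tool.
DESTRUCTIVE_ACTIONS = {"delete_environment", "recreate_environment"}


@dataclass
class ProposedChange:
    action: str
    requested_by: str  # the operator whose permissions the agent inherits
    approvals: set = field(default_factory=set)


def may_execute(change: ProposedChange) -> bool:
    """Return True only if the change has the approvals this sketch requires."""
    # Default rule: nothing runs until the requesting operator authorizes it.
    if change.requested_by not in change.approvals:
        return False
    # Two-person rule: destructive actions also need an approver other than the requester.
    if change.action in DESTRUCTIVE_ACTIONS:
        return len(change.approvals - {change.requested_by}) >= 1
    return True


change = ProposedChange(action="delete_environment", requested_by="alice")
change.approvals.add("alice")
blocked = may_execute(change)   # only the requester has approved
change.approvals.add("bob")
allowed = may_execute(change)   # a second person has now approved
```

In this sketch the agent never holds authority of its own. It can only carry out what the operator has authorized, and a destructive change stays blocked until a second person signs off, regardless of how broad the operator's own permissions are.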
Kiro is intended to go beyond vibe coding and write code based on specifications. Before Kiro, Amazon relied on Amazon Q Developer, an AI chatbot that helps engineers write code. According to three employees, Amazon Q Developer was involved in the earlier outage.
AWS says that after the December incident it “implemented numerous safeguards,” including mandatory peer review and staff training. Nevertheless, some Amazon employees remain skeptical about the usefulness of AI tools for most of their work because of the risk of errors. The company has set a goal of having 80 percent of developers use AI for coding tasks at least once a week and is closely monitoring adoption.
Update – AWS has informed us that it disagrees with the claims made by FT’s sources. The company: “This brief event was the result of user error—specifically misconfigured access controls—not AI. The service interruption was an extremely limited event last year when a single service (AWS Cost Explorer—which helps customers visualize, understand, and manage AWS costs and usage over time) in one of our two Regions in Mainland China was affected. This event didn’t impact compute, storage, database, AI technologies, or any other of the hundreds of services that we run. Following these events, we implemented numerous additional safeguards, including mandatory peer review for production access. Kiro puts developers in control—users need to configure which actions Kiro can take, and by default, Kiro requests authorization before taking any action.”