Snowflake shows resilience during AWS outage

Last week’s outage at Amazon Web Services (AWS) showed how vulnerable cloud infrastructures are when critical shared services fail.

While many organizations experienced disruption, Snowflake reported that more than 300 customer workloads continued to run without interruption thanks to its Snowgrid technology.

According to Snowflake, these workloads automatically switched to alternative regions or clouds, allowing business processes to continue largely uninterrupted. SiliconANGLE writes that Snowflake took the opportunity to remind customers that an outage does not necessarily lead to a business crisis, provided a well-thought-out continuity strategy has been put in place.

Snowgrid has been available since 2022 and offers organizations the ability to replicate data and workloads across multiple cloud regions on AWS, Microsoft Azure, and Google Cloud. When a failure occurs, a pre-set failover scenario can be activated manually or automatically. The platform ensures that workloads resume at a secondary location without data loss or duplication. Applications and dashboards are reconnected through automatic DNS updates. This usually happens without noticeable downtime for end users.
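The failover flow described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Snowflake's API or implementation: the `Region` and `FailoverGroup` classes, the region names, and the offset-based replication are all hypothetical and exist purely to show the idea of replicating to a secondary, promoting it, and having clients follow a stable alias.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical stand-in for a cloud region holding a copy of the data."""
    name: str
    healthy: bool = True
    rows: list = field(default_factory=list)


class FailoverGroup:
    """Toy model: one primary, one replica, and a stable alias that clients resolve."""

    def __init__(self, primary: Region, secondary: Region):
        self.primary = primary
        self.secondary = secondary

    def write(self, row):
        if not self.primary.healthy:
            raise RuntimeError(f"{self.primary.name} is unavailable")
        self.primary.rows.append(row)

    def replicate(self):
        # Copy only the rows the secondary does not have yet, so repeated
        # replication runs never duplicate data.
        missing = self.primary.rows[len(self.secondary.rows):]
        self.secondary.rows.extend(missing)

    def fail_over(self):
        # Promote the replica. Anything written after the last replicate()
        # call would be lost in this toy model.
        if not self.secondary.healthy:
            raise RuntimeError("no healthy secondary to fail over to")
        self.primary, self.secondary = self.secondary, self.primary

    def resolve(self) -> str:
        # Clients connect through this alias, not a region-specific address,
        # which plays the role of the automatic DNS update.
        return self.primary.name


group = FailoverGroup(Region("primary-region"), Region("secondary-region"))
group.write("order-1")
group.write("order-2")
group.replicate()
group.replicate()              # a second run must not duplicate rows

group.primary.healthy = False  # simulate a regional outage
group.fail_over()

print(group.resolve())         # secondary-region
print(group.primary.rows)      # ['order-1', 'order-2']
```

Because clients only ever ask `resolve()` for the current endpoint, they need no reconfiguration after the switch. In this sketch the "no data loss" property holds only because replication had caught up before the outage.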

According to Chief Product Officer Christian Kleinerman, automatic connection redirection is an essential part of Snowgrid. In practice, he says, users notice at most a brief interruption, after which systems continue to function normally.

The AWS outage was caused by a problem in the DNS infrastructure of the US-EAST-1 region, according to SiliconANGLE. As a result, control plane services were also unavailable, even for customers with workloads spread across multiple availability zones. SiliconANGLE notes that the incident makes it clear that availability zones do not offer complete protection. Services such as DNS or identity management are shared across a region, so a failure at that level can affect all zones simultaneously.
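The difference between a zone failure and a failure of a region-wide service can be made concrete with a few lines of Python. This is a simplified sketch, not a model of AWS internals: the `Zone` class, the zone labels, and the single "regional services up" flag per region are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical availability zone inside a region."""
    name: str
    region: str
    healthy: bool = True


def available_zones(zones, regional_services_up):
    """A zone can serve traffic only if it is healthy AND the shared
    services of its region (DNS, identity, control plane) are up."""
    return [
        z.name
        for z in zones
        if z.healthy and regional_services_up.get(z.region, False)
    ]


single_region = [
    Zone("use1-az1", "us-east-1"),
    Zone("use1-az2", "us-east-1"),
    Zone("use1-az3", "us-east-1"),
]
multi_region = single_region + [Zone("usw2-az1", "us-west-2")]
services = {"us-east-1": True, "us-west-2": True}

# Case 1: one zone fails. The remaining zones in the region absorb it.
single_region[0].healthy = False
print(available_zones(single_region, services))  # ['use1-az2', 'use1-az3']

# Case 2: a shared regional service fails. Every zone in that region is
# affected at once; only the deployment with a second region keeps capacity.
services["us-east-1"] = False
print(available_zones(single_region, services))  # []
print(available_zones(multi_region, services))   # ['usw2-az1']
```

The second case is the one multi-zone redundancy cannot cover, because the failed dependency sits above the zone level.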

Redundancy within a region is not enough

Snowflake endorses that conclusion. According to Kleinerman, many organizations mistakenly believe that redundancy within a single region is sufficient for business continuity, when in practice an architecture spanning multiple regions or clouds is needed to ensure resilience.

Snowflake emphasizes that Snowgrid is not a set-and-forget solution. Customers must carefully design and regularly test their replication and failover processes to ensure they work when needed. According to Snowflake, organizations that did so experienced only minimal disruption during the recent AWS problems.

Kleinerman wrote in a blog post that outages are inevitable, but that preparation makes the difference between a normal workday and a hectic Monday.