Microsoft's Azure cloud has very high availability: over the past year, the service had an uptime of 99.995 percent. But that is not enough for CTO Mark Russinovich, who wants to increase reliability further with a set of new initiatives.
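To put that percentage in perspective, the sketch below converts an uptime figure into the downtime it implies. It assumes the number is averaged over a 365-day year, which the article does not state, and the function name `allowed_downtime_minutes` is only illustrative.

```python
# Convert an availability percentage into the downtime it allows.
# Assumption: the period is a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.995):.1f} minutes per year")
```

Under that assumption, 99.995 percent uptime corresponds to roughly 26 minutes of downtime over a full year.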

Despite this high uptime, Azure had three unique and significant incidents in the past year, Russinovich writes in a blog post.

These were a data center failure in the South Central US region in September 2018, problems with Azure Active Directory Multi-Factor Authentication in November 2018, and DNS maintenance issues in May 2019.

According to Russinovich, each of the three incidents was the result of several problems that only caused failures for customers through complex interactions. The CTO says the company has learned from its mistakes and now wants to increase the reliability of Azure.

Quality Engineering Team

To start with, a new Quality Engineering team has been set up that reports directly to the CTO. The team is tasked with developing new approaches for a more reliable platform, and a number of initiatives are already under way.

The availability zones will also be expanded further. Zones are currently live in the ten largest Azure regions, and between now and 2021 they will come to the next ten largest regions as well.

Work is also continuing on Project Tardigrade, a service that detects hardware problems or memory leaks that could lead to an operating system crash just before the crash occurs. This allows Azure to briefly freeze the affected virtual machines so that their workloads can be moved to a healthy host.

Safe deployment practice framework

Microsoft is also expanding its safe deployment practice framework, which ensures that all code and configuration changes in Azure pass a set of tests before they are rolled out to the various regions. The framework is being extended to cover all software-defined infrastructure changes in Azure as well.
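The general idea of gating a rollout on tests can be sketched as follows. This is only an illustration and not Microsoft's implementation: the stage names, the `run_checks` callback and the stop-at-first-failure behavior are all hypothetical.

```python
from typing import Callable, Iterable, List


def staged_rollout(change_id: str,
                   stages: Iterable[str],
                   run_checks: Callable[[str, str], bool]) -> List[str]:
    """Roll a change out stage by stage, stopping at the first failed check.

    Returns the stages the change was actually rolled out to.
    """
    completed: List[str] = []
    for stage in stages:
        if not run_checks(change_id, stage):
            # A failed check halts the rollout, so later stages never see the change.
            break
        completed.append(stage)
    return completed


if __name__ == "__main__":
    # Hypothetical stage names, ordered from smallest to largest impact.
    stages = ["test", "pilot-region", "remaining-regions"]

    # Hypothetical check: pretend the change fails validation for the pilot region.
    def checks(change_id: str, stage: str) -> bool:
        return stage != "pilot-region"

    print(staged_rollout("change-123", stages, checks))
```

In this sketch, a change that fails its checks for an early stage never reaches the stages after it, which is the property such a framework is meant to guarantee.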

Customers are also getting a preview of the ability to initiate their own failover at the storage account level. In addition, Microsoft wants to invest further in improving zero-impact and low-impact update techniques, such as hot patching and live migration.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.