Microsoft gives advice after CrowdStrike chaos to prevent repeat

Tips for organizations, but no changes to Microsoft policy (yet)

The troublesome CrowdStrike update that caused outages on 8.5 million Windows systems happened a week ago. Now that the dust has settled somewhat, Microsoft is sharing advice on how organizations can prevent a repeat of such an impactful outage.

Microsoft says it must prioritize “end-to-end resilience.” Put more succinctly: keep Windows running. It should not be possible for hospitals, airports and other critical infrastructure to be knocked offline by a single software update. A CrowdStrike driver with kernel-level access was unprepared for nonsensical data, causing Blue Screens of Death worldwide. Yet it is also up to Microsoft and to organizations to do their part to ensure that such failures do not happen again.

Best practices

Microsoft’s message is brief, but contains six concrete pieces of advice. A contingency plan comes first, followed by excellent but predictable advice to back up often and safely. In addition, organizations can get Windows up and running faster by using the restore points and recovery options built into the OS. For virtual machines, this includes taking snapshots.

Deployment rings can also help prevent a full-scale outage. The idea is that organizations do not update all their systems at once, but gradually, for example through Windows Autopatch. However sound this advice is, it would have made no difference in the CrowdStrike failure in question. After all, this update was pushed out without IT professionals installing it or even having a say in the matter. CrowdStrike is now finally offering this option, but it took a major incident for the company to reach that conclusion.
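
The ring principle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Windows Autopatch works internally: the functions `plan_rings` and `staged_rollout`, the ring fractions, and the `deploy` and `is_healthy` callbacks are all hypothetical. Each ring is updated only after the previous ring has passed a health check, so a faulty update stops at the first, smallest ring.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical sketch of ring-based (staged) deployment; not the Windows Autopatch API.


def plan_rings(devices: Sequence[str], fractions: Sequence[float]) -> list[list[str]]:
    """Split devices into consecutive rings sized by the given fractions."""
    if any(f < 0 for f in fractions) or not math.isclose(sum(fractions), 1.0):
        raise ValueError("ring fractions must be non-negative and sum to 1")
    rings: list[list[str]] = []
    start, cumulative = 0, 0.0
    for i, fraction in enumerate(fractions):
        cumulative += fraction
        # Boundaries follow the cumulative share; the last ring always ends at len(devices).
        end = len(devices) if i == len(fractions) - 1 else round(cumulative * len(devices))
        rings.append(list(devices[start:end]))
        start = end
    return rings


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], Optional[int]]:
    """Deploy ring by ring and stop at the first ring that fails its health check.

    Returns the devices that received the update and the index of the failed
    ring (None if every ring stayed healthy).
    """
    updated: list[str] = []
    for index, ring in enumerate(rings):
        for device in ring:
            deploy(device)
            updated.append(device)
        if not all(is_healthy(device) for device in ring):
            # Halt here: later rings never receive the faulty update.
            return updated, index
    return updated, None


if __name__ == "__main__":
    fleet = [f"pc{i}" for i in range(10)]
    rings = plan_rings(fleet, [0.1, 0.3, 0.6])  # ring sizes 1, 3 and 6
    crashed: set[str] = set()
    # Simulate a faulty update: every device that installs it crashes.
    updated, failed_ring = staged_rollout(rings, crashed.add, lambda d: d not in crashed)
    print(updated, failed_ring)  # ['pc0'] 0
```

Putting a small ring first limits the blast radius: in this simulation, the bad update reaches one device out of ten before the rollout halts. As noted above, this only helps when the organization actually controls when an update is installed.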

The final tips focus on Windows security and fleet management. Default security features should remain enabled, such as firewalls, encryption, biometric authentication and endpoint detection and response (EDR). A cloud-native approach to managing Windows systems also avoids a lot of manual work.

Contradictory

The brief Microsoft announcement appears to be at odds with itself. Given the headline “Windows resiliency” and a run-up of several paragraphs about internal practices to make the OS stronger, you might expect more attention to Microsoft’s own measures. How can it ensure that a CrowdStrike driver doesn’t crash a Windows system when that driver misbehaves? After all, the driver is WHQL-certified and therefore carries Microsoft’s approval, and with it a share of the responsibility.

According to a Microsoft spokesperson, the company had to allow this kind of kernel-level access, and did so at the urging of the European Commission (EC) in 2009. The spokesperson told The Wall Street Journal that this followed a complaint from a security vendor, because Microsoft was able to provide defensive capabilities at the deepest level of the OS with its own solutions.

That argument falls short. Although this compromise with the EC gave Microsoft a lot of extra certification work, it is precisely in carrying out that task that the company has failed. Whatever solutions CrowdStrike now offers to improve its own rollout process, it should not be possible for a deficient driver to be approved in the first place. The question “What if a third-party kernel driver receives nonsensical data?” cannot be answered with “then Windows crashes,” at least not when Windows systems are deployed for critical applications. Microsoft will eventually have to address that issue, too.
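
The principle behind that question can be illustrated with a small sketch of defensive parsing: check that incoming content has the expected shape before using it, and on failure keep running on the last known good data instead of crashing. The record format (four `|`-separated fields), `EXPECTED_FIELDS`, `parse_rule` and `apply_update` are hypothetical and do not reflect CrowdStrike’s actual content files. Real kernel drivers are written in C, where an unchecked index reads arbitrary memory rather than raising an exception, so Python only shows the pattern.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical record format: "name|pattern|severity|action".
EXPECTED_FIELDS = 4
ALLOWED_ACTIONS = {"log", "block"}


class MalformedContent(ValueError):
    """Raised when a content record does not have the expected layout."""


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str
    severity: int
    action: str


def parse_rule(line: str) -> Rule:
    fields = line.rstrip("\n").split("|")
    # Validate the shape before indexing: in C, reading a missing field would
    # touch memory outside the buffer instead of raising an error.
    if len(fields) != EXPECTED_FIELDS:
        raise MalformedContent(f"expected {EXPECTED_FIELDS} fields, got {len(fields)}")
    name, pattern, severity_text, action = fields
    if not name or not pattern:
        raise MalformedContent("name and pattern must not be empty")
    if not severity_text.isdecimal():
        raise MalformedContent(f"severity is not a number: {severity_text!r}")
    if action not in ALLOWED_ACTIONS:
        raise MalformedContent(f"unknown action: {action!r}")
    return Rule(name, pattern, int(severity_text), action)


def apply_update(
    lines: Sequence[str], last_known_good: list[Rule]
) -> tuple[list[Rule], Optional[str]]:
    """Return the new rule set, or the previous one plus an error if anything is wrong."""
    if not lines:
        return last_known_good, "empty update"
    try:
        new_rules = [parse_rule(line) for line in lines]
    except MalformedContent as err:
        # Reject the whole update and keep running on the previous rules.
        return last_known_good, str(err)
    return new_rules, None


if __name__ == "__main__":
    rules, error = apply_update(["remote-exec|cmd.exe*|8|block"], [])
    # A record of zero bytes, i.e. nonsensical data, is rejected instead of crashing.
    rules, error = apply_update(["\x00" * 16], rules)
    print(len(rules), error)  # 1 expected 4 fields, got 1
```

Rejecting the entire update, rather than loading the records that happen to parse, keeps the component in a known state: either the full new rule set or the previous one.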

Microsoft does offer a comprehensive explanation of kernel-level drivers and the protection mechanisms within Windows. These safeguards are significant, but they have not changed since the CrowdStrike incident. Their sheer number raises the question of how things could ever have gone as wrong as they did a week and a half ago.
