
Global IT outage due to botched CrowdStrike update: what went wrong?

IT incident with massive impact should never have happened

CrowdStrike normally provides highly sophisticated, reliable cybersecurity solutions. All kinds of endpoints, from servers to laptops and desktops, are protected by a state-of-the-art system that operates deep within the OS. Today, the downside of this approach has become crystal clear: a global IT outage has resulted in countless endpoints becoming unresponsive. How could things go so badly wrong?

The global damage is enormous, with potentially hundreds of millions of dollars lost. Thousands of organizations have had to shut down their operations, largely because their Windows machines are inaccessible. Nobody can check in at The Cosmopolitan in Las Vegas, flights to and from prominent airports have had to be curtailed, and supermarkets are forced to accept only cash. Affected companies are taking a beating on the stock market due to the disruptions, and CrowdStrike could face numerous claims in the coming weeks and months.

The cause is a faulty update from CrowdStrike that triggers a blue screen of death (BSOD) on Windows systems protected by the company’s Falcon security platform. “We have widespread reports of BSODs on Windows hosts, occurring on multiple sensor versions,” CrowdStrike said. “Our engineers are actively working to resolve this issue and there is no need to open a support ticket.”

The problems are not due to a cyber attack. Falcon normally relies on a small piece of software that requires little processing power and acts as a sensor on Windows to detect threats. On Windows, this sensor is called csagent.sys. When it detects a threat, it automatically blocks it. To recognize new threats and attack techniques, the sensor must be updated regularly. CrowdStrike pushes those updates automatically, and one of them caused the failure last night.

Fix is trickier than it looks

The culprit, then, is csagent.sys. Users are sending tickets to CrowdStrike en masse, but that is not a workable solution in the short term. Meanwhile, CrowdStrike has put out a “Tech Alert,” as a screenshot on X shows. Other sources are sharing the same e-mail. In it, CrowdStrike offers a four-step plan to solve the problem.

Users should first boot their Windows system in Safe Mode or through the Windows Recovery Environment. Next, they should navigate to C:\Windows\System32\drivers\CrowdStrike, locate the file matching “C-00000291*.sys” and delete it. Finally, the host needs to be rebooted normally. This is not easily automated, especially for affected laptops and desktops. In addition, it does not work for all endpoints, states CrowdStrike OverWatch Director Brody Nisbet on X.
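To make the "locate and delete" step concrete, here is a minimal Python sketch of the file-matching logic. It is an illustration under stated assumptions, not CrowdStrike's tooling: the function names are hypothetical, the default directory and the wildcard pattern are assumptions taken from CrowdStrike's public guidance, and a machine in Safe Mode or the Windows Recovery Environment may not have Python available at all. The logic is mainly useful for reasoning about the fix, or for cleaning a disk that has been mounted on another machine.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumptions for illustration: default sensor directory and the file
# pattern named in CrowdStrike's public guidance. Adjust both as needed.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"


def find_faulty_files(directory, pattern=FAULTY_PATTERN):
    """Return files in `directory` whose names match `pattern`."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and fnmatch(p.name, pattern)
    )


def remove_faulty_files(directory, pattern=FAULTY_PATTERN, dry_run=True):
    """Delete matching files; with dry_run=True, only report them."""
    matches = find_faulty_files(directory, pattern)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: lists what would be deleted, deletes nothing.
    for path in remove_faulty_files(CROWDSTRIKE_DIR, dry_run=True):
        print(f"would delete: {path}")
```

The dry-run default is deliberate: deleting files from a driver directory is not something to do without first checking what matches. A reboot is still required afterwards, and as noted above, the fix does not work on every endpoint.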

As is often the case with large-scale IT problems, it’s up to internal teams to solve other people’s problems. More galling is the fact that the communication from CrowdStrike appears to be only via e-mails and X. Accessing critical information requires a log-in to CrowdStrike.com. CrowdStrike CEO George Kurtz also recommends this course of action. This is, quite frankly, unacceptable. It is simply an unnecessary barrier to information, right at a time when IT professionals are expected to quickly implement a fix for their colleagues. In fact, for hospitals and other critical infrastructure, that can be a life-and-death situation. A clickable link should also have been visible in giant red letters on CrowdStrike’s landing page much earlier. The current statement is also minimally informative at best.

How could this happen in the first place?

Beyond the acute need to fix the problem, the incident raises numerous questions. How was it possible for such an impactful error to be pushed live? Was there a disgruntled employee with an abundance of privileges? Was the company itself hacked?

We don’t expect all the details from CrowdStrike right away, but we do expect an indication of what went wrong as soon as possible. This is a precarious situation to be in; after all, it should never have gone wrong in this manner. Microsoft has proven before that self-congratulatory blogs about internal incidents can later turn out to be inaccurate. No one benefits from jumping to conclusions, but a security vendor like CrowdStrike should be able to explain with razor-sharp clarity what kind of lapse caused this mess.

The bottom line is that security vendors should never favour speed over due diligence. Updates for critical systems that protect the operating system demand far more care than ordinary software releases. Recreational apps can afford to push an update that causes some unforeseen problems. A security sensor within the OS is no different for critical infrastructure than a control mechanism for a dam or an auxiliary power supply for a hospital. You can’t do without them, you don’t want them to change unseen, and above all, they should never prevent you from accessing the systems they protect. Those are principles you’d expect CrowdStrike to teach others, not lessons it should have to learn the hard way.
