CrowdStrike licks its wounds after catastrophic update

The aftermath of Friday’s global disruption caused by a CrowdStrike Falcon sensor update is becoming clearer. On Saturday, Microsoft revealed the extent of the damage: 8.5 million Windows systems were affected. CrowdStrike has now provided additional information, though a comprehensive postmortem is still pending.

On Friday, a routine update to the CrowdStrike Falcon sensor led to worldwide IT chaos. The cause was a “Channel File,” a configuration file that normally keeps the sensor’s protection mechanisms up to date. Such tweaks are constantly needed to keep pace with cybercriminals’ rapidly evolving behaviour. “This is not a new process,” CrowdStrike explains; it has been in use since Falcon debuted as an antivirus solution in 2013.

Not a cyber attack, but with the same effect

With each communiqué, CrowdStrike reiterates that the incident was NOT due to a cyberattack. This repeated clarification is understandable, as many organizations likely suspected a hack initially. The disruption had a massive impact on business operations, affecting companies ranging from Ryanair to the London Stock Exchange. Given that CrowdStrike’s client base includes over half of the Fortune 500 companies, as well as numerous hospitals, energy suppliers, and banks, such widespread effects are not surprising.

While the number of affected Windows machines was relatively limited, the impact was still significant. The update caused failures in servers, laptops, and desktops, ultimately affecting 8.5 million devices. This represents less than one percent of all Windows systems worldwide. However, as Microsoft points out: “although the percentage was small, the broad economic and social impact reflects the use of CrowdStrike by companies running many critical services.”

Previously on Linux

Rarely has an IT outage had a more significant impact. It raises the question of how vulnerable the global IT infrastructure is, given that a single faulty update can shut down air traffic, force the cancellation of hospital appointments and disable parts of the banking sector.

Incidentally, a similar CrowdStrike problem had occurred before on Linux. In June, for instance, Red Hat warned about kernel panics (effectively the Linux version of a Blue Screen of Death) in newer versions of Red Hat Enterprise Linux owing to the Falcon sensor. There too, CrowdStrike had to implement mitigation steps on the spot. In addition, The Register discovered situations where CrowdStrike appears to have caused similar problems on Debian and Rocky Linux in April.

CrowdStrike’s process for Falcon sensor updates appears to lack the QA or phased rollout needed to prevent problems en masse. Last Friday, at least, the impact was a lot more obvious than in the previous incidents, especially to the general public.
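
To make the idea of a phased rollout concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how CrowdStrike actually distributes Channel Files: the function name, the stage sizes (1%, 10%, 50%, 100%) and the 2% failure threshold are all assumptions chosen for the example. The update goes to a small slice of machines first, and distribution stops as soon as too many of them report problems.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str]  # hosts that took the update and stayed healthy
    failed: List[str]   # hosts that reported a failure after the update
    halted: bool        # True if the rollout was stopped early


def phased_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed stages
    max_failure_rate: float = 0.02,  # assumed threshold
) -> RolloutResult:
    """Push an update in growing stages; stop if a stage fails too often.

    apply_update is a hypothetical callback that returns True when the
    host is healthy after receiving the update.
    """
    updated: List[str] = []
    failed: List[str] = []
    done = 0  # number of hosts already processed

    for fraction in stage_fractions:
        # Cumulative number of hosts that should be covered after this stage.
        target = min(len(hosts), max(1, round(len(hosts) * fraction)))
        stage = hosts[done:target]
        if not stage:
            continue

        stage_failures = 0
        for host in stage:
            if apply_update(host):
                updated.append(host)
            else:
                failed.append(host)
                stage_failures += 1

        # Halt before the next, larger stage if this one went badly.
        if stage_failures / len(stage) > max_failure_rate:
            return RolloutResult(updated, failed, halted=True)
        done = target

    return RolloutResult(updated, failed, halted=False)
```

With 1,000 machines and an update that breaks every host it touches, this sketch stops after the first stage of 10 machines instead of reaching all 1,000. Real deployment systems are far more involved, but the principle is the same: limit the blast radius of a bad update.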

Clear communication

Microsoft has stepped in to help, offering a recovery tool for the affected systems. It’s important to note that just before the problematic CrowdStrike update, there was a significant outage that took most of Microsoft 365 offline. However, as far as we know, these incidents are entirely unrelated. While Microsoft isn’t at fault for the CrowdStrike issue, it is taking an active role in resolving it, partly because the incident specifically affected Windows systems. This time, the Linux and Mac versions of CrowdStrike’s software didn’t experience any issues.

CrowdStrike’s early-hours response to the crisis left much to be desired. While the company did communicate directly with customers, it took hours before any official announcement or solution was made public. For a brief period, the only guidance available to CrowdStrike customers came from screenshots of emails shared on X (formerly Twitter) or LinkedIn. Company executives were quite active on these same social media platforms, but one has to question whether official channels should have been prioritized over tweets and shares. The decision to put critical information behind a login screen was particularly problematic.

CrowdStrike has since improved its communication. Remediation steps are now clearly outlined in a blog post, and the company is sharing some technical details on other platforms. Notably, the “Root Cause Analysis” section is currently a placeholder. CrowdStrike states, “We understand how this issue occurred and we are doing a thorough root cause analysis to determine how this logic flaw occurred. This effort will be ongoing. We are committed to identifying any foundational or workflow improvements that we can make to strengthen our process. We will update our findings in the root cause analysis as the investigation progresses.” The company promises to share more details in time. We’re looking forward to these insights, but hope CrowdStrike won’t rush to publish. After all, no one benefits from technical blogs about internal issues that later prove to be inaccurate.

UPDATE – CrowdStrike has posted a YouTube video explaining how admins can manually fix the BSOD issues.