
Why did so many security vendors abandon MITRE’s stress test?


This year, MITRE made headlines primarily because its leading vulnerability database was in danger of being discontinued. Another issue, however, has been plaguing the American non-profit for years: the voluntary ATT&CK Evaluations in which security vendors participate have fallen out of favor. Below, we explain why, and what MITRE plans to do to turn the tide.

This year’s roster of participants is particularly disappointing, not in terms of caliber, but in sheer length. Acronis, AhnLab, CrowdStrike, Cyberani, Cybereason, Cynet, ESET, Sophos, Trend Micro, WatchGuard, and WithSecure are still taking part. In 2022, there were 30 participants, the following year 29, in 2024 only 19, and now just 11.

This is a striking development. The MITRE ATT&CK Evaluations have been well known in the security world since their 2019 inception. Every year, the tests validate the capabilities of leading endpoint security products. Big names such as CrowdStrike, Fortinet, SentinelOne, Microsoft, Palo Alto Networks, Sophos, and Trend Micro have participated multiple times. Participation is always a conscious choice, as it depends entirely on the vendor. Although MITRE acts as a neutral party in the testing process, the evaluations are effectively a kind of open-book test in which the tooling can be adjusted on the fly. MITRE emulates the behavior of a specific, pre-announced cyber threat and checks the effectiveness of the EDR solution in a test environment, reviewing various techniques to determine the detection capabilities of the security tooling.

Every vendor is eager to achieve a high score, yet reaching 100 percent in every subcategory is not easy. Participation itself is therefore a kind of vote of confidence in the company’s own product, since vendors know in advance what MITRE intends to test in the EDR software. The tests vary from year to year, both because the threat landscape evolves and because security companies want to stay on top of their game by facing increasingly difficult challenges. And because so many industry players participate habitually, the tests offer each vendor the opportunity to outperform the competition. These motivations are certainly still legitimate in 2025, so why the dropout rate over the past two years?

From gold standard…

We just mentioned a “score,” but at MITRE Corporation, you would be called out for using that terminology. Technically speaking, the evaluation measures the “analytic coverage” of a security product, among other things. However, because it is a success rate that can be expressed as a percentage, practically every vendor chooses to refer to it as a score. If not, the emphasis usually shifts to relative strength compared to the competition. For example, SentinelOne was the only security player in 2022 to report 108 out of 109 detections, resulting in 99 percent analytic coverage. A year later, Palo Alto Networks was the only one to claim a “perfect score,” but CrowdStrike and Microsoft also reported 100 percent coverage using a different interpretation of the results, one in which detection alone was deemed sufficient. The difference lay in the “analytic” score: some attack steps were only logged as telemetry and never surfaced as alerts.
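To make that ambiguity concrete, here is a minimal, purely illustrative sketch in Python. The per-substep outcomes, category names, and totals below are our own assumptions for illustration, not MITRE’s actual data or scoring methodology; the point is simply that the same raw results can yield two different percentages depending on whether telemetry-only visibility counts as a detection.

```python
# Illustrative only: hypothetical per-substep outcomes for one vendor.
# "analytic" = actionable alert, "telemetry" = logged but no alert, "none" = missed.
substep_outcomes = ["analytic"] * 100 + ["telemetry"] * 8 + ["none"] * 1

total = len(substep_outcomes)  # 109 substeps in this made-up example

# Strict reading: only analytic detections (actual alerts) count.
analytic_coverage = sum(o == "analytic" for o in substep_outcomes) / total

# Looser reading: any visibility at all (telemetry or analytic) counts.
detection_coverage = sum(o != "none" for o in substep_outcomes) / total

print(f"Analytic coverage:  {analytic_coverage:.0%}")   # 92%
print(f"Detection coverage: {detection_coverage:.0%}")  # 99%
```

Two vendors with identical raw results can thus legitimately communicate very different headline figures.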

This shows how open to interpretation the MITRE ATT&CK Evaluations are, or at the very least how subjective they already were back in 2022. In that year, the average score (let’s just keep calling it a score) for detections was 75 percent. In 2023, the equivalent was 73 percent. The years are not directly comparable because, as mentioned, MITRE emulates a different set of attackers each year. Still, given that there were approximately thirty participants in both years, this suggests a healthy spread, with average scores consistent with well-prepared but not infallible candidates. In other words, a high score was remarkable, and coverage of less than three-quarters of the detections was a sign that improvement was needed. There was little cause for concern for MITRE as the referee of this yearly test: the evaluations were both popular and competitive.

…to a formality

In 2024, MITRE made several mistakes in setting up the evaluations that threw a spanner in the works. The idea behind the innovations in MITRE ATT&CK Evaluation Round 6 (2024) was sound in and of itself: MITRE hoped to get closer to the reality of cyber defense. For example, the tests introduced far more noise on top of the signals that characterized the emulated threats, and the nonprofit added environments beyond the endpoint (cloud, identity, containers) to the testbed. Unfortunately, this achieved exactly the opposite of what MITRE intended. Endpoint security players came up with configurations tailored to the criteria on which MITRE would assess them, not the workable versions they ship in the real world. Take the problem of noise: instead of fine-grained alerting on actual threats, some vendors decided to turn all kinds of noise into alerts. Such settings would plunge real SOC teams into alert fatigue and leave them scrambling to act on genuine threats.
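As a rough illustration of that tuning problem, consider the difference between alerting on everything and alerting only on events that match suspicious indicators. The event format and rule below are hypothetical simplifications of our own, not any vendor’s actual detection logic.

```python
# Hypothetical, heavily simplified events; real EDR telemetry is far richer.
events = [
    {"process": "explorer.exe", "suspicious": False},
    {"process": "powershell.exe -enc <payload>", "suspicious": True},
    {"process": "svchost.exe", "suspicious": False},
]

# "Benchmark tuning": every event becomes an alert, so nothing is missed on
# paper, but a real SOC team would drown in noise.
alerts_everything = events

# Realistic tuning: only events flagged as suspicious generate alerts.
alerts_filtered = [e for e in events if e["suspicious"]]

print(len(alerts_everything), "alerts vs.", len(alerts_filtered), "alert")
```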

Further complications, along with assessments that lacked proper context for each individual step, meant that the gap between a MITRE score and a customer organization’s actual experience of the security vendor could be enormous. The result: according to Forrester, the high scores of 2024 were unreliable. One example: scores could depend on success at an earlier stage, so a single missed alert effectively counted as a double failure.

In 2025, MITRE intended to turn the criticism into a better test. It was to no avail. Microsoft, Palo Alto Networks, and SentinelOne backed out of their participation. InfoSecurity Magazine highlights the key issue: the withdrawals were reportedly due to a feeling that participation was primarily a PR stunt and no longer a way to strengthen one’s own product. CrowdStrike and Sophos are proud of the scores they achieved regardless. And rightly so, it seems, given that the emulated threats concern the Chinese state hacker group Mustang Panda and the cybercriminals of Scattered Spider. These are advanced adversaries on the digital front, and the 100 percent coverage in detections by both parties should inspire confidence. It would on any normal day, but these are far from normal days as far as the MITRE ATT&CK Evaluations are concerned.

The problem is that last year’s and this year’s scores are simply too high for the evaluation to be taken seriously as a difficult test, even though we believe achieving those scores is a tough task. In 2024, there was a divide between high-scoring players and the two-thirds of participants with a detection rate below 50 percent. In 2025, the weaker participants have left and the average is almost 100 percent. Anyone who has ever taken an exam knows that on a good test, a class will always show some spread in scores. The fact that this is not the case now fuels the idea that the MITRE evaluation is outdated.

What next?

MITRE’s evaluations take place throughout the year, which means the nonprofit will have been well aware of the low participation rate through 2025. MITRE CTO and SVP of MITRE Labs Charles Clancy told InfoSecurity that the participants who dropped out often had too few staff or too little time available for the tests. Clancy also admitted that MITRE tries to make it harder each year to achieve the same result, but that the nonprofit went too far in this regard: the balance was lost. There used to be an annual forum with vendors to prepare for the evaluations as well, but that initiative has become less active.

Next year, trust needs to be restored. This will go hand in hand with a renewed focus on the vendor forum. We assume MITRE will be more conservative when it comes to the complexity of the test, while still trying to avoid a perfect ‘score’. The fact is that there are no perfect products in reality, so a leading, representative evaluation of endpoint security should reflect this. With all the other problems MITRE is facing surrounding its own budget, it remains to be seen whether this recovery can be sustained in the long term.

Read also: MITRE CVE database saved at the last minute