How do you interpret the results of MITRE ATT&CK evaluations?

Not all detections are created equal. Keep that in mind when choosing cybersecurity tools.

The cybersecurity market is very fragmented. Organizations looking to improve their security posture have a tremendously wide range of solutions to choose from, with many offerings for both best-of-breed and best-of-suite approaches. However, as a decision maker within an organization, you cannot realistically build and maintain a good overview of this market yourself. You need some help.

One of the most common tools is undoubtedly the assessment of suppliers by the well-known analyst firms. Who is considered a leader and positioned in a magic quadrant, a wave or something else? This can help in narrowing the search down to a shortlist, even though you run the risk of missing out on a solution that would also be a very good fit for your organization. In any case, what the assessments of analyst firms don’t do is provide real insight into how the solutions stack up against each other in practice. That is, how do the solutions perform in a real attack? That’s where the MITRE ATT&CK evaluations come in. We spoke about the most recent evaluation with Andre Noordam, Senior Director of Sales Engineering EMEA at SentinelOne. What should decision makers at organizations look for in the results of this evaluation?

Combination of attacks

The purpose of MITRE’s evaluations is to test how well cybersecurity solutions protect against real-world attacks. This year, malware from two hacker groups, Sandworm and Wizard Spider, was chosen. The former is aimed at destroying and exfiltrating data; the latter is what you might call traditional ransomware, which takes data hostage by encrypting it and only decrypts it in return for payment.

It goes without saying that the EDR solutions participating in this evaluation are intended to detect as much as possible. In addition, the way in which they detect malware also plays a role. For example, the evaluation keeps track of how many configuration changes are required to make the detections. Vendors have to configure their EDR solutions and may not intervene during the 48 hours that the test takes to get through all 109 steps into which it is divided. If a config change is necessary to achieve a higher score, MITRE makes a note of it. According to Noordam, this is a very important part of the evaluation that generally doesn’t get the attention it deserves, because fewer config changes are always better.

A second component of the tests that deserves more attention, according to Noordam, is the number of so-called delayed detections. A delayed detection happens when it takes more than a few minutes to detect the malware. This is usually because a human or a cloud environment has to make a judgment about what the EDR software detects. In cybersecurity, the old emergency room wisdom applies that every second counts, so a delay of more than a few minutes is certainly not desirable.
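To make these two modifiers concrete, here is a minimal Python sketch of how a reader might tally them from their own notes on a vendor's results. The `StepResult` record and its field names are assumptions made for illustration; they do not reflect MITRE's actual result format.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """One test step, as a reader might note it down (hypothetical layout)."""
    step: int
    detected: bool
    config_change: bool = False  # detection only happened after a config change
    delayed: bool = False        # detection arrived minutes later, not right away


def summarize(results):
    """Count raw detections and the detections that needed no modifier."""
    return {
        "total": len(results),
        "detected": sum(r.detected for r in results),
        "config_changes": sum(r.config_change for r in results),
        "delayed": sum(r.delayed for r in results),
        # "clean" = detected without a config change and without a delay
        "clean": sum(
            r.detected and not r.config_change and not r.delayed
            for r in results
        ),
    }


sample = [
    StepResult(1, True),
    StepResult(2, True, config_change=True),
    StepResult(3, True, delayed=True),
    StepResult(4, False),
]
summary = summarize(sample)
print(summary)
```

In this made-up sample, three of four steps count as detected, but only one of them is a detection without any modifier, which is the number Noordam suggests paying more attention to.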

Apart from detection, this year’s MITRE ATT&CK evaluation also focuses on prevention, just as it did last year. After all, prevention is always better than detection. The earlier you catch a threat, the better.

Transparency

MITRE works in a pretty transparent way. It clearly states what has been tested and how the participating vendors performed in those tests. What it doesn’t do is attach a judgment to that. In other words, the organization itself does not write reports with conclusions in which it ranks one vendor above another. On the one hand, that’s commendable, because this way it can’t paint a biased picture. On the other hand, it invites the cybersecurity vendors to come up with their own conclusions. And these are always biased.

In any event, it’s hard if not impossible to say who wins in an evaluation such as the one MITRE carries out. That’s not really the point of these evaluations anyway, Noordam states. He sees the MITRE ATT&CK evaluations as something the entire cybersecurity industry benefits from. The differences between vendors in the evaluations are becoming smaller, which he considers a good sign. Of course, this does not make it any easier to choose the best. SentinelOne, however, is firmly at the top of the results this year, as it was last year. It scores 100 percent in prevention and detection, with no delayed detections. That does have some characteristics of a ‘winner’, even though that’s not necessarily what the evaluations are about.

It’s all about analytic detections

However, looking only at the total scores of the participants in the test is not enough. In fact, they don’t tell even half of the story. During our conversation, Noordam repeatedly emphasizes the type of detection an EDR solution makes as a deciding factor. According to him, this should play an important role in the interpretation of the test results. You really have to zoom in on the detections to determine how well an EDR solution actually performs.

There are two general types of detection, Noordam continues: detections that fall into the telemetry category and so-called analytic detections. The latter category is in turn divided into three subcategories: general, tactic and technique. In the case of telemetry, an EDR solution does detect something, for example via a hash, but it is unclear what exactly it is detecting. This type of detection is certainly not worthless, but it requires a great deal of manpower to develop into proper insights and actions.

With analytic detections you have more insight into the threat from the start. For a general analytic detection, that’s still not very much in terms of extra insight, but if you detect how the attack works technically, you can take action almost immediately. So ideally you want as many technique detections as possible, and failing that, at least as many analytic detections as possible. That also ensures the lowest possible pressure on the security teams of organizations. They can then focus on only the most critical threats.
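As a rough illustration of what zooming in on detection types could look like, the Python sketch below computes the share of analytic and technique detections from a list of per-step labels. The labels follow the categories named above; the `"none"` label for a missed step and the sample data are assumptions made purely for this example.

```python
from collections import Counter

# The three analytic subcategories named in the article
ANALYTIC = {"general", "tactic", "technique"}


def coverage(labels):
    """Return the share of steps with analytic and technique detections.

    `labels` holds one detection label per test step; "none" (an assumed
    label) marks a step that was not detected at all.
    """
    counts = Counter(labels)
    total = len(labels)
    analytic = sum(counts[name] for name in ANALYTIC)
    return {
        "analytic_share": analytic / total if total else 0.0,
        "technique_share": counts["technique"] / total if total else 0.0,
        "telemetry_only": counts["telemetry"],
    }


# Made-up sample: eight steps with mixed detection quality
sample_labels = [
    "telemetry", "telemetry", "general", "tactic",
    "technique", "technique", "technique", "none",
]
result = coverage(sample_labels)
print(result)
```

Two vendors with the same overall detection count can end up with very different analytic and technique shares, which is why the total score alone says little.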

From 109 to 9 alerts

A final element in a test such as the MITRE ATT&CK evaluation is how the detections are presented to the end user of the EDR platform. Detection is only the first step in the process. The next step is that the people responsible for the security of the (digital) environment of an organization do something with those results. If they get flooded with alerts, there is a good chance that they will miss important ones. In other words, it is important for an EDR solution not to simply send an alert for each of the 109 separate detections one by one. It needs to do a lot of the analysis for the end users and present them with only the most important alerts.

At this point, one of SentinelOne’s most distinguishing features comes into play: the Storyline functionality. This allows the solution to make correlations and merge alerts. In this year’s MITRE ATT&CK evaluation, SentinelOne boils 109 alerts down to just 9 thanks to Storyline. Noordam summarizes it as getting a maximum overview of the threat through a minimum number of alerts. This step from data to information is ultimately the most important in the entire process. If you get that right as an organization, then you can respond quickly to (potential) threats.
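The general idea of merging correlated detections into fewer alerts can be sketched in a few lines of Python. This is not how Storyline works internally, which the article does not describe; the `story_id` field is a hypothetical correlation key that a product would have to derive itself, for example from process ancestry.

```python
from collections import defaultdict


def merge_alerts(detections):
    """Group individual detections that share a correlation key into one alert.

    Each detection is a dict with a "step" number and a hypothetical
    "story_id" that ties related activity together.
    """
    grouped = defaultdict(list)
    for detection in detections:
        grouped[detection["story_id"]].append(detection["step"])
    return [
        {"story_id": story_id, "steps": sorted(steps)}
        for story_id, steps in grouped.items()
    ]


# Made-up sample: nine detections that belong to three related chains
detections = [{"step": i, "story_id": i % 3} for i in range(1, 10)]
alerts = merge_alerts(detections)
print(len(detections), "detections ->", len(alerts), "alerts")
```

No detection is thrown away in this approach: every step remains attached to its alert, but the analyst starts from a handful of alerts instead of one per detection.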

In principle, of course, organizations can never be 100 percent safe, but they can do their best to get as close as possible to that goal. The results of the MITRE ATT&CK evaluation can be a great help to get there. However, you have to be able to interpret them. We hope that after reading this article you have gained some insights into how to do this.
