Microsoft AI identifies security bugs, almost error-free

Microsoft has built an AI model that distinguishes security bugs from normal bugs with 99 percent accuracy. In the coming months, Microsoft plans to release the system as open source on GitHub.

Besides distinguishing almost perfectly between security bugs and normal bugs, the AI also identifies critical, high-priority security issues in 97 percent of all cases. The system was trained on a dataset of 13 million work items and bugs from 47,000 Microsoft developers, stored in Azure DevOps and GitHub repositories. The model first learned to tell security bugs apart from normal bugs. It then learned to apply severity labels (low-impact, important, and critical) to the security bugs.

The AI could be used to support human experts. Coralogix estimates that developers create 70 bugs per 1,000 lines of code and that fixing a bug takes thirty times longer than writing a line of code. In the United States alone, 113 billion dollars a year is spent on identifying and fixing product defects.

How does the model work?

Microsoft says that the model is being put into production internally and that it is continuously updated with data approved by security experts, who also monitor the number of bugs generated in software development. “Every day, software developers stare at a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often engineers waste time on false positives or miss a critical vulnerability that has been misclassified,” said Microsoft Senior Security Program Manager Scott Christiansen and Microsoft Data and Applied Scientist Mayana Pereira in a blog post. “We discovered that by linking machine learning models to security experts, we can significantly improve the identification and classification of security bugs.”

Microsoft’s model uses two techniques to make bug predictions. The first is term frequency-inverse document frequency (TF-IDF), an information retrieval technique that weights a word by how often it appears in a document and by how distinctive it is across a collection of documents, in this case bug titles. The second is logistic regression, which uses a logistic function to model the probability that an item belongs to a particular class.
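To make the combination of the two techniques concrete, here is a minimal sketch in plain Python. It is not Microsoft's implementation: the eight bug titles, their labels, the function names, the smoothed IDF formula, and the training settings (`epochs`, `lr`) are all assumptions chosen for illustration. Each title is turned into a TF-IDF vector, and a logistic regression trained by gradient descent estimates the probability that a title describes a security bug.

```python
import math
from collections import Counter

# Hypothetical toy data: 1 = security bug, 0 = normal bug.
TITLES = [
    "sql injection in login form",
    "buffer overflow when parsing header",
    "xss vulnerability in comment field",
    "password stored in plain text",
    "button misaligned on settings page",
    "typo in welcome email",
    "crash when opening empty file",
    "slow page load on dashboard",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def tokenize(title):
    return title.lower().split()


def fit_idf(titles):
    """Smoothed inverse document frequency for every term in the titles."""
    n = len(titles)
    doc_freq = Counter(term for t in titles for term in set(tokenize(t)))
    return {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf(title, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(term for term in tokenize(title) if term in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(vec, weights, bias):
    return bias + sum(weights.get(term, 0.0) * v for term, v in vec.items())


def train(vectors, labels, epochs=500, lr=0.5):
    """Logistic regression fitted with stochastic gradient descent on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            error = sigmoid(score(vec, weights, bias)) - y
            bias -= lr * error
            for term, v in vec.items():
                weights[term] = weights.get(term, 0.0) - lr * error * v
    return weights, bias


def predict_proba(title, idf, weights, bias):
    """Estimated probability that the title describes a security bug."""
    return sigmoid(score(tfidf(title, idf), weights, bias))


idf = fit_idf(TITLES)
weights, bias = train([tfidf(t, idf) for t in TITLES], LABELS)

for title in ["sql injection in search form", "typo on settings page"]:
    print(f"{title!r}: P(security) = {predict_proba(title, idf, weights, bias):.2f}")
```

The same structure could be repeated for the second step described earlier, with a separate classifier assigning a severity label to the items flagged as security bugs. A production system would normally rely on an established machine learning library and far more data than this toy set.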
