
A Microsoft AI research team accidentally exposed 38 TB of private data, including more than 30,000 internal Microsoft Teams messages, while publishing open-source training data on GitHub.

Security firm Wiz routinely scans the Internet for misconfigured storage containers to discover accidentally exposed cloud data. During a recent scan, its researchers came across a GitHub repository belonging to Microsoft's AI research division.

The repository contains open-source code and AI models for image recognition. Users can download these models via an Azure Storage URL.

Access to 38 TB of private data

Further investigation revealed that the URL gave access not only to the open-source AI models, but to the entire storage account, exposing other internal Microsoft data as well.

More specifically, the exposure covered 38 TB of data, including backups of two Microsoft employees' workstations. The accessible data contained passwords for other Microsoft services, secret keys and more than 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

Cause lies in SAS tokens

According to Wiz, the cause of this unintentional data leak lies in the Microsoft employees' use of so-called SAS (Shared Access Signature) tokens, which are used to share data from Azure Storage accounts.

A SAS token's access can be restricted to specific files. In this case, however, the token granted access to the entire Azure Storage account, including the 38 TB of files that should never have been public.
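To illustrate the difference in scope, the sketch below uses the Python azure-storage-blob SDK to generate a narrowly scoped SAS token for a single blob alongside an account-wide token. The account name, key, container and blob names are placeholders chosen for this example; it shows the general mechanism, not the actual configuration involved in the incident.

```python
from datetime import datetime, timedelta

from azure.storage.blob import (
    AccountSasPermissions,
    BlobSasPermissions,
    ResourceTypes,
    generate_account_sas,
    generate_blob_sas,
)

# Placeholder credentials -- purely illustrative, unrelated to the incident.
ACCOUNT_NAME = "examplestorage"
ACCOUNT_KEY = "<account-key>"

# Narrowly scoped token: read-only access to one specific blob,
# valid for a limited time. This is the safer way to share a single model file.
blob_sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    container_name="ai-models",
    blob_name="image-recognition-model.bin",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)

# Overly broad token: access to every container and blob in the storage
# account, with a long expiry -- comparable in scope to the kind of token
# that exposed far more data than intended.
account_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, list=True),
    expiry=datetime.utcnow() + timedelta(days=365),
)

# A shareable URL built from the narrowly scoped token.
print(
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/ai-models/"
    f"image-recognition-model.bin?{blob_sas}"
)
```

Anyone who receives a URL built from the account-wide token can list and read every file in the account, which is why limiting both the scope and the lifetime of a SAS token matters.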

New checks needed

According to Wiz, this case is a typical example of the risks that arise when employees handle large amounts of (training) data for AI purposes, and it underlines the need for additional security checks and safeguards.