Microsoft AI team accidentally discloses 38 TB of private data

A Microsoft AI research team accidentally exposed 38 TB of private data, including more than 30,000 internal Microsoft Teams messages. The exposure happened when the team published a bucket of open-source training data on GitHub.

Security specialist Wiz routinely scans the internet for misconfigured storage containers to find accidentally exposed cloud data. During a recent scan, its researchers came across a GitHub repository belonging to Microsoft's AI research division.

The repository provides open-source code and AI models for image recognition. Users download these models via an Azure Storage URL.

Access to 38 TB of private data

Further investigation revealed that this URL gave access to more than the open-source AI models. It granted permissions on the entire storage account, which exposed other Microsoft data as well.

More specifically, this case involved 38 TB of data, including the backups of two Microsoft employees. The exposed data included passwords for other Microsoft services, secret keys and more than 30,000 internal Microsoft Teams messages from 359 other Microsoft employees.

Cause lies in SAS tokens

According to Wiz, this unintentional data leak was caused by the way the Microsoft employees used SAS (Shared Access Signature) tokens. SAS tokens are used to share data from Azure Storage accounts.

The access such a token grants can be restricted to specific files. In this case, however, the token gave access to the entire Azure Storage account, including the 38 TB of files that should not have been public.
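To illustrate the difference between a narrowly scoped token and an account-wide one, here is a minimal Python sketch that inspects the query string of a SAS URL. The parameter names (`ss`, `srt`, `sr`, `sp`, `se`) follow Azure's documented SAS format. The function name `audit_sas_url`, the seven-day lifetime threshold and the example URLs in the usage note are assumptions made for illustration, not details from the Wiz report.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs


def audit_sas_url(url, max_lifetime=timedelta(days=7), now=None):
    """Return warnings for a SAS URL, based only on its query string.

    Hypothetical helper; the 7-day default is an arbitrary assumption.
    """
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    warnings = []

    # Account SAS tokens carry 'ss' (services) and 'srt' (resource types).
    # Service SAS tokens carry 'sr' instead (b = single blob, c = container).
    if "ss" in params or "srt" in params:
        warnings.append("account-level token: not limited to one file")
    elif params.get("sr") != "b":
        warnings.append("token is not scoped to a single blob")

    # 'sp' lists permissions as letters, e.g. r = read, l = list, w = write.
    risky = set(params.get("sp", "")) - {"r", "l"}
    if risky:
        warnings.append("grants more than read/list: " + "".join(sorted(risky)))

    # 'se' is the expiry time in ISO 8601 format (UTC).
    expiry = params.get("se")
    if expiry is None:
        warnings.append("no expiry in the URL (may come from a stored policy)")
    else:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at - now > max_lifetime:
            warnings.append("expires more than the allowed lifetime from now")

    return warnings
```

Called on a made-up URL such as `https://example.blob.core.windows.net/models?ss=b&srt=sco&sp=rwdlac&se=2051-01-01T00:00:00Z&sig=FAKE`, the function would flag the account-level scope, the write and delete permissions and the distant expiry. A URL with `sr=b`, `sp=r` and an expiry a few days away would pass without warnings.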

New checks needed

According to Wiz, this case is a typical example of the risks that arise when employees handle large amounts of (training) data for AI purposes. It underlines the need for new security checks and safeguards.
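One possible form such a check could take is a scan of repository files for Azure Storage URLs that embed a SAS signature before they are published. The sketch below is an assumption about what that might look like, not a description of Wiz's or Microsoft's tooling. The function name `find_sas_urls` and the sample text in the usage note are hypothetical, and the pattern only covers the standard `*.core.windows.net` storage hosts.

```python
import re

# Azure Storage URL whose query string contains a 'sig=' (signature) parameter.
# Storage account names consist of lowercase letters and digits.
SAS_URL_RE = re.compile(
    r"""https://[a-z0-9]+\.(?:blob|file|queue|table|dfs)\.core\.windows\.net/[^\s"'<>]*[?&]sig=[^\s"'<>]*""",
    re.IGNORECASE,
)


def find_sas_urls(text):
    """Return (line_number, url) pairs for every SAS-style URL found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SAS_URL_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running `find_sas_urls` over the contents of a README or notebook before committing would list each line that contains a signed storage URL, so the token can be reviewed or replaced before the link goes public.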
