Security engineer Luke Marshall analyzed more than 5.6 million public GitLab Cloud repositories for exposed secrets in a large-scale investigation.
Using TruffleHog, he identified 17,430 still-valid credentials spread across more than 2,800 organizations. The investigation built on an earlier scan of Bitbucket, where he found significantly fewer secrets, although that scan also covered fewer repositories. According to Marshall, the comparison shows that GitLab has a higher concentration of leaked data and that the problem is structural across development platforms.
5.6 million unique repositories
To search the entire GitLab environment systematically, Marshall used the public GitLab API and a Python script that paged through the full list of projects. The resulting list of 5.6 million unique repositories was then placed in an AWS Simple Queue Service (SQS) queue. An AWS Lambda function took each repository from the queue, ran a TruffleHog scan on it, and recorded the results.
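The enumeration step could look roughly like the sketch below. It is not Marshall's script; it assumes GitLab's keyset pagination for the `/api/v4/projects` endpoint (`pagination=keyset`, `order_by=id`), which returns the next page's URL in a `Link` header tagged `rel="next"`. The names `START_URL`, `next_link`, `http_fetch`, and `iter_public_repos` are hypothetical, the fetch function is injectable so the logic can be tested offline, and rate limiting, retries, and the hand-off to SQS are left out.

```python
import json
import re
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical starting URL: public projects, 100 per page, keyset pagination by id.
START_URL = (
    "https://gitlab.com/api/v4/projects"
    "?visibility=public&per_page=100&pagination=keyset&order_by=id&sort=asc"
)

# Matches one entry of a Link header, e.g. <https://...>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def http_fetch(url: str) -> Tuple[list, Optional[str]]:
    """Fetch one page; return (parsed JSON body, raw Link header)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp), resp.headers.get("Link")


def iter_public_repos(
    fetch: Callable[[str], Tuple[list, Optional[str]]] = http_fetch,
    start_url: str = START_URL,
) -> Iterator[str]:
    """Yield the HTTP clone URL of every public project, page by page."""
    url: Optional[str] = start_url
    while url:
        projects, link = fetch(url)
        for project in projects:
            yield project["http_url_to_repo"]
        url = next_link(link)  # None on the last page ends the loop
```

Keyset pagination is used here because offset-based paging becomes slow or capped on very large result sets; each yielded URL would then become one queue message.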
According to Marshall, each Lambda invocation performed a simple scan, with concurrency set to 1,000. This allowed the entire operation to be completed in just over 24 hours. The total cost of the investigation was approximately $770.
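A per-message worker of this kind might be sketched as follows. This is an illustration, not Marshall's code: it assumes the TruffleHog v3 binary is available on `PATH` inside the Lambda image, that each SQS message body holds one repository URL, and that the `--json` and `--only-verified` flags are used to emit verified findings as JSON lines. `build_scan_command`, `run_trufflehog`, and the `runner` parameter are hypothetical names; `runner` exists only so the handler can be exercised without the real binary.

```python
import json
import subprocess
from typing import Callable, List


def build_scan_command(repo_url: str) -> List[str]:
    """Command line for scanning one repository's git history (assumed flags)."""
    return ["trufflehog", "git", repo_url, "--json", "--only-verified"]


def run_trufflehog(cmd: List[str]) -> str:
    """Run the scanner and return its stdout (expected: one JSON object per line)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def handler(
    event: dict,
    context=None,
    runner: Callable[[List[str]], str] = run_trufflehog,
) -> dict:
    """Lambda entry point for an SQS trigger: one repository URL per message body."""
    results = []
    for record in event.get("Records", []):
        repo_url = record["body"]
        output = runner(build_scan_command(repo_url))
        # Keep only lines that look like JSON objects; skip blanks and stray logs.
        findings = [
            json.loads(line)
            for line in output.splitlines()
            if line.strip().startswith("{")
        ]
        results.append({"repo": repo_url, "verified_findings": len(findings)})
    # A real deployment would persist `results` (e.g. to S3 or a database).
    return {"results": results}
```

As a rough check on the reported figures, assuming the 1,000 concurrent slots were busy the whole time: 86,400 s × 1,000 / 5.6 million scans is about 15 seconds per repository on average, and $770 spread over 5.6 million repositories is roughly $0.14 per thousand scans.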
The results show a clear pattern. The GitLab scan turned up nearly three times as many valid secrets as the Bitbucket scan, and GitLab also has a 35 percent higher density of leaked secrets per repository. Most of the exposed credentials date from after 2018, but Marshall also discovered keys from 2009 that were still usable. This points to long-lived credentials that were originally committed to earlier version control systems and carried along when the code was migrated.
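The two comparisons measure different things, and a short relation shows how they fit together. Writing $S$ for the number of valid secrets and $R$ for the number of repositories scanned on each platform, with density $D = S/R$, and plugging in the article's rounded figures ("nearly three times" and "35 percent higher"):

$$
\frac{D_{\mathrm{GL}}}{D_{\mathrm{BB}}}
= \frac{S_{\mathrm{GL}}}{S_{\mathrm{BB}}}\cdot\frac{R_{\mathrm{BB}}}{R_{\mathrm{GL}}}
\quad\Longrightarrow\quad
\frac{R_{\mathrm{GL}}}{R_{\mathrm{BB}}}
= \frac{S_{\mathrm{GL}}/S_{\mathrm{BB}}}{D_{\mathrm{GL}}/D_{\mathrm{BB}}}
\approx \frac{3}{1.35} \approx 2.2
$$

In other words, most of the gap in raw counts comes from GitLab's larger repository set, and the remaining difference is the roughly 35 percent higher density per repository.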
A large proportion of the secrets found were cloud and service-specific access keys. More than 5,000 of them were Google Cloud Platform keys, followed by MongoDB keys, Telegram tokens, and OpenAI keys, among others. Marshall also found more than 400 GitLab tokens in public GitLab repositories, an example of what he calls platform locality: the tendency of developers to leak a platform's own keys on that same platform.
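A breakdown like this can be produced by tallying scanner output per credential type. The sketch below assumes TruffleHog's JSON-lines output with `DetectorName` and `Verified` fields (check the field names against the version in use); `count_verified_by_detector` and `platform_share` are hypothetical helpers, and the detector name `"Gitlab"` in the usage is likewise an assumption. `platform_share` gives a crude measure of platform locality: the fraction of verified findings that belong to the host platform's own detector.

```python
import json
from collections import Counter
from typing import Iterable


def count_verified_by_detector(json_lines: Iterable[str]) -> Counter:
    """Tally verified findings per detector from TruffleHog JSON-lines output."""
    counts: Counter = Counter()
    for line in json_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and any non-JSON log output
        finding = json.loads(line)
        if finding.get("Verified"):
            counts[finding.get("DetectorName", "unknown")] += 1
    return counts


def platform_share(counts: Counter, platform_detector: str) -> float:
    """Fraction of verified findings that are the host platform's own tokens."""
    total = sum(counts.values())
    return counts[platform_detector] / total if total else 0.0
```

Feeding in the combined output of all scans and calling `platform_share(counts, "Gitlab")` would give the share of GitLab tokens among all verified findings on GitLab.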
Nine thousand dollars in bug bounties
Informing the affected organizations was a major part of the investigation. Because the exposed secrets were tied to thousands of domains, Marshall combined automation, Claude 3.7 Sonnet, and a Python script to find reporting procedures and contact details and to generate appropriate notifications. Many organizations revoked their keys in response, and the investigation earned Marshall approximately $9,000 in bug bounties.
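The article does not say which sources Marshall's script consulted. One standardized place to look for a reporting channel is a domain's `security.txt` file (RFC 9116), served at `/.well-known/security.txt` with one or more `Contact:` fields. The sketch below shows only that lookup and parsing step, under that assumption; `security_txt_url` and `parse_security_txt` are hypothetical helpers, network fetching is omitted, and the language-model step is not shown.

```python
from typing import Dict, List


def security_txt_url(domain: str) -> str:
    """Well-known location of a domain's security.txt file (RFC 9116)."""
    return f"https://{domain}/.well-known/security.txt"


def parse_security_txt(text: str) -> Dict[str, List[str]]:
    """Collect field values from a security.txt body; field names are lower-cased."""
    fields: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, comments, and lines without a field separator
        name, value = line.split(":", 1)  # split once: values may contain colons
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields
```

The `contact` entries (for example `mailto:` or `https:` URIs) are the channels a notification would go to; domains without the file would need another route, such as a published bug bounty program.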
According to Marshall, the findings underscore the need for periodic, large-scale scans by organizations that rely on open source and cloud environments. Secrets do not disappear from repository histories on their own; unless they are actively cleaned up or rotated, they remain a risk for years.