For seven months, attackers had free rein to wreak havoc inside the AI infrastructure of major tech companies. Exploitation of vulnerability CVE-2023-48022 in the widely used open-source Ray framework has led to manipulated models, stolen hardware cycles and compromised data. Its developer Anyscale, however, says there’s no vulnerability to speak of: according to the software maker, the responsibility for protecting against “ShadowRay” lies with the companies that use Ray.
Thousands of public Ray servers have been affected by the cyberattacks, Oligo’s research team reports. Companies using this framework are advised to extensively monitor their IT environments for suspicious activity. The affected parties are prominent: Oligo notes that Amazon, OpenAI, Uber, Spotify, Netflix and LinkedIn, among others, use Ray and may have been affected.
The Ray framework is considered the industry standard for scaling AI and Python workloads. Oligo describes it as the “Swiss army knife” for Python users and AI developers. Since advanced AI models cannot be trained or fine-tuned on a single machine, companies use Ray clusters to spread the myriad computations needed across multiple devices. For this application, Ray constantly accesses sensitive data, which frequently resides in Amazon S3 buckets or within the company’s own corporate environment.
Anyscale doesn’t call it a vulnerability
CVE-2023-48022 has been known for some time and received a CVSS score of 9.8 out of 10. Nevertheless, Anyscale does not consider it a vulnerability. In late 2023, security researchers discovered five ways in which Ray can be exploited, but in the case of CVE-2023-48022, Anyscale states that its functionality is a feature and not a bug. The other four vulnerabilities have since been patched.
Because the framework requires no authorization for access, malicious parties can remotely execute code. This should never happen, according to Anyscale, because Ray is not meant to be easily accessed from the public Internet. However, thousands of Ray servers were still online and unprotected, so it appears many users of Ray didn’t quite get that memo. It seems that AI developers were simply unaware of the danger. That explains why Oligo called this vulnerability ShadowRay, as it has long flown under the radar.
Oligo says this exploitation campaign is the first of its kind. Never before has AI infrastructure been found susceptible to a widespread attack like this. Because ShadowRay isn’t technically a vulnerability according to its developer, many security scans won’t recognize it as such. Only after Oligo shared its findings did Anyscale offer assistance in detecting susceptible ports. The researchers stress that AI experts are not security experts, so they themselves will rarely realize what the danger is. This could still be the case: after all, a fully patched Ray framework (2.10.0) is still vulnerable to ShadowRay without a change to the default configuration.
IT complexity extends to AI
The meteoric rise of AI is driving organizations to further expand their own IT environments. That increase in complexity creates new potential dangers, and AI in particular quickly introduces new risks. Organizations that do not run their own generative AI workloads in a secure environment may find employees handling proprietary information unsafely through ChatGPT or other public chatbots.
AI software with security dangers is not unique. For example, Nvidia’s ChatRTX chatbot was found today to be susceptible to multiple critical vulnerabilities, also potentially resulting in remote code execution. However, the Ray framework is a lot less niche than that solution. It is a fundamental component within the IT environments of prominent AI development teams. The fact that criminals could steal and manipulate data unseen for months is concerning.
However, the exact damage is difficult to quantify. Oligo does attempt to do so by indicating that attackers were able to steal hardware cycles for their nefarious purposes. Threat actors were able to take over multiple machines worth $858,480 per year. Affected Ray clusters were mostly deployed for cryptomining, although the attackers may have had other purposes in mind depending on the targeted organization.
Clear rules
We are not entirely convinced of the relevance of the numbers Oligo uses to assess the damage. In the end, it is more important to emphasize that the cybercriminals were able to remain active for months and cause all kinds of damage. Exactly what they accomplished should be made clear by the data logs. Cryptomining requires communication with a specific server, so evidence for mining can be found after the fact.
Either way, the incident shows that AI workloads need to be protected just as strongly as other ones. One need only look at the data that attackers can capture by exploiting the Ray framework. Theft of credentials, passwords, private SSH keys, tokens for OpenAI and HuggingFace accounts and access to the Kubernetes API can all lead to bigger problems and enable lateral movement in an enterprise network.
Oligo recommends a number of mitigation steps to secure a Ray deployment. First, Ray should run in a secure, trusted environment, with firewall rules and/or security groups in place to block unauthorized access. In addition, the Ray Dashboard port (8265 by default) must have an authorization layer with a proxy to the Ray API. Furthermore, ongoing security scans and following other best practices are essential.
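As a rough illustration of the first two recommendations, the sketch below checks whether the default Ray Dashboard port accepts TCP connections on a list of hosts. It is a minimal example and not a tool from Oligo or Anyscale: the function names and the host list are hypothetical, and an open port only indicates that the dashboard may be reachable from wherever the script runs, not that it has been exploited. It should only be run against machines you are responsible for.

```python
import socket

# Default Ray Dashboard port mentioned in the article.
RAY_DASHBOARD_PORT = 8265


def is_port_open(host, port=RAY_DASHBOARD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def exposed_hosts(hosts, port=RAY_DASHBOARD_PORT, timeout=2.0):
    """Return the subset of hosts on which the given port accepts connections."""
    return [host for host in hosts if is_port_open(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical inventory: replace with hosts you are authorized to check.
    my_hosts = ["127.0.0.1"]
    for host in exposed_hosts(my_hosts):
        print(f"{host}: port {RAY_DASHBOARD_PORT} is reachable; "
              "check firewall rules and put an authorization layer in front of it")
```

Running a check like this from outside the trusted network gives a quick indication of whether firewall rules or security groups are actually blocking access to the dashboard.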