
The New York Times has accused OpenAI of copyright violations. To substantiate these allegations, the news organization allegedly had ChatGPT hacked: according to OpenAI, an outside party manipulated the chatbot into reproducing copyrighted material. Not everyone agrees with that characterization.

It is no secret that ChatGPT is trained on large parts of the Internet. OpenAI claims that the chatbot is not meant to reproduce information verbatim. Much like a human, ChatGPT acquires knowledge from sources such as The New York Times, Reddit and Wikipedia; at least, that’s how the AI company characterizes the training process. Because the chatbot's answers almost always deviate from the copyrighted source material, OpenAI regards its use of this external information as “fair use”.


The New York Times (NYT) disagrees with this definition of “fair use” and in December filed a lawsuit against OpenAI and its major backer Microsoft. Crucially, NYT was able to have the chatbot reproduce paragraphs of its own articles in full, which the news organization’s lawyers consider clear proof of copyright infringement. However, the chatbot did not simply offer up this evidence of its own volition, OpenAI argues. An external hacker allegedly made “tens of thousands of attempts” to arrive at “highly anomalous results.”

OpenAI appears to view prompt engineering as hacking

According to OpenAI, NYT only managed to get these results by exploiting a ChatGPT bug. Misleading prompts are said to have been used to bypass the existing “guardrails” that keep the chatbot from committing IP violations in its outputs. A well-intentioned person abiding by the terms of use would never receive an output that indiscriminately reproduces copyrighted material, OpenAI says.

This issue exposes a fundamental problem with LLMs: they are simply hard to keep on the straight and narrow. In some cases, they don’t even need outside help to go astray. For example, Google recently caused a stir when its AI image generation inadvertently produced almost exclusively non-Caucasian people, even when that was highly inappropriate in the historical context. Also, a week ago, ChatGPT delivered such curious answers that, according to users, the chatbot seemed to have had a stroke.

Even when AI works as intended, prompt engineering is an important tool for getting the desired responses consistently. Developers and others use it to ensure that an AI model, for example, formulates outputs more accurately or uses particular wording to suit a specific use case. Those hoping to accomplish this more comprehensively eventually turn to fine-tuning, which can enable business-specific outcomes based on proprietary sources.

The point here is that adjusting prompts to deliver specific outputs happens on a spectrum. At the benign end, users ask for an article with a professional tone or for programming assistance aimed at one specific language and application. At the other end, even the off-the-shelf ChatGPT can be tricked into certain behavior, oftentimes sidestepping OpenAI’s guardrails (perhaps even unintentionally so). Tailored prompting makes AI assistants significantly more useful, which makes it rather striking that OpenAI describes a similar approach as “hacking.” At worst, it’s exploiting a loophole that the company hasn’t closed yet. Ultimately, the “hacker” (NYT-affiliated or otherwise) was simply trying to generate a response that served a specific purpose.

“Bizarre allegations”

Speaking to The Register, Ian Crosby, an attorney at Susman Godfrey who is assisting NYT, called OpenAI's allegations “bizarre”. “What OpenAI bizarrely mischaracterizes as ‘hacking’ is simply using OpenAI’s products to look for evidence that they stole and reproduced The Times’s copyrighted works,” Crosby said. “And that is exactly what we found. In fact, the scale of OpenAI’s copying is much larger than the 100-plus examples set forth in the complaint.”

Whether other news organizations will follow with lawsuits of their own is uncertain. OpenAI has, however, already struck deals with Axel Springer (the publisher of Politico, among others) and the Associated Press. The New York Times is not expected to sign a similar contract any time soon.