OpenAI has made extensive use of forums like Reddit to train its AI models in the past. Now it is closing a deal with Reddit to collaborate officially and to pay for the content Reddit’s users create. The earlier Stack Overflow deal showed that users are far from happy with such arrangements. However, it also shows the lucrative future of AI training data monetization, something forum contributors simply have no say in.
In pursuit of profitability, Reddit decided last year to put a price on its previously free API. One of the arguments for this was that AI companies, including OpenAI, were only too happy to scrape all the data from the platform. For example, ChatGPT was trained in part on Reddit data, something Reddit itself did not know about at the time.
Even stronger integration than with Google
The now publicly traded Reddit chose to change course and shift toward making AI deals. In February, it licensed its content, generated entirely by visitors (“Redditors”), to Google. That deal is set to generate about $60 million a year. Now OpenAI is also in town with both a content deal and an integration. Reddit data won’t be used only for AI training: Reddit posts may also appear with ChatGPT answers through an integration with Reddit’s Data API.
The deal between Reddit and OpenAI thus differs from previous content deals. OpenAI has already signed up quite a few companies whose data it can use for training. Last year, for example, a deal was struck with The Associated Press, while this year developer forum Stack Overflow also signed a contract with OpenAI.
Revolt among users has virtually no effect
When Reddit decided to add a price tag to its API, much of the social media platform went dark in response. The change meant that third-party alternatives to the Reddit app, such as Apollo, Reddit Is Fun and Relay, were no longer financially viable. Where one app opted for a pricey subscription service, another simply decided to fold.
Reddit didn’t care, as its CEO Steve Huffman emphasized the need to finally turn a profit. Avid users, who stressed that they were the engine behind Reddit’s success by generating content, didn’t get their way in pretty much any regard. The company’s executive arm did try to charm its users by inviting them to secure Reddit shares early in the IPO. That was pretty much it.
We doubted Reddit’s success on the stock market, but for now that doubt seems to have been misplaced. Indeed, the AI deals are giving the company a significant boost. One can doubt the sustainability of this (after all, there are not that many big AI players). Still, the IPO has been lucrative both because of the increase in value and because of the revenue from the deals themselves. In addition, Google is set to cough up tens of millions every year while OpenAI may be doing the same.
Platform change
Reddit can rest on nearly two decades of user-generated content. Countless discussions on every topic under the sun can inform ChatGPT about how internet users view any subject in existence. The accuracy of that content will still have to be checked by OpenAI itself, but the company doesn’t let third parties look under the hood that way these days. We can’t be sure whether there are any meaningful checks and balances there, but given that AI model performance improves as dataset quality rises, there likely are.
Information from Stack Overflow may be easier to put to good use. After all, this highly popular development forum is focused on answering specific developer questions, with the best answers floating up democratically. Unlike on Reddit, a high rating is much more closely tied to whether the answer is factual than to whether it’s comical, interesting or otherwise attractive.
However, Stack Overflow users also rebelled against the platform they’re using. Why let AI benefit from your knowledge when that same AI could eventually replace you? Among programmers, AI code generation is a well-known bogeyman, although it remains to be seen how well GenAI applications generate safe and reliable code, especially when dealing with a complex problem. The jury’s still out on whether AI is a persistent threat to their long-term job security, but the Stack Overflow users aren’t leaving that to chance.
Stack Overflow protest
Stack Overflow visitors tried to remove their old posts en masse. Stack Overflow responded by refusing to remove the content and, in some cases, by banning users from the forum. The argument: other developers found your contributions valuable, so we’ll keep them up whether you want us to or not. The catch is that the Stack Overflow archives remain as suitable as possible for AI training. Incidentally, it is questionable whether deleting the posts would have had any effect at all in keeping them away from AI use, as an old snapshot might well be present internally with all the answers up to a certain point in time.
Either way, it’s a recurring pattern that other forums might also experience as other companies strike deals with Google or OpenAI. In addition, the uprisings appear to be either of a temporary nature or too small in scale to amount to anything. This has allowed Reddit and Stack Overflow to secure new revenue through AI deals. Since the terms of use almost always stipulate that user-generated content is not user-owned content, the only option is to move to an alternative forum. History shows that almost no one does. Plus, that alternative might also get a payout from AI giants in time.