Online discussion platform Reddit is taking full advantage of generative AI. Deals with OpenAI and Google are set to bring in millions every year by making users’ content available for AI training. From now on, the company plans to protect that revenue source by barring AI data collectors that lack a content licensing deal.
Reddit is going to update its robots.txt file, its implementation of the Web standard known as the “Robots Exclusion Protocol”, to stop automated data collection. Alongside that change, the platform will limit the number of requests a single entity can make.
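For readers unfamiliar with robots.txt: it is a plain-text file in which a site states which crawlers may visit which paths. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would read such a file. The rules, the bot names (`LicensedBot`, `UnknownScraper`) and the URL are hypothetical and are not taken from Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, NOT Reddit's real robots.txt:
# one named crawler is allowed everywhere, every other crawler is barred.
EXAMPLE_ROBOTS_TXT = """\
User-agent: LicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/some_forum/"  # placeholder URL
    for agent in ("LicensedBot", "UnknownScraper"):
        print(agent, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

Note that this check runs on the crawler's side. The file only states the site's wishes, so a scraper that chooses not to consult it is not technically stopped by it.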
Some companies will still be allowed to collect plenty of data from Reddit: OpenAI and Google, both of which hold an annual license to access all of Reddit’s data. No amount was mentioned in the announcement of the OpenAI deal, but Google is known to pay $60 million a year to use the platform’s data.
Further closure
It is another step Reddit has taken to secure its revenue sources. Earlier, it pulled off a surprisingly successful IPO, helped by announcements surrounding the aforementioned deals with OpenAI and Google. Last year, it had already hinted at further profit optimization by putting its own API behind a paywall. This caused all sorts of third-party Reddit apps to shut down, as they would otherwise have had to pay millions for API calls.
Now Reddit is trying to further protect the new business model. Only AI players with enough capital can still access Reddit’s data. At least, that’s the intention: according to investigations by Wired, the controversial AI company Perplexity, for example, is said to bypass robots.txt to gather data. Data scraping can lead to skyrocketing costs for the website being accessed, so it’s not too surprising that Reddit wants to prevent these practices. CEO Steve Huffman made that known over a year ago, even before the company’s own API became a paid offering and well before the company went public.