
OpenAI squabbles with DeepSeek: the pot calling the kettle black


To make ChatGPT a reality, OpenAI trained its models on virtually every corner of the public Internet. Now, its own intellectual property appears to have been used to craft a competitor: DeepSeek-R1. OpenAI CEO Sam Altman says he welcomes the competition, but other signs point to a more aggressive stance.

DeepSeek’s V3 model, which formed the basis for R1, was reportedly trained for just over $5 million. This news caused a major shock among AI companies, especially those based in the United States. OpenAI in particular believed that with o1 it had a far more powerful LLM than any rival, leaving competitors little room to catch up. Now it seems the situation has changed.

Privacy concerns

There are plenty of reasons to be concerned, and OpenAI, Microsoft, and other U.S. competitors have voiced further grievances about DeepSeek’s sudden popularity. For instance, an API developer at OpenAI complained that Americans are all too ready to hand over their data to the Chinese Communist Party (CCP). This is a valid concern, but users have already compromised their data privacy on a massive scale through their interactions with ChatGPT, as well as its rivals Gemini and Claude. Perhaps it feels slightly less grim to share data with a Western company, but the reality remains that data collection is key for all major AI players. The twist is that, with a sufficiently powerful server, you can run DeepSeek’s largest model entirely on-premises, without ever connecting to the Internet. So who is really the champion of open AI and privacy here?

Some observers have gone further, suggesting psychological warfare from Beijing, fueled by the fact that DeepSeek-R1, in its unmodified form, avoids topics the Chinese government deems sensitive. It also holds firm to the “One China” policy, claiming Taiwan belongs to the People’s Republic of China. Open-source developers are trying to remove this bias, with mixed results. The AI search engine Perplexity already uses R1, and there the model speaks far more neutrally about such issues. IBM has also embraced DeepSeek-R1, offering the LLM via watsonx.ai. Evidently, R1 is not dangerous enough for US-based companies to avoid it outright.

Hypocritical

The DeepSeek team has been accused of a lack of transparency about its development process. For example, the real compute cost for R1 is rumored to be much higher than the $5M+ figure mentioned, and the R&D budget is nowhere disclosed. The training method is OpenAI’s biggest point of contention, since DeepSeek-R1 may have effectively learned from ChatGPT/o1. By means of “distillation,” R1 may have replicated o1’s workings in a smaller model. This has already been done out in the open, too: DeepSeek has released models based on Alibaba’s Qwen and Meta’s Llama that reason similarly to R1 but with fewer parameters. Their end results aren’t as good, however, so you’ll still want to run the largest size you can get away with.
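Distillation, the technique at issue here, trains a smaller “student” model to imitate the full output distribution of a larger “teacher” rather than just its final answers. DeepSeek has not published its training code, so the following is only a minimal, generic sketch of the standard soft-target loss (temperature-scaled softmax plus KL divergence); the function names and toy logits are illustrative, not anyone’s actual implementation:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher temperature softens the
    distribution, exposing more of the teacher's 'dark knowledge'."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's soft targets.

    The teacher's probabilities over all tokens carry far more signal per
    example than a single sampled answer, which is why distillation can
    transfer behavior with comparatively little data.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    # KL(p || q); the T^2 factor is the usual convention to keep gradient
    # magnitudes comparable across temperatures.
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

One caveat relevant to the accusation: classic distillation needs the teacher’s logits, while a hosted API typically returns only sampled text. Distillation via API calls would therefore be the sequence-level variant, i.e. fine-tuning the student directly on the teacher’s generated outputs.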

OpenAI says it is aware that Chinese companies, among others, continuously tap into GPT-4o and o1’s APIs to distill the models. The American firm is trying to shield itself from this and now wants to prevent potentially hostile states from accessing AI technology.

Focusing on China does have some justification. After all, Western businesses do not want AI models that present incomplete or politically slanted information, and the emergence of Chinese AI players as competitors makes that risk concrete. However, closed-source LLMs from the US cannot guarantee that their outputs always “think” the way their customers desire either; that notion rests entirely on trust. Moreover, as noted earlier, DeepSeek-R1 is open-source, so a Western equivalent could well emerge in time that doesn’t parrot the CCP (as Chinese law dictates, by the way, so DeepSeek is merely being compliant here).

OpenAI has no choice but to protect itself. Its revenue model is under pressure, partly due to the many API calls DeepSeek and others apparently make. Yet OpenAI itself has profited enormously from the lucrative APIs of Twitter and Reddit, among others, to gather vast amounts of training data. That was done discreetly and was not, in itself, illegal, yet much of the data OpenAI collected was subject to copyright. Rather than asking permission up front, it sometimes struck deals with media organizations after the fact, and in other cases faced legal action whose outcome we have yet to see. Intellectual property is not a subject OpenAI can defend without a heavy dose of irony.

Also read: DeepSeek, hot on OpenAI’s heels, hit by cyberattack