4 min Applications

Is GPT-5 suitable for business use?

Is GPT-5 suitable for business use?

Almost two and a half years after the release of GPT-4, OpenAI has enough confidence in a new AI model to promote it to GPT-5. The reactions online have been mixed, partly because the company has switched everyone to the new LLM. Is GPT-5 the next big step in the competitive AI landscape?

GPT-5 scores well in the well-known AI benchmarks. According to OpenAI, it is more creative, stronger at coding, more concise, and better at adapting to the input it is given. In other words, it responds quickly to simple questions and reasons freely when it has to tackle a more challenging issue. However, apart from this news from OpenAI, there has not been an immediately positive response online.

Unpleasant feedback

The main problem, according to many users, was that GPT-4o, the default option for free ChatGPT use, had disappeared. Whereas that model provided extensive, often emoji-filled answers, GPT-5 is shorter and more businesslike in tone. There was enough frustration expressed by users that OpenAI made the old (presumably much less efficient) GPT-4o available again—for paying users.

Still, the task for GPT-5 is more extensive than conducting everyday or personal conversations. Previously, OpenAI had o3, o3-pro, o4-mini, gpt-4-mini-high, and more to serve all kinds of API-using markets. For quick, non-business-critical tasks with low costs, there were mini models, while o3-pro cost $20 per million input tokens and $80 per million output tokens. Although there is no end to all the old APIs, OpenAI’s message is clear: GPT-5 is the new flagship and can take over all tasks. But is that really the case?

Enterprise-ready?

A basic requirement for business use is that a tool’s security is up to par. The tricky thing about an AI model is that its use is unpredictable. In fact, it can behave like a user account if it is “agentic.” OpenAI encourages this use with Operator, ChatGPT agent, and now GPT-5. An agent must not only listen carefully to input and complete tasks, but also avoid malicious behavior. Unfortunately, GPT-5 falls short on this front.

Within 24 hours, the red-teaming group SPLX discovered “surprising weaknesses” in GPT-5. These range from tests in which its predecessor GPT-4o was much more robust for enterprise use to the continuation of weaknesses in the security level of OpenAI’s LLMs. The “raw” GPT-5 (i.e., without a system prompt) in particular falls short. Guardrails are essential to prevent malicious inputs. In other words, without explicit instructions, GPT-5 performs malicious tasks without any problems.

The most dangerous trick, an obfuscation attack, was already relatively easy to execute with 4o. This involves hiding a malicious prompt within a larger input, in the case of SPLX an encryption challenge. While GPT-5 focuses on the challenge, it gradually picks up all kinds of inputs that are harmless on their own. Together, these commands are malicious (such as making a bomb, where purchasing a component is not malicious).

Conclusion: the purpose of GPT-5

OpenAI will not have simply turned off the tap on the models from the old selection menu in ChatGPT. Similarly, it is no surprise that the API still has the legacy LLMs available. After all, the company wants to remove inefficient and confusing options from the chat interface, while developers have built their applications with specific models. In the long run, however, we expect GPT-5 to prevail, even in a business context.

It is important to note that GPT-5 must perform better than all its predecessors, but that, as before, it requires manual intervention to ensure the right behavior. Certainly more than two years after GPT-4, this seems like a fundamental achievement. It will probably take weeks or months before GPT-5 delivers on its benchmark promises. Meta’s Llama 4 failed this “vibe test,” while Claude and Gemini often appear to be capable of more than the benchmarks suggest.

Read also: With GPT-OSS, OpenAI re-enters the local AI space — was it worth it?