OpenAI’s latest model looks like a jack of all trades. GPT-5.4 follows closely on the heels of GPT-5.3 Instant, but primarily takes over from the larger GPT-5.2, particularly for tasks that involve reasoning, coding, or computer control.
The model is now available in ChatGPT, the API, and Codex. A Pro version offers “maximum performance on complex tasks” at a higher price. GPT-5.4 combines the coding capabilities of GPT-5.3-Codex with improvements in reasoning, knowledge work, and agentic workloads.
On GDPval, a benchmark that assesses knowledge work across 44 professions in nine industries, GPT-5.4 matches or beats human professionals in 83 percent of comparisons; GPT-5.2 stood at 70.9 percent. Tasks in the benchmark include sales presentations, accounting spreadsheets, and production diagrams.
Controlling computers like a human
GPT-5.4 is the first general OpenAI model with native computer use capabilities. In that regard, the company has been remarkably slow: Anthropic already had a model that could control computers in public beta in October 2024. Like competing models, GPT-5.4 can control a computer via screenshots with mouse and keyboard commands, without the need for external tools. On OSWorld-Verified, a benchmark focused on computer use tasks, GPT-5.4 achieves a success rate of 75.0 percent. That is above the human baseline of 72.4 percent and a significant jump from the 47.3 percent of GPT-5.2. Anthropic’s Claude Sonnet 4.6 scores 72.5, making it practically as competent as a human on the tasks tested.
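The screenshot-driven control loop described above can be sketched in a few lines. Everything here is illustrative: the model call and the action executor are stubs, and the action schema is an assumption, not OpenAI’s actual interface. A real agent would capture genuine screenshots and issue real mouse and keyboard events.

```python
# Minimal sketch of a computer-use loop: screenshot -> model picks an
# action -> execute -> repeat. Model and executor are stubs (assumptions).

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def fake_model(screenshot: bytes, goal: str, step: int) -> Action:
    """Stand-in for the model: returns a scripted action per step."""
    script = [
        Action("click", x=120, y=48),
        Action("type", text=goal),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]

def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    """Loop until the model signals it is done or the step budget runs out."""
    executed = []
    for step in range(max_steps):
        screenshot = b"..."              # placeholder for a real screen capture
        action = fake_model(screenshot, goal, step)
        if action.kind == "done":
            break
        executed.append(action)          # a real agent would emit the event here
    return executed

actions = run_agent("search for quarterly report")
print([a.kind for a in actions])         # ['click', 'type']
```

The loop structure, not the stubs, is the point: each iteration feeds a fresh screenshot back to the model, which is what lets the agent react to what actually happened on screen.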
Developers working with the API also benefit from tool search. Instead of always loading all tool definitions into context, the model searches for the required tool itself at the right moment. In a test with 250 tasks across 36 MCP servers, this approach reduced token usage by 47 percent without loss of accuracy. This makes a direct difference in cost and speed, especially with large tool ecosystems.
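The idea behind tool search can be shown with a toy retrieval step. The tool names, descriptions, and word-overlap scoring below are illustrative assumptions, not OpenAI’s implementation; the point is that only the matched definitions reach the model’s context, which is where the token savings come from.

```python
# Sketch of "tool search": keep tool definitions in a local index and
# retrieve only the ones relevant to the current request, instead of
# sending every definition to the model up front. Hypothetical tools.

TOOLS = {
    "get_weather": "Return the current weather for a city.",
    "send_email": "Send an email to a recipient with a subject and body.",
    "create_invoice": "Create an accounting invoice for a customer.",
    "query_database": "Run a read-only SQL query against the sales database.",
}

def search_tools(request: str, top_k: int = 2) -> list[str]:
    """Score each tool by word overlap with the request; return best matches."""
    request_words = set(request.lower().split())
    scored = []
    for name, description in TOOLS.items():
        overlap = len(request_words & set(description.lower().split()))
        scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

# Only these definitions would be placed in context, not all of TOOLS.
print(search_tools("send an email about the invoice"))
# ['send_email', 'create_invoice']
```

A production system would use embeddings or a proper search index rather than word overlap, but the trade-off is the same: a small retrieval step in exchange for a much smaller prompt.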
Fewer hallucinations, faster coding
GPT-5.4 is also the most factually accurate model OpenAI has released to date. Individual claims are 33 percent less likely to be incorrect, and complete answers contain 18 percent fewer errors compared to GPT-5.2. Codex offers a /fast mode that delivers up to 1.5 times faster token generation with the same model.
In ChatGPT, GPT-5.4 Thinking replaces the GPT-5.2 Thinking model for Plus, Team, and Pro users. GPT-5.2 Thinking will remain available as a legacy model for three months, until it is retired on June 5. Enterprise and Edu customers can enable early access via admin settings. GPT-5.4 Pro is available on Pro and Enterprise plans. In the API, the model is available as gpt-5.4, at a higher token price than GPT-5.2.
How long will the leapfrogging continue?
Since the emergence of reasoning models around the end of 2024, the major AI model makers have fallen into a fixed pattern. Google, Anthropic, and OpenAI in particular are constantly competing for the best benchmarks, while open-source options (especially from China) continue to lag behind in terms of performance. DeepSeek-R1 in early 2025 and Claude Cowork temporarily shook up the market, but apart from that, the trajectory of AI models has become predictable.
This means that after the release of the Gemini 3.1 and Claude 4.6 models, GPT-5.4 lets OpenAI edge slightly ahead on the benchmarks again. It does not appear to be a breakthrough, except for users who work exclusively with GPT models. Native computer use and more flexible token use make 5.4 a refinement release. Now that the major labs have reached broad feature parity, their models continue to converge on the same architecture. On benchmark percentages, OpenAI is back on top, but probably not for long.
Read also: Google launches Gemini 3.1 Pro, an LLM for complex reasoning