GPT-5 jailbroken within 24 hours

Researchers at NeuralTrust have succeeded in jailbreaking GPT-5 within just 24 hours of its launch using the so-called Echo Chamber method in combination with narrative guidance via storytelling.

Without explicitly harmful prompts, the team managed to get the model to provide detailed instructions for making a Molotov cocktail. The attack worked in a standard black-box environment, without internal access to the model, and according to Dark Reading, it would also be effective against earlier models such as Grok-4 and Google’s Gemini.

The approach begins by sowing a subtly poisoned context in which specific keywords are incorporated into seemingly innocent sentences. This context is then reinforced by allowing the conversation to unfold within a continuous narrative.

According to the researchers, the model feels pressure to remain consistent with the narrative line, which gradually steers it toward the goal. Because the prompts never appear explicitly unsafe, traditional keyword and intent filters do not raise the alarm.

In a practical example described by DarkReading, the conversation started with the task of incorporating a few words into a narrative sentence. Gradually, the story was expanded and more technical details were woven in. The model continued to cooperate, partly because the context was built around urgency, safety, and survival. Operational details of the content have been omitted for security reasons.

GPT-5 clearly less robust

According to SiliconANGLE, these findings are consistent with previous analyses showing that, despite improved reasoning abilities, GPT-5 is less robust than GPT-4o against sophisticated prompt attacks. In addition, experts point out that the model is vulnerable to simple obfuscation (i.e., confusing source code), context poisoning over multiple rounds, and risks arising from integrations with agents and external tools.

NeuralTrust’s research shows that security based solely on keywords or intent recognition is insufficient in conversations that span multiple interactions. Effective defense requires conversation-level monitoring and the recognition of subtle persuasion patterns. Without such measures, large language models remain susceptible to jailbreaks that can lead to dangerous output in a short period.

Also read: Is GPT-5 suitable for business use?

OpenAI releases GPT-5.1 after criticism of GPT-5

OpenAI has released GPT-5.1, the first major update for the GPT-5 generation. According to the company, the n...

Mels Dees 7 hours ago

Salesforce acquires Doti for agentic enterprise search

Salesforce is acquiring Doti, a company specializing in enterprise search. The acquisition should lead to imp...

Berry Zwets 3 hours ago

Expert Talks

Tech calendar

GPT-5 jailbroken within 24 hours

GPT-5 clearly less robust

Stay tuned, subscribe!

Qualcomm is entering the B2B market with Snapdragon innovations

SAP opens platform with MCP: AI agents can communicate with SAP

SAP launches RPT-1: the end of machine learning?

Workday CTO outlines bold AI agent strategy and major acquisitions

SAP Business Network: $6.5 trillion B2B collaboration platform

Atlassian CTO on realistic AI: Rovo, data privacy & adoption

Qualcomm tells us how ARM chips will disrupt the enterprise PC market

AI Integrity: The Invisible Threat Organizations Can’t Ignore

Three Ways Secure Modern Networks Unlock the True Power of AI

How to Safeguard and Prepare Exchange Server against Natural Disasters?

BrickCon The Databricks Community Conference

Appdevcon

Webdevcon

Dutch PHP Conference

GITEX ASIA 2026

Experience Synology’s latest enterprise backup solution

How to choose the right Enterprise Linux platform?

Enhance your data protection strategy for 2025

Strengthen your cybersecurity with DNS best practices