
ChatGPT Voice is now available to all users of the Android and iOS apps. With it, several AI models work together to deliver the most natural conversation possible with the chatbot.

Amid all the chaos surrounding the company, OpenAI has found time to announce the general availability of ChatGPT Voice. President and co-founder Greg Brockman announced on X that the AI tool is no longer restricted to the paid services ChatGPT Plus and Enterprise. In its video sample, the company pokes fun at the unrest that has dominated OpenAI’s HQ since last weekend by asking the chatbot how many 16-inch pizzas one should order for 778 people.

Multiple models work together

In a practical sense, interaction with ChatGPT Voice is similar to the conventional text version. However, much more takes place behind the scenes than “just” running the GPT-4 LLM. Voice support is made possible by a text-to-speech model introduced by OpenAI in September, which (however subjectively) offers a particularly true-to-life human voice.

There are multiple speakers to choose from, which base their generated utterances on training data from voice actors.

Another model is in play to convert the user’s voice to text: Whisper, an open-source speech recognition system. Fundamentally, then, the inputs and outputs are no different from a text-based conversation with ChatGPT, however variable they may be. Still, for end users, the capability may lead to a different, more free-form use of the chatbot thanks to the option to use one’s own voice. OpenAI itself previously provided widely varying examples of what’s possible with ChatGPT Voice, in addition to other capabilities such as image-based interaction.
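The hand-off between the models described above can be sketched roughly as follows. This is a minimal illustration only, assuming a simple three-stage chain (speech to text, text to reply, reply to speech). Every name in it (`VoiceTurn`, `run_voice_turn`, and the three stand-in functions) is hypothetical and does not represent OpenAI's actual API or implementation; the stand-ins merely mimic the roles of Whisper, the language model, and the text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round of a voice conversation (hypothetical structure)."""
    transcript: str     # what the speech recognizer heard
    reply_text: str     # what the language model answered
    reply_audio: bytes  # the spoken version of that answer


def run_voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain the three stages: speech -> text -> reply text -> speech."""
    transcript = transcribe(audio_in)         # role played by Whisper
    reply_text = generate_reply(transcript)   # role played by the LLM
    reply_audio = synthesize(reply_text)      # role played by the TTS model
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages so the sketch runs without any models (assumptions only):
# "audio" here is just UTF-8 bytes, not real sound data.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate_reply(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(b"hello", fake_transcribe, fake_generate_reply, fake_synthesize)
print(turn.reply_text)
```

The point of the sketch is that the language model in the middle only ever sees and produces text, which is why a voice conversation is fundamentally the same as a typed one.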

“Breakthrough” in AI development

For years, numerous companies have tried to make AI voices as believable as possible, but for a long time no breakthrough materialized. Several incidents earlier this year showed, however, that realistic audio deepfakes can now be created. For example, a Slovak politician appeared to voice his support for election fraud in a leaked audio clip, but the voice in question turned out to be AI-generated. Given the emphasis that OpenAI CEO Sam Altman in particular has placed on the safety measures and safeguards required for AI development, it is not surprising that only the speech recognition system has been made open-source. The model that generates AI voices is unlikely to be released for fear of deepfakes.

While the company itself announced ChatGPT Voice, news of an even more striking development leaked elsewhere. The much-discussed dismissal of CEO Sam Altman (who has since been reinstated) allegedly occurred shortly after the board of directors was informed of a discovery. Specifically, a letter from certain OpenAI employees reportedly spoke of a major breakthrough in AI, a technology said to be powerful enough to pose a danger to humanity. At least in that regard, ChatGPT Voice itself does not seem to be much of a threat.

Also read: Secret Q* may be OpenAI’s breakthrough to AI with human intelligence