OpenAI is giving ChatGPT a "realistic" voice. It is the company's second attempt, after an earlier voice was withdrawn due to criticism.
Advanced Voice Mode has recently become available. The feature gives ChatGPT a voice and can read the AI assistant's responses aloud to users. The LLM powering the voice is GPT-4o.
The new voice feature is more advanced because GPT-4o combines multiple tasks in a single (multimodal) model, which generates output faster and thus makes the voice sound more natural. The existing Voice Mode in the AI tool requires three models to speak: one to convert your voice to text, one to process the message, and a final one to convert the text back to speech.
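As an illustration, here is a minimal sketch of that classic three-model pipeline built on OpenAI's public API. It is not the internal implementation behind Voice Mode; the model names, file paths, and voice choice are assumptions for the example.

```python
# Sketch of a three-model voice pipeline:
# speech-to-text -> LLM -> text-to-speech.
# Illustrative only; the pipeline inside ChatGPT's Voice Mode is not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Convert the user's recorded voice to text (speech recognition).
with open("question.wav", "rb") as audio_file:  # hypothetical input file
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Process the message with the language model.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = completion.choices[0].message.content

# 3. Convert the answer back to speech (text-to-speech).
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer,
)
speech.write_to_file("answer.mp3")
```

Every hand-off between these models adds latency, and the intermediate text strips away intonation, which is why collapsing all three steps into one multimodal model responds faster and sounds more natural.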
Because this is an advanced mode, the voice will only be available to paying users. In the fall of 2024, all Plus users of the AI tool will get the voice feature. For now, the rollout is limited to a small alpha group drawn from the pool of Plus users.
New attempt
The launch of GPT-4o was meant to be a total package in which ChatGPT got its realistic voice right away. The "o" in the name refers to that and stands for "omni," as in omnimodel. Initially, OpenAI pulled that off.
Five voices were launched: Sky, Breeze, Cove, Juniper, and Ember. Before the rollout could be completed, however, OpenAI decided to withdraw the Sky voice. The reason was an accusation by actress Scarlett Johansson that the voice copied hers, even though she had explicitly refused permission. The withdrawal caused displeasure among users of the AI tool, who found Sky by far the most "mature and intelligent"-sounding voice.
OpenAI is not launching a new voice for the advanced option. Breeze, Cove, Juniper, and Ember remain the only available voices.
Limitations
The rollout of Advanced Voice Mode is now happening much more cautiously. GPT-4o itself rolled out to all users, including non-paying ones, immediately after its announcement in May. This time, OpenAI is letting a limited group experiment first, and even that comes with restrictions: video and screen-sharing options, for example, are not yet available. These features, which OpenAI demonstrated in May, let the chatbot view live images and function as an interpreter.
Also read: OpenAI makes mini version of powerful GPT-4o available