
Amazon researchers recently published details of a new text-to-speech model called Base TTS. According to the researchers, the model produces more natural-sounding speech than previous neural networks.

The paper shows that Base TTS is the largest neural network to date in the text-to-speech category. The most advanced version of the model has about 1 billion parameters. In general, the more parameters a model has, the wider the range of tasks it is expected to handle.

The Base TTS model was trained on 100,000 hours of audio files publicly available on the Internet. About 90 per cent of these audio clips were in English.

Better quality pronunciation

The model, the researchers further point out, improves the pronunciation quality of words compared to previous text-to-speech models. An evaluation by linguists showed that Base TTS correctly pronounces, for example, the "@" sign and other symbols, as well as paralinguistic sounds such as "shh".

The model also read aloud English sentences containing foreign words and questions. It accomplished these tasks even though it was not explicitly trained on the types of sentences used in the evaluation dataset.

Two AI models

Amazon’s text-to-speech system consists of two separate AI models. The first model, built on the same Transformer architecture that also underlies GPT-4, transforms the input text into abstract mathematical representations, or "speech codecs". This model also compresses the speech codecs, making processing faster, and helps keep unwanted elements, such as background noise, out of the final Base TTS audio.
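To illustrate the idea, here is a minimal, hypothetical sketch of such a first stage: a tiny GPT-style (decoder-only Transformer) model that autoregressively predicts discrete speech-codec tokens from text tokens. All names, vocabulary sizes and layer counts below are illustrative assumptions, not Amazon's actual implementation; the real Base TTS model is vastly larger, at around 1 billion parameters.

```python
import torch
import torch.nn as nn

class TextToSpeechCodes(nn.Module):
    """Toy decoder-only Transformer: text tokens in, speech-codec tokens out."""

    def __init__(self, text_vocab=256, code_vocab=1024, d_model=128,
                 n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        # Text and speech-codec tokens share one embedding table,
        # offset so the two vocabularies do not collide.
        self.embed = nn.Embedding(text_vocab + code_vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        # The output head only ever predicts speech-codec tokens.
        self.head = nn.Linear(d_model, code_vocab)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        # Causal mask: each position attends only to earlier positions,
        # which is what makes generation autoregressive (GPT-style).
        causal = torch.triu(torch.full((seq_len, seq_len), float('-inf'),
                                       device=tokens.device), diagonal=1)
        x = self.blocks(x, mask=causal)
        return self.head(x)  # logits over the speech-codec vocabulary

# Usage: feed text tokens, then read off the next predicted codec token.
model = TextToSpeechCodes()
text = torch.randint(0, 256, (1, 10))   # dummy text token ids
logits = model(text)                    # shape (1, 10, 1024)
next_code = logits[0, -1].argmax()      # greedy choice of next codec token
```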

The second neural network then converts these speech codecs into audio. It does so by turning the data into spectrograms: graphs used to visualize sound waves, which can easily be converted into AI-generated speech.
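The sketch below shows only the final step of that idea: turning a spectrogram back into a waveform. Base TTS uses its own trained decoder network for this; the classical Griffin-Lim algorithm stands in here as a simple substitute to demonstrate the general spectrogram-to-audio conversion. The sample rate, FFT size and file name are illustrative assumptions.

```python
import numpy as np
import librosa
import soundfile as sf

sr = 22050
# Stand-in for a model-generated magnitude spectrogram: we analyze a
# synthetic 440 Hz tone so the example is fully self-contained.
t = np.linspace(0, 1.0, sr, endpoint=False)
wave = 0.5 * np.sin(2 * np.pi * 440 * t)
spec = np.abs(librosa.stft(wave, n_fft=1024))   # magnitude spectrogram

# Griffin-Lim iteratively estimates the phase information that the
# magnitude spectrogram discarded, yielding a listenable waveform.
audio = librosa.griffinlim(spec, n_iter=32, n_fft=1024)
sf.write("reconstructed.wav", audio, sr)
```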
