Google researchers from Project Euphonia, a speech-to-text service for people with speech impairments, have worked with DeepMind researchers to recreate the voice of a former NFL player.
In August, Google AI researchers working with the ALS Therapy Development Institute shared details about Project Euphonia, a speech-to-text transcription service for people with speech impairments. They showed that they could dramatically improve the quality of speech synthesis and generation. To do so, they used data sets of both native and non-native speakers with neurodegenerative diseases, combined with techniques from Parrotron, an AI tool for people with a speech disorder.
The Google researchers, together with a team from DeepMind (also a subsidiary of Alphabet), used Euphonia to recreate the voice of Tim Shaw, a former NFL player who played for the Carolina Panthers, Jacksonville Jaguars, Chicago Bears and Tennessee Titans. Shaw was diagnosed with ALS. As a result, he uses a wheelchair and is unable to talk, swallow or breathe without help.
WaveNet model very efficient
Over about six months, the research team adapted a generative AI model called WaveNet to create speech from samples of Shaw’s voice that were recorded before his illness affected his speech.
WaveNet mimics emotions such as stress and adjusts intonation by identifying certain tone patterns in speech. The technique produces much more convincing voice fragments than previous models. Google itself reports that, on average, it has already closed the quality gap with human speech by 70%. The technique is also more efficient because it runs on Google's tensor processing units (TPUs), customised chips with circuits optimised for AI model training. It takes only 50 milliseconds to generate a one-second speech sample.
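To put that last figure in perspective, the short Python sketch below works out how much faster than real time that is. It uses only the two numbers quoted above (50 milliseconds of compute per 1,000 milliseconds of audio); the function name `real_time_factor` is purely illustrative and is not part of WaveNet or any Google tool.

```python
def real_time_factor(synthesis_ms: float, audio_ms: float) -> float:
    """Return the compute time needed per unit of generated audio.

    A value below 1.0 means speech is generated faster than it plays back.
    (Illustrative helper, not part of any WaveNet API.)
    """
    return synthesis_ms / audio_ms


# Figures quoted in the article: 50 ms to generate a one-second sample.
rtf = real_time_factor(synthesis_ms=50, audio_ms=1000)
speedup = 1 / rtf  # how many times faster than playback speed

print(f"real-time factor: {rtf:.2f}, speedup: {speedup:.0f}x")
```

On these figures, generation runs roughly 20 times faster than playback, which is what makes the model practical for interactive use.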