Meta AI has announced its translation model, Seamless Communication. Models that translate spoken or written text have been around for some time, however. So what else can this one do?
Seamless Communication combines three existing models: SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2. Together they allow the system to translate and, thanks to SeamlessExpressive, to take the speaker’s nonverbal communication into account as well.
The translation is delivered in spoken form so that this nonverbal communication is preserved. The model already works in more than a hundred languages.
The added value of Seamless Communication lies in its ability to capture and carry over nonverbal communication. This makes the model’s output sound more realistic. According to Meta AI, this feature makes the model better suited to work as an interpreter. “Human speech and translation are sensitive to nuances such as responding to a conversation and sensing the right timing,” the company said.
The model works for both written and spoken conversations. It is also fast: Meta AI promises a delay of only a few seconds between the spoken text and its translation. The SeamlessStreaming model makes that possible.
Translator and more
The researchers hope Seamless Communication can help with more tasks than translation alone. For example, it could offer a simple way to release a podcast series in other languages without the speaker having to record it a second time. The same goes for video, of course.
The researchers acknowledge that the technology can also be misused for deception. Phishing calls, for example, are currently often easy to spot because of the caller’s poor command of the language. That changes with a model that translates quickly and, for instance, preserves the pauses in a conversation.
Open for development
The researchers have now opened up Seamless Communication for further development. That allows developers to build tools that do justice to the model’s capabilities. The models are available via Hugging Face and GitHub.