Translate and transcribe in nearly 100 languages with Meta’s new AI model

Meta, the parent company of Facebook, Instagram and WhatsApp, has developed a new AI model and released it to the open-source community. Called SeamlessM4T, the model can transcribe and translate both text and speech in nearly 100 languages. Meta calls it a breakthrough that can take translation and transcription tools to a much higher level.

With SeamlessM4T, Meta says great strides can be made in speech-to-speech and speech-to-text translation. The most significant breakthrough is probably that a single model covers nearly 100 languages: it can detect the source language and translate it without handing off to another model. According to Meta, this makes translation much faster, bringing real-time conversation between speakers of different languages within reach.

Meta seems to have a very powerful model on its hands, and it is remarkable that the company is making it open source. Other tech giants are working on similar solutions. Google, for example, is developing the Universal Speech Model, which should eventually support 1,000 languages. Amazon and Microsoft are not lagging behind either; both already offer widely used translation services.

SeamlessM4T is a successor to Meta's No Language Left Behind project, which we wrote about last year. The goal of that project was to build a translation model that works even for languages with little available training data. With SeamlessM4T, there is now a single model for all these languages.

The model's data sources

It is still unclear exactly what data Meta used to develop its model. The company says it relied on publicly available data. TechCrunch asked Meta about this but did not receive a clear answer. Meta did say it used tens of billions of sentences and more than 4 million hours of speech from the internet. Several lawsuits are already pending because content creators object to their material being used to develop models that end up in commercial products, meaning a third party profits from their work.

Meta did clarify that it did not use copyrighted material and that its sources are mainly open-source and licensed.