Zoom introduces Auto-Generated Captions. Users of Zoom’s paid and free services can enable the real-time conversion of meeting speech to subtitle text. As it stands, the feature is solely available for English speakers.
Auto-Generated Captions can be enabled per user, user group or company account. Zoom published a document with detailed guides for doing so. After toggling the new option, users can display the speech of meetings as subtitled text. Zoom emphasizes that the addition makes its platform more accessible to the hearing impaired.
For now, captions are generated only for English speech. We expect support for additional languages to follow, but Zoom has yet to confirm it.
The introduction of Auto-Generated Captions is a sign of the times. Voice recognition technology, the feature’s core, has seldom advanced as quickly as it has in the past few years.
Alexa, Siri, Google Assistant and Zoom’s new feature have one pillar in common: Natural Language Processing (NLP). NLP applications combine established language rules, statistical methods, machine learning and deep learning models to recognize the meaning of spoken words and thereby interpret speech. An application that recognizes speech accurately can follow up with an action: presenting subtitles, as Zoom’s Auto-Generated Captions do, or ordering a carton of milk, a task at which Alexa, among others, excels.
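The step from recognized speech to a follow-up action can be illustrated with a toy sketch. It assumes the speech has already been transcribed to text, and it replaces the statistical and deep learning models of real assistants with a simple keyword rule. The names `Action`, `ORDER_KEYWORDS` and `interpret` are hypothetical and do not come from Zoom or Alexa.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str      # "order" or "caption"
    payload: str   # the item to order, or the text to display


# Hypothetical trigger words; a real assistant would use a trained model.
ORDER_KEYWORDS = ("order", "buy")


def interpret(transcript: str) -> Action:
    """Map already-transcribed speech to a follow-up action."""
    words = transcript.lower().rstrip(".!?").split()
    for i, word in enumerate(words):
        if word in ORDER_KEYWORDS:
            # Everything after the trigger word is treated as the item.
            return Action("order", " ".join(words[i + 1:]))
    # No command recognized: fall back to showing the text as a caption.
    return Action("caption", transcript)


print(interpret("Order a carton of milk."))
print(interpret("Welcome to the meeting"))
```

The first call yields an order action and the second falls through to a caption, mirroring the two follow-ups described above.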
Tools for developing NLP applications are widely available. For example, the open source Natural Language Toolkit (NLTK) provides an array of Python libraries and documentation for building them.
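As a rough illustration of what working with NLTK looks like, the sketch below tokenizes a short transcript, reduces each word to its stem and counts the results. It assumes NLTK is installed (`pip install nltk`) and deliberately uses only components that need no extra corpus downloads. The function name `stem_counts` and the sample sentence are made up for this example.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def stem_counts(transcript: str) -> FreqDist:
    """Count stemmed word forms in a transcript."""
    stemmer = PorterStemmer()
    # Split into word and punctuation tokens using a regular expression.
    tokens = wordpunct_tokenize(transcript.lower())
    # Keep alphabetic tokens only and reduce each one to its stem.
    stems = [stemmer.stem(tok) for tok in tokens if tok.isalpha()]
    return FreqDist(stems)


counts = stem_counts("Captions help. The caption helped everyone in the meeting.")
print(counts.most_common(3))
```

Stemming groups inflected forms such as "captions" and "caption" under one entry, a common preprocessing step before any attempt to interpret meaning.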