Google is making the speech engine behind Live Transcribe open source. The company hopes that this will enable developers to transcribe long conversations. The source code can be found on GitHub.
The engine already powers Live Transcribe on Android, a tool that uses algorithms and machine learning to convert audio into text in real time. The app does have a number of limitations, however: for example, it cannot support infinite streams, and it depends on the cloud. Google hopes to give developers the opportunity to build these kinds of features themselves.
Features
According to Google, the libraries made available are almost identical to those in the Live Transcribe app. The technology giant also reports that they were extensively tested, but that the tests themselves have not been open-sourced. However, APKs are offered so that the library can be tried out without having to write any code. Below is an overview of the features of the open-source engine:
- Unlimited streaming.
- Support for more than 70 languages.
- Support for brief losses of network connection (when traveling, or when switching between the mobile network and Wi-Fi). Text is not lost, but may appear more slowly. The engine is also prepared for prolonged interruptions: a new connection will be established even if the network has been idle for hours (of course, no speech recognition can be provided without a connection).
- Support in case of server errors.
- Opus, AMR-WB and FLAC encoding can be easily enabled and configured.
- Contains a library for text formatting and for visualizing ASR confidence, speaker ID, etc.
- Expandable to offline models.
- Built-in support for speech detectors, which can stop ASR during prolonged silences to save money and data.
- Built-in speaker identification support. This is used to label or mark text based on the speaker number.
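
To make the silence-detection feature above more concrete, here is a minimal Python sketch of the general idea: audio is only forwarded to the recognizer while speech has been heard recently. This is not the Live Transcribe engine's code or API. The names `is_speech` and `SilenceGate`, the amplitude threshold, and the chunk counts are all hypothetical, and a simple energy check stands in for whatever speech detector a real application would plug in.

```python
import array


def is_speech(chunk: bytes, threshold: int = 500) -> bool:
    """Hypothetical detector: mean absolute amplitude of 16-bit PCM samples."""
    samples = array.array("h", chunk)  # signed 16-bit, native byte order
    if not samples:
        return False
    return sum(abs(s) for s in samples) / len(samples) > threshold


class SilenceGate:
    """Hypothetical helper that stops forwarding audio after prolonged silence."""

    def __init__(self, detector, max_silent_chunks: int = 50):
        self.detector = detector
        self.max_silent_chunks = max_silent_chunks  # assumed: 50 chunks of 100 ms
        self.session_open = False
        self.silent_run = 0
        self.sent = []  # stands in for the stream to the recognizer

    def feed(self, chunk: bytes) -> None:
        if self.detector(chunk):
            self.silent_run = 0
            self.session_open = True  # speech (re)opens the session
        else:
            self.silent_run += 1
            if self.silent_run >= self.max_silent_chunks:
                self.session_open = False  # prolonged silence: stop sending
        if self.session_open:
            self.sent.append(chunk)


# Usage: short pauses are still forwarded, long silences are not.
loud = array.array("h", [2000] * 160).tobytes()
quiet = array.array("h", [0] * 160).tobytes()
gate = SilenceGate(is_speech, max_silent_chunks=3)
for chunk in [quiet, loud, quiet, quiet, quiet, loud]:
    gate.feed(chunk)
```

Forwarding short pauses while a session is open keeps natural gaps between words intact. Only a run of silent chunks at least as long as the configured limit closes the session, which is where the savings in data and recognition cost would come from.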
