Whisper Transcription Engine

Overview

Opencast can use OpenAI's Whisper project to generate automatic transcriptions on premises through the SpeechToText workflow operation handler (WoH).

Advantages

Transcription in more than 80 languages
Translation to English
Automatic language detection
Fast processing (when using a GPU)
Runs locally; no data is sent to third parties

Enable Whisper engine

To enable Whisper as the engine for the SpeechToText WoH, follow these steps:

Install whisper on the worker nodes.
- Or install whisper-ctranslate2 for faster processing on CPU.
Enable Whisper and set the job load in org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.cfg.
Set the target model to use in org.opencastproject.speechtotext.impl.engine.WhisperEngine.
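
Before changing the configuration, it can help to confirm that a Whisper executable is actually reachable on each worker node. The following is a minimal sketch, not part of Opencast: it assumes the upstream packages install command line tools named whisper and whisper-ctranslate2, and the helper name find_whisper_executables is hypothetical.

```python
import shutil

# Executable names as installed by the upstream packages (assumption);
# adjust them if your installation uses different names or a virtual environment.
CANDIDATES = ("whisper", "whisper-ctranslate2")


def find_whisper_executables(names=CANDIDATES):
    """Map each candidate executable name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_whisper_executables().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Run the script as the user that runs Opencast, since PATH may differ between accounts. A tool reported as not found may still be installed in a location outside PATH.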

Additional Notes

Whisper can run on a CPU or a GPU; using a GPU increases performance dramatically.
There are five language models available, from the lightest (tiny) to the most complete (large). A bigger model improves accuracy but reduces processing speed.
It is recommended to set a job load for each machine.
If you want to use only one worker node with Whisper, set the job load to be larger than the maximum load of the non-Whisper nodes. The Whisper job will then run only on the Whisper machines (the nodes with a higher load limit set).
It is also a good idea to configure the job load on the Whisper node in order to:
- Avoid workflow failures caused by insufficient memory during parallel transcriptions.
- Avoid performance bottlenecks caused by too many parallel transcriptions.
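
The effect of the job load on node selection and parallelism can be illustrated with a simplified model. This is only a sketch of the reasoning in the notes above, not Opencast's actual dispatching logic: the Node class, the max_load field, the node names, and both helper functions are hypothetical, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    max_load: float  # hypothetical per-node load limit


def eligible_nodes(nodes, job_load):
    """Return the names of nodes whose load limit can fit a job of the given load."""
    return [node.name for node in nodes if job_load <= node.max_load]


def max_parallel_jobs(node, job_load):
    """Return how many jobs of this load fit on the node at once (simplified)."""
    return int(node.max_load // job_load)


# Illustrative cluster: two regular workers and one Whisper node with a higher limit.
cluster = [
    Node("worker-1", 4.0),
    Node("worker-2", 4.0),
    Node("whisper-gpu", 8.0),
]

whisper_job_load = 5.0  # larger than the limit of the non-Whisper nodes
print(eligible_nodes(cluster, whisper_job_load))
print(max_parallel_jobs(cluster[2], whisper_job_load))
```

In this model, a job load of 5.0 fits only on the node with a limit of 8.0, and only one such job fits at a time, which is how a high job load both pins transcriptions to the Whisper node and limits parallel transcriptions there.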

Whisper-ctranslate2

whisper-ctranslate2 offers the same command line interface as OpenAI's whisper, so it can easily be used in its place. The main benefit of whisper-ctranslate2 is its out-of-the-box increase in processing speed compared to OpenAI's whisper, especially on CPUs. Otherwise the two should behave very similarly, so the notes above still apply.

To use whisper-ctranslate2 instead of OpenAI's whisper, change the whisper.root.path in org.opencastproject.speechtotext.impl.engine.WhisperEngine to your installation path.

Additional features:
- Enabling quantization in org.opencastproject.speechtotext.impl.engine.WhisperEngine can increase processing speed even further.
- Enabling Voice Activity Detection in org.opencastproject.speechtotext.impl.engine.WhisperEngine can prevent whisper from transcribing non-speech or silence.