Whisper Transcription Engine
- Transcription in more than 80 languages
- Translation to English
- Automatic language detection
- Fast processing (when using a GPU)
- Runs locally; no data is sent to third parties.
Enable Whisper engine
To enable Whisper as the transcription engine for the
SpeechToText WoH (workflow operation handler), follow these steps.
- Install Whisper on the worker nodes.
- Alternatively, install whisper-ctranslate2 for faster processing on CPUs.
- Enable Whisper and set the job load in
- Set the target model to use in
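Before enabling the engine, it can help to confirm on each worker node that a Whisper command-line tool is actually reachable. The Python sketch below is not part of Opencast. It assumes the executables are named `whisper` and `whisper-ctranslate2` (the console scripts the two packages install) and that they are on the `PATH`. The function name is hypothetical, and preferring whisper-ctranslate2 when both tools are present is an arbitrary choice for this sketch.

```python
import shutil
import subprocess


def find_whisper_executable(candidates=("whisper-ctranslate2", "whisper")):
    """Return the full path of the first candidate found on PATH, or None.

    The default candidate names are assumptions; adjust them to match
    your installation.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    exe = find_whisper_executable()
    if exe is None:
        print("No Whisper CLI found on PATH; install it on this worker node.")
    else:
        # Both CLIs accept --help; exit code 0 means the tool starts correctly.
        result = subprocess.run([exe, "--help"], capture_output=True, text=True)
        print(exe, "OK" if result.returncode == 0 else "failed to start")
```

Opencast calls the executable at the path configured for the engine, so a tool that is missing from `PATH` here may still be usable if that path points to it directly.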
- Whisper can run on a CPU or a GPU; using a GPU increases performance dramatically.
- Five language models are available, from the lightest (tiny) to the most complete (large): tiny, base, small, medium and large. A bigger model improves accuracy but reduces processing speed.
- It is recommended to set a job load for each machine.
- If only one worker node should run Whisper, set the Whisper job load higher than the maximum load of the non-Whisper nodes. Whisper jobs will then run only on the Whisper machines, whose nodes have a higher load limit configured.
- It is also a good idea to configure it on the Whisper node, to:
  - Avoid workflow failures caused by insufficient memory during parallel transcriptions.
  - Avoid a performance bottleneck caused by too many parallel transcriptions.
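The job-load rule above can be illustrated with a simplified model. Several parts of this sketch are hypothetical: the `WorkerNode` class, the node names and the load values. The eligibility check is also a deliberately simplified stand-in for Opencast's dispatcher, not its actual logic. In this model, a node is considered only if its maximum load is at least the job's load. The number of Whisper jobs a node can run in parallel is its maximum load divided by the job load, rounded down.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    # Hypothetical model of a worker node; max_load stands for the
    # node's configured maximum job load.
    name: str
    max_load: float


def eligible_nodes(nodes, job_load):
    """Nodes whose maximum load is at least the job's load (simplified rule)."""
    return [n for n in nodes if n.max_load >= job_load]


def max_parallel_jobs(node, job_load):
    """How many jobs of this load fit on an otherwise idle node."""
    return int(node.max_load // job_load)


nodes = [
    WorkerNode("worker-1", max_load=4.0),
    WorkerNode("worker-2", max_load=4.0),
    WorkerNode("whisper-gpu", max_load=8.0),
]

# A Whisper job load above 4.0 keeps the job off worker-1 and worker-2.
whisper_load = 5.0
print([n.name for n in eligible_nodes(nodes, whisper_load)])  # ['whisper-gpu']
print(max_parallel_jobs(nodes[2], whisper_load))  # 1
```

In this model, a high Whisper job load both restricts where transcriptions run and caps how many run at once on each node. That is one way to limit the memory pressure and slowdowns that come from parallel transcriptions.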
whisper-ctranslate2 offers the same command-line interface as OpenAI's Whisper, so it can be used as a drop-in replacement. Its main benefit is an out-of-the-box increase in processing speed compared to OpenAI's Whisper, especially on CPUs. Otherwise the two should behave very similarly, so the notes above still apply.
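Because the command-line interfaces match, switching tools only changes the executable that is called. The sketch below builds an argument list from flags that both CLIs accept: `--model`, `--output_dir`, `--output_format`, `--task` and `--language`. The function name and the file `lecture.wav` are hypothetical. This is not necessarily how Opencast's `WhisperEngine` constructs its call.

```python
def build_whisper_command(executable, audio_file, model="base",
                          output_dir=".", output_format="vtt",
                          language=None, translate=False):
    """Build an argument list for the whisper or whisper-ctranslate2 CLI."""
    cmd = [executable, audio_file,
           "--model", model,                  # tiny, base, small, medium, large
           "--output_dir", output_dir,
           "--output_format", output_format,  # e.g. vtt or srt subtitles
           # "translate" produces English output; "transcribe" keeps the language
           "--task", "translate" if translate else "transcribe"]
    if language is not None:
        # Omitting --language leaves language detection to Whisper
        cmd += ["--language", language]
    return cmd


openai_cmd = build_whisper_command("whisper", "lecture.wav", model="small")
ct2_cmd = build_whisper_command("whisper-ctranslate2", "lecture.wav", model="small")
# Only the executable differs between the two commands.
print(openai_cmd[1:] == ct2_cmd[1:])  # True
# To actually run it: subprocess.run(ct2_cmd, check=True)
```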
To use whisper-ctranslate2 instead of OpenAI's Whisper, change the
org.opencastproject.speechtotext.impl.engine.WhisperEngine to your installation path.
- Enabling quantization in org.opencastproject.speechtotext.impl.engine.WhisperEngine can increase processing speed even further.
- Enabling Voice Activity Detection (VAD) in org.opencastproject.speechtotext.impl.engine.WhisperEngine can prevent Whisper from transcribing non-speech or silence.
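On the command line, these two options correspond to extra whisper-ctranslate2 flags. The sketch below assumes the CLI's `--compute_type` option for quantization (for example `int8`) and its `--vad_filter` option for voice activity detection. Confirm both with `whisper-ctranslate2 --help` for your installed version. The helper name is hypothetical, and the sketch does not show the Opencast configuration keys.

```python
def ctranslate2_options(quantization=None, vad=False):
    """Extra whisper-ctranslate2 flags for quantization and VAD (assumed flags)."""
    opts = []
    if quantization is not None:
        # Quantized compute type, e.g. "int8", typically faster on CPUs
        opts += ["--compute_type", quantization]
    if vad:
        # Filter out segments without speech before transcribing
        opts += ["--vad_filter", "True"]
    return opts


cmd = ["whisper-ctranslate2", "lecture.wav", "--model", "small"]
cmd += ctranslate2_options(quantization="int8", vad=True)
print(" ".join(cmd))
# whisper-ctranslate2 lecture.wav --model small --compute_type int8 --vad_filter True
```

These flags are specific to whisper-ctranslate2. OpenAI's `whisper` CLI does not accept them in this form, so add them only when the ctranslate2 variant is installed.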