Whisper Transcription Engine

Overview

Opencast can use OpenAI's Whisper project to generate automatic transcriptions on-premises through the SpeechToText workflow operation handler (WoH).

Advantages

Enable Whisper engine

To enable Whisper as the engine for the SpeechToText WoH, follow these steps.

  1. Install whisper on the worker nodes.
  2. Enable the Whisper engine and set the job load in org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.cfg.
  3. Set the target model to use in org.opencastproject.speechtotext.impl.engine.WhisperEngine.
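
Before editing the configuration, it can help to confirm that step 1 actually succeeded on each worker node. The following is a minimal sketch, not part of Opencast: the helper name find_whisper_executable is hypothetical, and it assumes the install exposes an executable named whisper (or whisper-ctranslate2) on the PATH of the user that runs Opencast.

```python
import shutil
from typing import Iterable, Optional

# Executable names assumed to be installed by OpenAI's whisper and by
# whisper-ctranslate2; adjust if your installation uses a different name.
DEFAULT_CANDIDATES = ("whisper", "whisper-ctranslate2")


def find_whisper_executable(
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_whisper_executable()
    if found is None:
        print("No whisper executable found on PATH")
    else:
        print(f"Found whisper executable: {found}")
```

Running the script as the Opencast user on a worker node prints the resolved path, or a message if nothing was found.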

Additional Notes

Whisper-ctranslate2

whisper-ctranslate2 offers the same command line interface as OpenAI's whisper, so it can be used as a drop-in replacement. Its main benefit is faster processing out of the box, especially on CPUs. Otherwise the two should behave very similarly, so the above notes still apply.

To use whisper-ctranslate2 instead of OpenAI's whisper, change whisper.root.path in org.opencastproject.speechtotext.impl.engine.WhisperEngine to your installation path.

Additional features:

  - Enabling quantization in org.opencastproject.speechtotext.impl.engine.WhisperEngine can increase processing speed even further.
  - Enabling Voice Activity Detection in org.opencastproject.speechtotext.impl.engine.WhisperEngine can prevent whisper from transcribing non-speech or silence.
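
To judge the effect of these two features by hand before enabling them in Opencast, one option is to run whisper-ctranslate2 directly on a sample file. The sketch below only builds the command line. It assumes the --compute_type and --vad_filter options described in the whisper-ctranslate2 README; the function name and defaults are hypothetical, and how Opencast itself passes these settings to the executable is not shown here.

```python
from typing import List, Optional


def build_ctranslate2_command(
    audio_path: str,
    model: str = "base",
    compute_type: Optional[str] = None,
    vad_filter: bool = False,
    executable: str = "whisper-ctranslate2",
) -> List[str]:
    """Build an argument list for a manual whisper-ctranslate2 test run."""
    cmd = [executable, audio_path, "--model", model]
    if compute_type is not None:
        # Quantization, for example "int8" (assumed from the project README).
        cmd += ["--compute_type", compute_type]
    if vad_filter:
        # Voice activity detection, to skip silence and non-speech.
        cmd += ["--vad_filter", "True"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_ctranslate2_command(
        "sample.wav", compute_type="int8", vad_filter=True)))
```

The returned list can be passed to subprocess.run once the printed command looks right for the local installation.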