WhisperC++ Transcription Engine

Overview

Opencast can use OpenAI's Whisper project to generate automatic transcriptions on premises through the SpeechToText workflow operation handler (WoH).

WhisperC++ is a C/C++ implementation of OpenAI's Whisper automatic speech recognition (ASR) model.

Advantages

Enable WhisperC++ engine

To enable WhisperC++ as the engine for the SpeechToText WoH, follow these steps.

  1. Install the whispercpp binary and language models by following the instructions in the whisper.cpp repository, or install the whisper.cpp package from the Opencast repository.
  2. Enable the whispercpp engine and set the job load in org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.cfg.
  3. Set the path to the binary and the target model in org.opencastproject.speechtotext.impl.engine.WhisperCppEngine.
  4. WhisperC++ processes only 16-bit PCM (WAV) audio files. You will therefore probably need to add an encode operation before speechtotext in your workflow, together with a matching encoding profile:
  - id: encode
    description: "Extract audio for processing with whispercpp"
    configurations:
      - source-flavor: "*/source"
      - target-flavor: "*/audio+stt"
      - encoding-profile: audio-whispercpp

  - id: speechtotext
    description: "Generates subtitles with whispercpp"
    configurations:
      - source-flavor: "*/audio+stt"
      - target-flavor: "captions/delivery"
      - target-element: track
      - target-tags: >-
          archive,
          engage-download

# Audio-only encoding format used for whispercpp
profile.audio-whispercpp.name = whispercpp wav
profile.audio-whispercpp.input = stream
profile.audio-whispercpp.output = stream
profile.audio-whispercpp.suffix = -stt.wav
profile.audio-whispercpp.ffmpeg.command = -i #{in.video.path} -vn -ar 16000 -ac 1 -c:a pcm_s16le #{out.dir}/#{out.name}#{out.suffix}
profile.audio-whispercpp.jobload = 1.0
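
To try the conversion outside Opencast, the flags from the encoding profile above can be reproduced in a small script. The following is a minimal sketch, not part of Opencast: the helper names build_ffmpeg_command and is_whisper_ready_wav are hypothetical, and the check assumes that "ready for whispercpp" means exactly what the profile produces (16000 Hz, one channel, 16-bit PCM).

```python
import wave
from typing import List


def build_ffmpeg_command(source: str, target: str) -> List[str]:
    """Return an ffmpeg argument list using the same flags as the
    audio-whispercpp profile (hypothetical helper, for local testing)."""
    return [
        "ffmpeg",
        "-i", source,         # input media file
        "-vn",                # drop any video stream
        "-ar", "16000",       # resample to 16000 Hz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
        target,
    ]


def is_whisper_ready_wav(path: str) -> bool:
    """Return True if the WAV file is 16000 Hz, mono, 16-bit PCM.

    Files that the wave module cannot parse are reported as not ready.
    A missing file still raises FileNotFoundError.
    """
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getframerate() == 16000
                and wav.getnchannels() == 1
                and wav.getsampwidth() == 2  # 2 bytes = 16 bits per sample
                and wav.getcomptype() == "NONE"
            )
    except (wave.Error, EOFError):
        return False
```

The argument list can be passed to subprocess.run(build_ffmpeg_command("lecture.mp4", "lecture-stt.wav"), check=True) on a machine where ffmpeg is installed, and is_whisper_ready_wav("lecture-stt.wav") then confirms the output format. The file names here are placeholders.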

Additional Notes