Speech to Text Workflow Operation
ID: speechtotext
Description
The speech to text operation can be used to generate subtitles from video or audio files. Currently, three STT engines are available: Whisper, WhisperC++ and Vosk. The subtitle file format is WebVTT.
Parameter Table
Configuration Keys | Required | Example | Description |
---|---|---|---|
source-flavor | yes | presenter/source | Flavor of the source media (video or audio) to transcribe |
target-flavor | yes | captions/source | Flavor of the produced subtitle |
target-tags | no | archive | Tags applied to the resulting subtitle element²³. |
target-element | no | track | Defines where to append the subtitles file. Possible values are 'track' or 'attachment' (default: track). |
language-code | no | de | The language of the video or audio source⁴. |
language-fallback | no¹ | en | Optional fallback value if the Dublin Core/media package language field is not set. |
translate | no | true | Translate the transcription into English. Valid values are true or false (Whisper/WhisperC++ only). |
limit-to-one | no | true | Limits the number of generated subtitles to at most one. |
track-selection-strategy | no | everything | Defines which tracks are selected for subtitle generation when used together with limit-to-one (default: everything). |
async | no | false | Start the transcription in the background. Use speechtotext-attach to get the finished transcriptions later in the workflow (default: false). |
- ¹ The Vosk default value can be changed in the Vosk configuration file.
- ² For conventionally used tags see the general page on Subtitles. The generator and generator-type tags will be set automatically. For Whisper, if no language-code is set, the lang tag will be auto-generated.
- ³ This has no effect if the transcription is run asynchronously. It only applies to subtitles attached in this operation.
- ⁴ Vosk only: It has to match the name of the language model directory. See 'vosk-cli'.
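As an illustration of how several of these keys combine, the following sketch transcribes a German recording, has the result translated into English and stores the subtitles as an attachment. It assumes that Whisper or WhisperC++ is the configured engine, since translate is only available for those engines; the flavor and tag values are placeholders taken from the examples on this page and may need adjusting for your workflows.

```yaml
- id: speechtotext
  description: Transcribe a German recording and translate the subtitles into English
  configurations:
    # Placeholder flavors and tags, taken from the examples on this page
    - source-flavor: presenter/source
    - target-flavor: captions/source
    - target-tags: archive
    # Store the WebVTT file as an attachment instead of the default 'track'
    - target-element: attachment
    # Language of the source media (assumption: the recording is in German)
    - language-code: de
    # Assumption: Whisper or WhisperC++ is the configured engine
    - translate: true
```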
Requirements
In order for it to work, you have to install the vosk-cli, whisper or whispercpp package.
Track Selection Strategy
Use the track selection strategy in combination with the limit-to-one option to define which tracks are selected for subtitle generation.
For example, if you set source-flavor to */source because you are unsure which track includes the audio, you can use the track-selection-strategy to have Opencast prefer the presenter track for transcriptions.
Available options are:
- presenter_or_nothing: only uses presenter tracks.
- presentation_or_nothing: only uses presentation tracks.
- try_presenter_first: looks for presenter tracks first; if none are usable, tries to transcribe the other tracks.
- try_presentation_first: looks for presentation tracks first, falling back to other tracks if none are usable.
- everything: just transcribes everything (this is the default).
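The scenario described above, where the source flavor is a wildcard and the presenter track should be preferred, could be configured roughly as follows. This is a minimal sketch: the target flavor is a placeholder, and the strategy is combined with limit-to-one as described in this section.

```yaml
- id: speechtotext
  description: Generate one subtitle, preferring the presenter track
  configurations:
    # Wildcard: it is not known in advance which track carries the audio
    - source-flavor: '*/source'
    # Placeholder target flavor
    - target-flavor: captions/source
    # Produce at most one subtitle file
    - limit-to-one: true
    # Look at presenter tracks first, fall back to other tracks if none are usable
    - track-selection-strategy: try_presenter_first
```

Replacing try_presenter_first with presenter_or_nothing would skip transcription entirely when no presenter track is usable, instead of falling back to the other tracks.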
Operation Examples
- id: speechtotext
  description: Generates subtitles for video and audio files
  configurations:
    - source-flavor: '*/source'
    - target-flavor: captions/source
    - target-tags: engage-download
    - limit-to-one: true
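A variant of the example above that uses async could look like the following sketch. With async set to true the operation only starts the transcription in the background, so target-tags is left out here (per the table notes it has no effect in asynchronous mode), and a speechtotext-attach operation would still be needed later in the workflow to collect the result. That operation is not shown, because its configuration keys are not described on this page.

```yaml
- id: speechtotext
  description: Start subtitle generation in the background
  configurations:
    - source-flavor: '*/source'
    - target-flavor: captions/source
    - limit-to-one: true
    # Do not wait for the transcription to finish;
    # a later speechtotext-attach operation picks up the result
    - async: true
```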