Start Watson Transcription


The Start Watson Transcription operation invokes the IBM Watson Speech-to-Text service, passing an audio file to be transcribed to text.

Parameter Table

| configuration keys | description | default value | example |
|---|---|---|---|
| source-flavor | The flavor of the audio file to be sent for transcription. | EMPTY | presenter/delivery |
| source-tag | The tag of the audio file to be sent for transcription. | EMPTY | transcript-audio |
| skip-if-flavor-exists | If this flavor already exists in the media package, skip this operation. To be used when the media package already has a transcript file. | false | captions/vtt+en |

One of source-flavor or source-tag must be specified.
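
As an illustration of this rule, the sketch below selects the audio by flavor instead of by tag. It reuses the example values from the parameter table (presenter/delivery and captions/vtt+en). The operation id start-watson-transcription and the description text are assumptions made for this sketch, so adjust them to match your own workflow definition.

```xml
<!-- Sketch only: the id and description are assumed, not taken from a shipped workflow -->
<operation
  id="start-watson-transcription"
  description="Start IBM Watson transcription job (select audio by flavor)">
  <configurations>
    <!-- Select the audio by flavor; source-tag is omitted because only one of the two is required -->
    <configuration key="source-flavor">presenter/delivery</configuration>
    <!-- Skip the operation when the media package already contains this flavor -->
    <configuration key="skip-if-flavor-exists">captions/vtt+en</configuration>
  </configurations>
</operation>
```

Selecting by flavor can be convenient when the audio track already carries a well-known flavor, since no earlier operation has to tag it first.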


<!-- Extract audio from video in ogg/opus format -->

<operation
  id="compose"
  description="Extract audio for transcript generation">
  <configurations>
    <configuration key="source-tags">engage-download</configuration>
    <configuration key="target-flavor">audio/ogg</configuration>
    <configuration key="target-tags">transcript</configuration>
    <configuration key="encoding-profile">audio-opus</configuration>
    <!-- If more than one file matches the source-tags, use only the first one -->
    <configuration key="process-first-match-only">true</configuration>
  </configurations>
</operation>

<!-- Start IBM Watson recognition job -->

<operation
  id="start-watson-transcription"
  description="Start IBM Watson transcription job">
  <configurations>
    <!-- Skip this operation if the flavor already exists. Used when the media package already has captions. -->
    <configuration key="skip-if-flavor-exists">captions/vtt+en</configuration>
    <!-- Audio to be transcribed, produced in the previous compose operation -->
    <configuration key="source-tag">transcript</configuration>
  </configurations>
</operation>

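Encoding profile used in the example above:

profile.audio-opus.name = audio-opus
profile.audio-opus.input = stream
profile.audio-opus.output = audio
profile.audio-opus.suffix = -audio.opus
profile.audio-opus.mimetype = audio/ogg
profile.audio-opus.ffmpeg.command = -i /#{} -c:a libvorbis -ac 1 -ar 16k -b:a 64k #{out.dir}/#{}#{out.suffix}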