Transcripts (Automated by IBM Watson)


The IBMWatsonTranscriptionService invokes the IBM Watson Speech-to-Text service via its REST API to transcribe audio to text.

During the execution of an Opencast workflow, an audio file is extracted from one of the presenter videos and sent to the IBM Watson Speech-to-Text service. When the results are received, they are converted to the desired caption format and attached to the media package.

Workflow 1 runs:

Transcription finishes, a callback with the results is received, and workflow 2 is started.

Workflow 2 runs:

See the IBM Watson Speech-to-Text service documentation for details, including which languages are currently supported.


Step 1: Get IBM Watson credentials

As of 10/30/2018, the service has migrated to token-based Identity and Access Management (IAM) authentication, so a user name and password are no longer generated. Details can be found in the IBM Watson documentation.

As a temporary workaround, when configuring the transcription service, enter the constant "apikey" as the user name.
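
To illustrate the workaround, here is a minimal Python sketch of what the resulting credentials look like on the wire. It assumes the user name and password are sent as HTTP Basic credentials, with the literal user name "apikey" and the IAM API key as the password. The function name `basic_auth_header` and the key value are hypothetical and used only for illustration.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Build an HTTP Basic Authorization header using the fixed user name 'apikey'.

    Assumption: the API key issued by IBM takes the place of the password.
    """
    credentials = f"apikey:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical key, for illustration only.
header = basic_auth_header("my-iam-api-key")
print(header["Authorization"])
```

In the service configuration this corresponds to entering "apikey" as the user and the API key as the password.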



Step 2: Configure IBMWatsonTranscriptionService

Edit etc/org.opencastproject.transcription.ibmwatson.IBMWatsonTranscriptionService.cfg:

# Change enabled to true to enable this service.

# User obtained when registering with the IBM Watson Speech-to-Text service

# Password obtained when registering with the IBM Watson Speech-to-Text service

# Language model to be used. See the IBM Watson Speech-to-Text service documentation
# for available models. If empty, the default will be used ("en-US_BroadbandModel").

# Workflow to be executed when results are ready to be attached to media package.

# Interval at which the workflow dispatcher runs to start workflows that attach transcripts
# to the media package after the transcription job is completed.
# (in seconds) Default is 1 minute.

# How long to wait before checking a job once its start date + track duration has passed.
# The default is 10 minutes. This is only used if no callback was received from the
# IBM Watson Speech-to-Text service.
# (in seconds)

# How long to wait after a transcription is supposed to finish before marking the job as
# canceled in the database. Default is 2 hours.
# (in seconds)

# How long to keep result files in the working file repository in days.
# The default is 7 days.

# Email to send notifications of errors. If not entered, the value from
# in will be used.
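
The timing settings described above can be made concrete with a small worked example. The Python sketch below shows one reading of the comments: a job is first checked at start date + track duration + check buffer, and is marked as canceled at start date + track duration + maximum processing time. The constant and function names are hypothetical and are not the actual property keys.

```python
from datetime import datetime, timedelta

# Defaults quoted in the configuration comments, expressed in seconds.
# The names are hypothetical, not the actual property keys.
COMPLETION_CHECK_BUFFER_S = 10 * 60   # 10 minutes
MAX_PROCESSING_TIME_S = 2 * 60 * 60   # 2 hours


def expected_finish(start: datetime, track_duration_s: float) -> datetime:
    """Time at which the transcription is supposed to finish."""
    return start + timedelta(seconds=track_duration_s)


def first_check_time(start, track_duration_s, buffer_s=COMPLETION_CHECK_BUFFER_S):
    """Earliest time the job is checked if no callback has arrived."""
    return expected_finish(start, track_duration_s) + timedelta(seconds=buffer_s)


def cancel_time(start, track_duration_s, max_processing_s=MAX_PROCESSING_TIME_S):
    """Time after which the job is marked as canceled."""
    return expected_finish(start, track_duration_s) + timedelta(seconds=max_processing_s)


start = datetime(2018, 10, 30, 12, 0, 0)
duration = 30 * 60  # a 30-minute track
print(first_check_time(start, duration))
print(cancel_time(start, duration))
```

With the defaults, a 30-minute track submitted at 12:00 would first be checked at 12:40 and marked as canceled at 14:30 if no results arrive.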

Step 3: Add encoding profile for extracting audio

The IBM Watson Speech-to-Text service has limitations on audio file size. Try using the encoding profile suggested in etc/encoding/

Step 4: Add workflow operations and create new workflow

Add the following operations to your workflow. We suggest adding them after the media package is published so that users can watch videos without having to wait for the transcription to finish, but it depends on your use case. The only requirement is to take a snapshot of the media package so that the second workflow can retrieve it from the Asset Manager to attach the caption/transcripts.

<!-- Extract audio from one of the presenter videos -->

<operation
  id="compose"
  description="Extract audio for transcript generation">
  <configurations>
    <configuration key="source-tags">engage-download</configuration>
    <configuration key="target-flavor">audio/ogg</configuration>
    <!-- The target tag 'transcript' will be used in the next 'start-watson-transcription' operation -->
    <configuration key="target-tags">transcript</configuration>
    <configuration key="encoding-profile">audio-opus</configuration>
    <!-- If more than one file matches the source-tags, use only the first one -->
    <configuration key="process-first-match-only">true</configuration>
  </configurations>
</operation>

<!-- Start IBM Watson recognition job -->

<operation
  id="start-watson-transcription"
  description="Start IBM Watson transcription job">
  <configurations>
    <!-- Skip this operation if the flavor already exists. Used for cases when the media package already has captions. -->
    <configuration key="skip-if-flavor-exists">captions/vtt+en</configuration>
    <!-- Audio to be transcribed, produced in the previous compose operation -->
    <configuration key="source-tag">transcript</configuration>
  </configurations>
</operation>
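
The two operations are linked only by the tag: the audio track tagged by the compose operation must carry the tag that the start-watson-transcription operation looks for. The Python sketch below checks that link on a simplified fragment. The `<operations>` wrapper, the helper names, and the absence of an XML namespace are assumptions made for illustration; a real workflow definition file may declare a namespace, in which case the element names would need to be qualified.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment wrapping the two operations in one root element.
WORKFLOW_FRAGMENT = """
<operations>
  <operation id="compose" description="Extract audio for transcript generation">
    <configurations>
      <configuration key="source-tags">engage-download</configuration>
      <configuration key="target-flavor">audio/ogg</configuration>
      <configuration key="target-tags">transcript</configuration>
      <configuration key="encoding-profile">audio-opus</configuration>
      <configuration key="process-first-match-only">true</configuration>
    </configurations>
  </operation>
  <operation id="start-watson-transcription" description="Start IBM Watson transcription job">
    <configurations>
      <configuration key="skip-if-flavor-exists">captions/vtt+en</configuration>
      <configuration key="source-tag">transcript</configuration>
    </configurations>
  </operation>
</operations>
"""


def config_value(root, operation_id, key):
    """Return the text of one <configuration> entry, or None if it is absent."""
    for op in root.iter("operation"):
        if op.get("id") == operation_id:
            for cfg in op.iter("configuration"):
                if cfg.get("key") == key:
                    return (cfg.text or "").strip()
    return None


def tags_are_linked(xml_text):
    """True if the tag consumed by the transcription step is produced by compose."""
    root = ET.fromstring(xml_text)
    produced = config_value(root, "compose", "target-tags")
    consumed = config_value(root, "start-watson-transcription", "source-tag")
    if produced is None or consumed is None:
        return False
    # target-tags may hold a comma-separated list; source-tag is a single tag.
    return consumed in [tag.strip() for tag in produced.split(",")]


print(tags_are_linked(WORKFLOW_FRAGMENT))
```

If the tag is renamed in one operation but not the other, the check returns False, which is the situation in which the transcription step would find no audio to send.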

Create a workflow that will add the generated caption/transcript to the media package and republish it. A sample workflow can be found in etc/workflows/attach-watson-transcripts.xml.

Workflow Operations