Transcripts (Automated by IBM Watson)

Overview

The IBMWatsonTranscriptionService invokes the IBM Watson Speech-to-Text service via its REST API to transcribe audio to text.

During the execution of an Opencast workflow, an audio file is extracted from one of the presenter videos and sent to the IBM Watson Speech-to-Text service. When the results are received, they are converted to the desired caption format and attached to the media package.

Workflow 1 runs:

- An audio file is extracted from one of the presenter videos
- The audio file is sent to the IBM Watson Speech-to-Text service and a recognition job is started

Transcription finishes, a callback with the results is received, and workflow 2 is started.

Workflow 2 runs:

- The results are converted to the desired caption format and attached to the media package
- The media package is republished
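
For reference, the recognition results arrive as JSON. A minimal sketch of the kind of payload workflow 2 converts into captions (field names follow the IBM Watson Speech-to-Text API; real responses carry more fields):

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "welcome to today's lecture ",
          "confidence": 0.94,
          "timestamps": [["welcome", 0.38, 0.79], ["to", 0.79, 0.93],
                         ["today's", 0.93, 1.45], ["lecture", 1.45, 2.01]]
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}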

IBM Watson Speech-to-Text service documentation, including which languages are currently supported, can be found in the IBM Cloud documentation.

Configuration

Step 1: Get IBM Watson credentials

As of October 30, 2018, the service has migrated to token-based Identity and Access Management (IAM) authentication, so user names and passwords are no longer generated. Previously created service instances can still use a user name and password. Details can be found in the IBM Cloud documentation.
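
To verify that IAM credentials work before configuring Opencast, you can list the available language models. The service accepts HTTP basic authentication with the literal user name apikey and the API key as the password. A minimal sketch in Java, assuming the default service URL from the configuration below:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WatsonCredentialCheck {
  public static void main(String[] args) throws Exception {
    // GET /v1/models lists the language models available to these credentials
    String url = "https://stream.watsonplatform.net/speech-to-text/api/v1/models";
    // "apikey" is a literal user name; replace <API_KEY> with your IAM API key
    String auth = Base64.getEncoder()
        .encodeToString("apikey:<API_KEY>".getBytes(StandardCharsets.UTF_8));
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .header("Authorization", "Basic " + auth)
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // Status 200 with a JSON list of models means the credentials are valid
    System.out.println(response.statusCode());
    System.out.println(response.body());
  }
}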

Step 2: Configure IBMWatsonTranscriptionService

Edit etc/org.opencastproject.transcription.ibmwatson.IBMWatsonTranscriptionService.cfg:

# Change enabled to true to enable this service.
enabled=false

# IBM Watson Speech-to-Text service URL
# Default: https://stream.watsonplatform.net/speech-to-text/api
# ibm.watson.service.url=https://stream.watsonplatform.net/speech-to-text/api

# API key obtained when registering with the IBM Watson Speech-to-Text service.
# If empty, user and password below will be used.
ibm.watson.api.key=<API_KEY>

# User obtained when registering with the IBM Watson Speech-to-Text service.
# Mandatory if ibm.watson.api.key not entered.
#ibm.watson.user=<SERVICE_USER>

# Password obtained when registering with the IBM Watson Speech-to-Text service.
# Mandatory if ibm.watson.api.key not entered.
#ibm.watson.password=<SERVICE_PSW>

# Language model to be used. See the IBM Watson Speech-to-Text service documentation
# for available models.
# Default: en-US_BroadbandModel
#ibm.watson.model=en-US_BroadbandModel

# Workflow to be executed when results are ready to be attached to the media package.
# Default: attach-watson-transcripts
#workflow=attach-watson-transcripts

# Interval, in seconds, at which the workflow dispatcher runs to start the workflows that
# attach transcripts to the media package after the transcription job is completed.
# Default: 60
#workflow.dispatch.interval=60

# How long to wait before checking the status of a job once its start date + track duration
# has passed. This is only used if no callback was received from the IBM Watson
# Speech-to-Text service. In seconds.
# Default: 600
#completion.check.buffer=600

# How long to wait after a transcription is supposed to finish before marking the job as
# canceled in the database. In seconds.
# Default: 7200 (2 hours)
#max.processing.time=7200

# How long to keep result files in the working file repository in days.
# Default: 7
#cleanup.results.days=7

# Email to send notifications of errors. If not entered, the value from
# org.opencastproject.admin.email in custom.properties will be used.
#notification.email=

# Job load for starting a transcription
# Default: 0.1
#job.load.start.transcription=0.1

# Maximum number of attempts. If max.attempts > 1 and the service returned an error after the
# recognition job was accepted, or the job did not return any results, the transcription is
# re-submitted. Default is to not retry.
# Default: 1
#max.attempts=

# If max.attempts > 1, name of workflow to use for retries.
#retry.workflow=
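
For example, a minimal configuration that enables the service with an API key and keeps the defaults above (the key shown is a placeholder):

enabled=true
ibm.watson.api.key=<API_KEY>
ibm.watson.model=en-US_BroadbandModel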

Step 3: Add encoding profile for extracting audio

The IBM Watson Speech-to-Text service has limitations on audio file size. Try using the encoding profile suggested in etc/encoding/watson-audio.properties.
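
The suggested profile extracts the audio track and encodes it as Opus at a low bitrate to keep uploads small. As an illustrative sketch of what such a profile looks like (the exact options in the shipped file may differ):

profile.audio-opus.name = audio-opus
profile.audio-opus.input = audiovisual
profile.audio-opus.output = audio
profile.audio-opus.suffix = -audio.opus
profile.audio-opus.mimetype = audio/ogg
profile.audio-opus.ffmpeg.command = -i #{in.video.path} -vn -c:a libopus -b:a 48k #{out.dir}/#{out.name}#{out.suffix}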

Step 4: Add workflow operations and create new workflow

Add the following operations to your workflow. We suggest adding them after the media package has been published so that users can watch the videos without having to wait for the transcription to finish, but this depends on your use case. The only requirement is to take a snapshot of the media package (see the snapshot sketch after the operations below) so that the second workflow can retrieve it from the Asset Manager to attach the captions/transcripts.

<!-- Extract audio from one of the presenter videos -->

<operation
  id="encode"
  fail-on-error="true"
  exception-handler-workflow="partial-error"
  description="Extract audio for transcript generation">
  <configurations>
    <configuration key="source-tags">engage-download</configuration>
    <configuration key="target-flavor">audio/ogg</configuration>
    <!-- The target tag 'transcript' will be used in the next 'start-watson-transcription' operation -->
    <configuration key="target-tags">transcript</configuration>
    <configuration key="encoding-profile">audio-opus</configuration>
    <!-- If there is more than one file that matches the source-tags, use only the first one -->
    <configuration key="process-first-match-only">true</configuration>
  </configurations>
</operation>

<!-- Start IBM Watson recognition job -->

<operation
  id="start-watson-transcription"
  fail-on-error="true"
  exception-handler-workflow="partial-error"
  description="Start IBM Watson transcription job">
  <configurations>
    <!-- Skip this operation if the flavor already exists, e.g. when the media package already has captions. -->
    <configuration key="skip-if-flavor-exists">captions/vtt+en</configuration>
    <!-- Audio to be translated, produced in the previous compose operation -->
    <configuration key="source-tag">transcript</configuration>
  </configurations>
</operation>
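
Since the second workflow retrieves the media package from the Asset Manager, make sure your workflow also takes a snapshot at some point, e.g. with the snapshot operation (a minimal sketch; adjust the tags to your setup):

<operation
  id="snapshot"
  fail-on-error="true"
  exception-handler-workflow="partial-error"
  description="Archive the media package">
  <configurations>
    <configuration key="source-tags">archive</configuration>
  </configurations>
</operation>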

Create a workflow that will add the generated captions/transcripts to the media package and republish it. A sample workflow can be found in etc/workflows/attach-watson-transcripts.xml.

If re-submitting requests is desired in case of failures, create a workflow that will start a new transcription job. A sample workflow can be found in etc/workflows/retry-watson-transcripts.xml.

Workflow Operations