Microsoft Azure Transcription Engine

Overview

The Microsoft Azure transcription service for Opencast uses the Microsoft Azure Speech Service API to create a transcript from an audio track. The transcription runs asynchronously to speed up processing. When the result is available, an attach-transcription workflow is started to archive and publish the transcript. To find out more about the Microsoft Azure Speech Service API, refer to the official Microsoft documentation.

Note: You must have an active subscription to use Microsoft Azure Speech Services.

Configuration

Step 1: Get Azure subscription credentials

Step 2: Configure the Microsoft Azure Transcription Service

Edit etc/org.opencastproject.transcription.microsoft.azure.MicrosoftAzureTranscriptionService.cfg:
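
The exact set of configuration keys is documented in the sample configuration file shipped with Opencast. The sketch below only illustrates the kind of settings you will typically have to provide; the key names are placeholders, so consult the shipped file for the authoritative names.

# Enable the transcription service (it is disabled by default).
enabled=true
# Credentials and region of your Azure Speech Services subscription (placeholder key names).
azure.subscription.key=<your-subscription-key>
azure.region=<your-service-region>
# Workflow to start once a transcription result is available (see Step 4).
workflow=microsoft-azure-attach-transcription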

Step 3: Add a workflow operation or create a new workflow to start transcription

Edit a workflow to start the transcription, e.g. etc/workflows/partial-publish.xml. Add the microsoft-azure-start-transcription operation right after the operation that creates the final cut of the media files. The operation may look like this:

<!-- This is a typical operation to generate final cut -->
<!-- of the media files. -->
<operation
  id="editor"
  …
</operation>

<!-- This operation will start the transcription job -->
<operation
  id="microsoft-azure-start-transcription"
  fail-on-error="true"
  exception-handler-workflow="partial-error"
  description="Start Microsoft Azure transcription job">
  <configurations>
    <configuration key="source-flavors">*/trimmed</configuration>
    <!-- Skip this operation if flavor already exists. -->
    <!-- Used for cases when mediapackage already has captions. -->
    <configuration key="skip-if-flavor-exists">captions/*</configuration>
    <configuration key="audio-extraction-encoding-profile">transcription-azure.audio</configuration>
  </configurations>
</operation>

For more options, please consult the documentation of the microsoft-azure-start-transcription operation.

Step 4: Add a workflow to attach transcriptions

Below is a sample attach-transcription workflow, preconfigured as the workflow to start in the configuration from Step 2. It attaches the generated transcription to the media package, archives it, and republishes it. Copy it into a new file etc/workflows/microsoft-azure-attach-transcription.xml in your Opencast installation.

<?xml version="1.0" encoding="UTF-8"?>
<definition xmlns="http://workflow.opencastproject.org">
  <id>microsoft-azure-attach-transcription</id>
  <title>Attach Transcription from Microsoft Azure</title>
  <description>Publish and archive transcription from Microsoft Azure Speech Services.</description>
  <operations>

    <operation
      id="microsoft-azure-attach-transcription"
      fail-on-error="true"
      exception-handler-workflow="partial-error"
      description="Attach transcription from Microsoft Azure">
      <configurations>
        <!-- This is filled out by the transcription service when starting this workflow -->
        <configuration key="transcription-job-id">${transcriptionJobId}</configuration>
        <!-- Set the flavor to something the Paella player will parse -->
        <configuration key="target-flavor">captions/source</configuration>
        <configuration key="target-tags">archive, ${transcriptionLocaleTag!}</configuration>
      </configurations>
    </operation>

    <operation
      id="snapshot"
      fail-on-error="true"
      exception-handler-workflow="partial-error"
      description="Archive transcription">
      <configurations>
        <configuration key="source-tags">archive</configuration>
      </configurations>
    </operation>

    <operation
      id="tag"
      description="Tagging captions for publishing">
      <configurations>
        <configuration key="source-flavors">captions/source</configuration>
        <configuration key="target-flavor">captions/delivery</configuration>
        <configuration key="target-tags">-archive</configuration>
        <configuration key="copy">true</configuration>
      </configurations>
    </operation>

    <operation
      id="publish-engage"
      fail-on-error="true"
      exception-handler-workflow="partial-error"
      description="Distribute and publish to engage server">
      <configurations>
        <configuration key="download-source-flavors">captions/delivery</configuration>
        <configuration key="strategy">merge</configuration>
        <configuration key="check-availability">false</configuration>
      </configurations>
    </operation>

    <operation
      id="cleanup"
      fail-on-error="false"
      description="Cleaning up">
      <configurations>
        <configuration key="preserve-flavors">security/*</configuration>
        <configuration key="delete-external">false</configuration>
      </configurations>
    </operation>
  </operations>
</definition>

All available options of the microsoft-azure-attach-transcription operation are documented in the workflow operation documentation.

Step 5: Add audio extraction encoding profile

The audio track to transcribe must be extracted from the media file and converted to a specific format for processing. This is done with Opencast's encoding engine. Put the encoding profile listed below into the file etc/encoding/custom.properties.

# Microsoft Azure Speech Services accept limited audio formats
# See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription-audio-data#supported-audio-formats
profile.transcription-azure.audio.name = extract audio stream for transcription
profile.transcription-azure.audio.input = visual
profile.transcription-azure.audio.output = audio
profile.transcription-azure.audio.jobload = 0.5
profile.transcription-azure.audio.suffix = .ogg
profile.transcription-azure.audio.mimetype = audio/ogg
profile.transcription-azure.audio.ffmpeg.command = -i #{in.video.path} \
    -vn -dn -sn -map_metadata -1 \
    -c:a libopus -b:a 24k -ac 1 -ar 16k \
    #{out.dir}/#{out.name}#{out.suffix}
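
To check that your ffmpeg build supports this profile (in particular the libopus encoder), you can run the equivalent command by hand; the input and output file names below are just examples:

ffmpeg -i input.mp4 \
    -vn -dn -sn -map_metadata -1 \
    -c:a libopus -b:a 24k -ac 1 -ar 16k \
    output.ogg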

Step 6: Enable the transcription plugin

Transcription plugins are disabled by default. Enable this one in the etc/org.opencastproject.plugin.impl.PluginManagerImpl.cfg configuration file by setting opencast-plugin-transcription-services to true.
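
For example, the corresponding line in that configuration file would be:

opencast-plugin-transcription-services = true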