# Microsoft Azure Transcription Engine

## Overview
The Microsoft Azure transcription service for Opencast uses the Microsoft Azure Speech Service API to create a transcript from an audio track. The transcription runs asynchronously to speed up processing. When the result is available, an attach-transcript workflow is started to archive and publish the transcript. To find out more about the Microsoft Azure Speech Service API, read the documentation here.
**Note:** You must have an active subscription to use Microsoft Azure Speech Services.
## Configuration

### Step 1: Get Azure subscription credentials

- Create an Azure subscription
- Create a storage account
- Get the storage account access key: in the Azure Portal, go to **Storage accounts**, select your storage account, and choose **Security + networking** > **Access keys**. Copy the key.
- Create a speech resource
- Get the subscription key and region: after your Speech resource is deployed, select **Go to resource** to view and manage keys. For more information about Cognitive Services resources, see here.
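As an alternative to the portal, the steps above can be sketched with the Azure CLI. All resource, account, and region names below are placeholders, not values prescribed by this guide:

```shell
# Sketch only: assumes the Azure CLI ("az") is installed and you are logged in (az login).
# my-rg, mystorageacct, my-speech and westeurope are placeholders.

# Create a resource group and a storage account
az group create --name my-rg --location westeurope
az storage account create --name mystorageacct --resource-group my-rg \
  --location westeurope --sku Standard_LRS

# Print the storage account access keys
az storage account keys list --account-name mystorageacct --resource-group my-rg

# Create a Speech resource and print its subscription keys
az cognitiveservices account create --name my-speech --resource-group my-rg \
  --kind SpeechServices --sku S0 --location westeurope --yes
az cognitiveservices account keys list --name my-speech --resource-group my-rg
```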
### Step 2: Configure the Microsoft Azure Transcription Service

Edit `etc/org.opencastproject.transcription.microsoft.azure.MicrosoftAzureTranscriptionService.cfg`:

- Set `enabled=true`
- Set `azure_storage_account_name` to your storage account name
- Set `azure_account_access_key` to the storage account access key
- Set `azure_container_name` to a container name you want to use
- Set `azure_speech_services_endpoint` to your speech services endpoint
- Set `azure_cognitive_services_subscription_key` to the speech services subscription key
- Review the other configuration options in this file and edit as needed
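Put together, the resulting configuration might look like the following. The key names are taken from the list above; all values (account name, container name, endpoint, keys) are placeholders for illustration:

```properties
enabled=true
azure_storage_account_name=mystorageacct
azure_account_access_key=<storage-account-access-key>
azure_container_name=opencast-transcriptions
azure_speech_services_endpoint=<speech-services-endpoint>
azure_cognitive_services_subscription_key=<speech-subscription-key>
```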
### Step 3: Add a workflow operation or create a new workflow to start transcription

Edit a workflow to start the transcription, e.g. `etc/workflows/partial-publish.xml`. Add the `microsoft-azure-start-transcription` operation right after the operation that creates the final cut of the media files. The operation may look like this:
```xml
<!-- This is a typical operation to generate the final cut -->
<!-- of the media files. -->
<operation
    id="editor"
    …
</operation>

<!-- This operation will start the transcription job -->
<operation
    id="microsoft-azure-start-transcription"
    fail-on-error="true"
    exception-handler-workflow="partial-error"
    description="Start Microsoft Azure transcription job">
  <configurations>
    <configuration key="source-flavors">*/trimmed</configuration>
    <!-- Skip this operation if the flavor already exists. -->
    <!-- Used for cases when the media package already has captions. -->
    <configuration key="skip-if-flavor-exists">captions/*</configuration>
    <configuration key="audio-extraction-encoding-profile">transcription-azure.audio</configuration>
  </configurations>
</operation>
```
For more options, please consult the documentation.
### Step 4: Add a workflow to attach transcriptions

The sample attach-transcript workflow below is preconfigured in the configuration from Step 2. It attaches the generated transcription to the media package, archives it, and republishes it. Copy it into a new file `etc/workflows/microsoft-azure-attach-transcription.xml` in your Opencast installation.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<definition xmlns="http://workflow.opencastproject.org">
  <id>microsoft-azure-attach-transcription</id>
  <title>Attach Transcription from Microsoft Azure</title>
  <description>Publish and archive transcription from Microsoft Azure Speech Services.</description>
  <operations>
    <operation
        id="microsoft-azure-attach-transcription"
        fail-on-error="true"
        exception-handler-workflow="partial-error"
        description="Attach transcription from Microsoft Azure">
      <configurations>
        <!-- This is filled out by the transcription service when starting this workflow -->
        <configuration key="transcription-job-id">${transcriptionJobId}</configuration>
        <!-- Set the flavor to something the Paella player will parse -->
        <configuration key="target-flavor">captions/source</configuration>
        <configuration key="target-tags">archive, ${transcriptionLocaleTag!}</configuration>
      </configurations>
    </operation>
    <operation
        id="snapshot"
        fail-on-error="true"
        exception-handler-workflow="partial-error"
        description="Archive transcription">
      <configurations>
        <configuration key="source-tags">archive</configuration>
      </configurations>
    </operation>
    <operation
        id="tag"
        description="Tagging captions for publishing">
      <configurations>
        <configuration key="source-flavors">captions/source</configuration>
        <configuration key="target-flavor">captions/delivery</configuration>
        <configuration key="target-tags">-archive</configuration>
        <configuration key="copy">true</configuration>
      </configurations>
    </operation>
    <operation
        id="publish-engage"
        fail-on-error="true"
        exception-handler-workflow="partial-error"
        description="Distribute and publish to engage server">
      <configurations>
        <configuration key="download-source-flavors">captions/delivery</configuration>
        <configuration key="strategy">merge</configuration>
        <configuration key="check-availability">false</configuration>
      </configurations>
    </operation>
    <operation
        id="cleanup"
        fail-on-error="false"
        description="Cleaning up">
      <configurations>
        <configuration key="preserve-flavors">security/*</configuration>
        <configuration key="delete-external">false</configuration>
      </configurations>
    </operation>
  </operations>
</definition>
```
All available options of the `microsoft-azure-attach-transcription` operation are documented here.
### Step 5: Add an audio extraction encoding profile

The audio track to transcribe must be extracted from the media file and converted to a specific format for processing. This is done with Opencast's encoding engine. Put the encoding profile listed below into the file `etc/encoding/custom.properties`.
```properties
# Microsoft Azure Speech Services accept limited audio formats
# See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription-audio-data#supported-audio-formats
profile.transcription-azure.audio.name = extract audio stream for transcription
profile.transcription-azure.audio.input = visual
profile.transcription-azure.audio.output = audio
profile.transcription-azure.audio.jobload = 0.5
profile.transcription-azure.audio.suffix = .ogg
profile.transcription-azure.audio.mimetype = audio/ogg
profile.transcription-azure.audio.ffmpeg.command = -i #{in.video.path} \
  -vn -dn -sn -map_metadata -1 \
  -c:a libopus -b:a 24k -ac 1 -ar 16k \
  #{out.dir}/#{out.name}#{out.suffix}
```
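For reference, once Opencast substitutes the `#{…}` placeholders, the ffmpeg command of this profile corresponds roughly to the following. The input and output file names are placeholders:

```shell
# Sketch of the expanded command; recording.mp4 and audio.ogg are placeholder paths.
# -vn/-dn/-sn drop the video, data, and subtitle streams; -map_metadata -1 strips metadata;
# the audio is encoded as mono (-ac 1) 16 kHz (-ar 16k) Opus at 24 kbit/s, as the profile specifies.
ffmpeg -i recording.mp4 -vn -dn -sn -map_metadata -1 \
  -c:a libopus -b:a 24k -ac 1 -ar 16k audio.ogg
```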
### Step 6: Enable the transcription plugin

Transcription plugins are disabled by default. Enable this one in the `etc/org.opencastproject.plugin.impl.PluginManagerImpl.cfg` configuration file by setting `opencast-plugin-transcription-services` to `true`.
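Assuming the usual key/value syntax of that file, the resulting line would look like this:

```properties
opencast-plugin-transcription-services = true
```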