Microsoft Azure Transcription Engine
Overview
The Opencast Microsoft Azure transcription service uses the Microsoft Azure Speech Service API to create a transcript from an audio track. The transcription is done asynchronously to speed up processing. When the result is generated, an attach-transcript workflow is started to archive and publish the transcript. To find out more about the Microsoft Azure Speech Service API, see the Microsoft Azure Speech Service documentation.
Note: You must have an active subscription to use Microsoft Azure Speech Services.
Configuration
Step 1: Get Azure subscription credentials
- Create an Azure subscription
- Create a storage account
- Get the storage account access key. Go to Azure Portal > Storage accounts, select your storage account, choose Security + networking > Access keys. Copy the key.
- Create a speech resource
- Get the subscription key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see the Azure Cognitive Services documentation.
Step 2: Configure the Microsoft Azure Transcription Service
Edit etc/org.opencastproject.transcription.microsoft.azure.MicrosoftAzureTranscriptionService.cfg:
- Set enabled=true
- Set azure_storage_account_name to your storage account name
- Set azure_account_access_key to the storage account access key
- Set azure_container_name to a container name you want to use
- Set azure_speech_services_endpoint to your speech services endpoint
- Set azure_cognitive_services_subscription_key to the speech services subscription key
- Review the other configuration options in this file and edit as needed
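As a rough illustration, the settings listed above might look like this in the configuration file. Every value in angle brackets is a placeholder invented for this sketch, not a real default; replace each one with the corresponding value from Step 1.

```properties
# Hypothetical example values; replace every <...> placeholder with your own.
enabled=true
azure_storage_account_name=<your-storage-account-name>
azure_account_access_key=<your-storage-account-access-key>
azure_container_name=<your-container-name>
azure_speech_services_endpoint=<your-speech-services-endpoint>
azure_cognitive_services_subscription_key=<your-speech-services-subscription-key>
```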
Step 3: Add a workflow operation or create a new workflow to start the transcription
Edit a workflow to start a transcription, e.g. etc/workflows/partial-publish.xml. You have to add the microsoft-azure-start-transcription operation right after the creation of the final cut of the media files. This operation may look like this:
<!-- This is a typical operation to generate final cut -->
<!-- of the media files. -->
<operation
  id="editor"
  …
</operation>
<!-- This operation will start the transcription job -->
<operation
  id="microsoft-azure-start-transcription"
  fail-on-error="true"
  exception-handler-workflow="partial-error"
  description="Start Microsoft Azure transcription job">
  <configurations>
    <configuration key="source-flavors">*/trimmed</configuration>
    <!-- Skip this operation if flavor already exists. -->
    <!-- Used for cases when mediapackage already has captions. -->
    <configuration key="skip-if-flavor-exists">captions/*</configuration>
    <configuration key="audio-extraction-encoding-profile">transcription-azure.audio</configuration>
  </configurations>
</operation>
For more options, please consult the documentation of the microsoft-azure-start-transcription operation.
Step 4: Add a workflow to attach transcriptions
The sample workflow below is the attach-transcript workflow that is preconfigured in the configuration file from Step 2. It attaches the generated transcription to the mediapackage, then archives and republishes it. Copy it into a new file, etc/workflows/microsoft-azure-attach-transcription.xml, in your Opencast installation.
<?xml version="1.0" encoding="UTF-8"?>
<definition xmlns="http://workflow.opencastproject.org">
  <id>microsoft-azure-attach-transcription</id>
  <title>Attach Transcription from Microsoft Azure</title>
  <description>Publish and archive transcription from Microsoft Azure Speech Services.</description>
  <operations>
    <operation
      id="microsoft-azure-attach-transcription"
      fail-on-error="true"
      exception-handler-workflow="partial-error"
      description="Attach transcription from Microsoft Azure">
      <configurations>
        <!-- This is filled out by the transcription service when starting this workflow -->
        <configuration key="transcription-job-id">${transcriptionJobId}</configuration>
        <!-- Set the flavor to something the Paella player will parse -->
        <configuration key="target-flavor">captions/source</configuration>
        <configuration key="target-tags">archive, ${transcriptionLocaleTag!}</configuration>
      </configurations>
    </operation>
    <operation
      id="snapshot"
      fail-on-error="true"
      exception-handler-workflow="partial-error"
      description="Archive transcription">
      <configurations>
        <configuration key="source-tags">archive</configuration>
      </configurations>
    </operation>
    <operation
      id="tag"
      description="Tagging captions for publishing">
      <configurations>
        <configuration key="source-flavors">captions/source</configuration>
        <configuration key="target-flavor">captions/delivery</configuration>
        <configuration key="target-tags">-archive</configuration>
        <configuration key="copy">true</configuration>
      </configurations>
    </operation>
    <operation
      id="publish-engage"
      fail-on-error="true"
      exception-handler-workflow="partial-error"
      description="Distribute and publish to engage server">
      <configurations>
        <configuration key="download-source-flavors">captions/delivery</configuration>
        <configuration key="strategy">merge</configuration>
        <configuration key="check-availability">false</configuration>
      </configurations>
    </operation>
    <operation
      id="cleanup"
      fail-on-error="false"
      description="Cleaning up">
      <configurations>
        <configuration key="preserve-flavors">security/*</configuration>
        <configuration key="delete-external">false</configuration>
      </configurations>
    </operation>
  </operations>
</definition>
All available options of the microsoft-azure-attach-transcription operation are described in that operation's documentation.
Step 5: Add audio extraction encoding profile
The audio track to be transcribed must be extracted from the media file and converted to a specific format for processing. This is done with the encoding engine of Opencast. Put the encoding profile listed below into the file etc/encoding/custom.properties.
# Microsoft Azure Speech Services accepts only a limited set of audio formats
# See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription-audio-data#supported-audio-formats
profile.transcription-azure.audio.name = extract audio stream for transcription
profile.transcription-azure.audio.input = visual
profile.transcription-azure.audio.output = audio
profile.transcription-azure.audio.jobload = 0.5
profile.transcription-azure.audio.suffix = .ogg
profile.transcription-azure.audio.mimetype = audio/ogg
profile.transcription-azure.audio.ffmpeg.command = -i #{in.video.path} \
-vn -dn -sn -map_metadata -1 \
-c:a libopus -b:a 24k -ac 1 -ar 16k \
#{out.dir}/#{out.name}#{out.suffix}
Step 6: Enable the transcription plugin
Transcription plugins are disabled by default. Enable them in the etc/org.opencastproject.plugin.impl.PluginManagerImpl.cfg configuration file by setting opencast-plugin-transcription-services to true.
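A minimal sketch of the resulting entry in etc/org.opencastproject.plugin.impl.PluginManagerImpl.cfg, assuming the file uses the usual key = value syntax of Opencast configuration files:

```properties
# Enable the transcription services plugin (disabled by default)
opencast-plugin-transcription-services = true
```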