Partial Import Workflow Operation

ID: partial-import

Description

The PartialImportWorkflowOperation processes a set of audio and video files according to a SMIL document describing their relations. Its primary use is to post-process audio and video files ingested by capture agents using /ingest/addPartialTrack of the ingest endpoint.

Prerequisite

When using the partial-import operation, make sure to perform a media inspection beforehand using the inspect operation with the option accurate-frame-count set to true. This ensures that the partial-import operation works correctly in case of media files with incorrect frame count in their header. Note that the use of accurate-frame-count will force the InspectWorkflowOperation to decode the complete video stream which makes the operation more expensive in terms of load.

Not all encoding profiles required for the PartialImportWorkflowOperation are available in Opencast per default, so you will have to add them yourself. Examples can be found further down.

Parameter Table

configuration keys	type	description	default value
source-presenter-flavor	MediaPackageElementFlavor	The flavor of tracks for the presenter video
source-presentation-flavor	MediaPackageElementFlavor	The flavor of tracks for the presentation video
source-smil-flavor*	MediaPackageElementFlavor	The flavor of the SMIL file describing how to build the targets. When using /ingest/addPartialTrack, the ingest service will create the SMIL file and add it to the media package as flavor smil/source+partial
target-presenter-flavor*	MediaPackageElementFlavor	The flavor to be used for the target presentation track. Both the type and subtype must not be *
target-presentation-flavor*	MediaPackageElementFlavor	The flavor to be used for the target presentation track. Both the type nor subtype must not be *
preencode-encoding-profile*	String	Encoding profile to for pre-processing videos. Must ensure same resolution, codec, framerate and samplerate. Must use the same framerate as the image-movie.work profile
concat-encoding-profile*	String	Encoding profile used for concatenating audio or video files
concat-output-framerate	Float	The optional output framerate for concatenated video files
trim-encoding-profile*	String	Encoding profile using for trimming tracks
force-encoding	Boolean	If set to true, all generated target files will be encoded using the encoding profile force-encoding-profile	false
force-encoding-profile*	String	Encoding profile to be used when force-encoding is set to true or a given target track has a file extension not included in required-extensions
required-extensions	String , { "," , String }	Comma-separated list of file extension names (case insensitive). All generated target files whose file extensions are not in this list will be encoded using the encoding profile force-encoding-profile	"mp4"
enforce-divisible-by-two	Boolean	If set, all video targets will have widths and heights divisible by two. This might be necessary depending since some encoder fail when encountering uneven widths or heights.	false

* required keys

Note that it is allowed to set the configuration keys 'target-presenter-flavor' and 'target-presentation-flavor' to the same value.

Operation Example

What exactly the PartialImportWorkflowOperation does is best described by example. In our example, a capture agent records three sources:

Presenter camera (video only)
Presenter microphone (audio only)
Presentation (video only)

While the capture agent internally triggers the recording for all sources at the same time, the actual recording of the individual sources might not necessarily start at the exact same time, e.g. due to latency of the recording devices.

Also, while recording, a watch dog in our example capture agent recognizes that for whatever reason, the recording of the sources had stopped and restarted again several times - resulting in multiple audio and/or video files per source.

Here is a graphics showing how this example could look like:

So we have three tracks, but seven files:

Presenter camera: 2 video-only files
Presenter microphone: 2 audio-only files
Presentation: 3 video-only files

We call that individual fragments of a track partial tracks.

Our example capture agent can now use the addPartialTrack ingest facility to specify for each of the ingested files, at which time the content fits into the overall recording. The ingest service will automatically create the SMIL file describing how the files relate to each other and add it to the media package as flavor smil/source+partial.

In our example, this SMIL file would like something like:

<?xml version="1.1" encoding="UTF-8"?>
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head/>
  <body>
    <par dur="93861ms">
      <seq>
        <video begin="412ms" dur="13440ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/5133d85c-5813-4b54-8a43-0cce9ddc1c4a/video_file.mov"/>
        <video begin="15324ms" dur="73440ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/5133d85c-5813-4b54-8a43-0cce9ddc1c4a/video_file.mov"/>
        <audio begin="0ms" dur="40861ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/72a42596-e1d0-47a5-b9c8-60180b466954/audio_file.mov"/>
        <audio begin="43400ms" dur="13861ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/72a42596-e1d0-47a5-b9c8-60180b466954/audio_file.mov"/>
      </seq>
      <seq>
        <video begin="948ms" dur="33440ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/bf5ea647-b99b-4ec3-a10c-29445fb01eca/video_file.mov"/>
        <video begin="35643ms" dur="15430ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/bf5ea647-b99b-4ec3-a10c-29445fb01eca/video_file.mov"/>
        <video begin="45448ms" dur="25440ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/bf5ea647-b99b-4ec3-a10c-29445fb01eca/video_file.mov"/>
      </seq>
    </par>
  </body>
</smil>

What we finally want, however, is a single presenter and a single presentation track that can be processed by Opencast workflow operations. To achieve this, the PartialImportWorkflowOperation is used to post-process the files as described in the SMIL file:

<operation id="partial-import"
     description="Post-processing raw audio and video files from capture agent"
     fail-on-error="true"
     exception-handler-workflow="partial-error">
  <configurations>
    <configuration key="source-presenter-flavor">presenter/source</configuration>
    <configuration key="source-presentation-flavor">presentation/source</configuration>
    <configuration key="source-smil-flavor">smil/source+partial</configuration>
    <configuration key="target-presenter-flavor">presenter/standard</configuration>
    <configuration key="target-presentation-flavor">presentation/standard</configuration>
    <configuration key="preencode-encoding-profile">partial-import-preencode</configuration>
    <configuration key="concat-encoding-profile">concat.work</configuration>
    <configuration key="trim-encoding-profile">trim.work</configuration>
    <configuration key="force-encoding-profile">encode.partial-import</configuration>
  </configurations>
</operation>

In our example, the PartialImportWorkflowOperation will create the target flavors presenter/standard and presentation/standard as depicted below:

The green parts have been filled in by the PartialImportWorkflowOperation by either silence (audio) or pictures (video).

To achieve this, the PartialImportWorkflowOperation performs the following steps:

Extend content at the beginning: If the first partial track of a given source and type (audio/video) does not begin at position zero, content is added in front of it so that the corresponding target track will start at position zero. For audio, silence is added. In case of video, the first frame of the first partial track is added.
Filling the gaps: As you can see in our example, it is possible that content is missing within the actual tracks. Those gaps are filled by adding silence (in case of audio) or adding the last frame of the previous partial track (in case of video). In this step, content is also added at the end of the track in case its duration is less than the overall duration of the recording.
Trim the tracks: It is possible that processing the ingested files according to the SMIL file would result in tracks that have a longer duration than the overall recording should. Therefore, all tracks are trimmed individually to the target duration.
Mux audio tracks: To avoid the necessity to call further workflow operations just for audio muxing, the PartialImportWorkflowOperation can perform audio muxing itself. In our example, it would mux the audio and video track of the presenter into a single audio/video track.
Ensure specific output formats: There may be situations where you want to ensure that the output of this operations comes with a specific file format, e.g. MP4. The configuration keys force-encoding and required_extensions can be used to control the behavior of the PartialImportWorkflowOperation: In case the force-encoding is set to true, the target tracks will be re-encoded using the force-encoding-profile. The target tracks will also be re-encoded using that encoding profile in case its file extensions don't match the required_extensions.

SMIL File Structure

The PartialImportWorkflowOperation expects a specific subset of SMIL that is described in this section. The overall structure of the SMIL file is shown by example below:

<?xml version="1.1" encoding="UTF-8"?>
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head/>
  <body>
    <par dur="15000ms">
      <seq>
        <video begin="400ms" dur="13000ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/5133d85c-5813-4b54-8a43-0cce9ddc1c4a/video_file.mov"/>
        <video begin="15000ms" dur="70000ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/5133d85c-5813-4b54-8a43-0cce9ddc1c4a/video_file.mov"/>
        <audio begin="0ms" dur="400ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/72a42596-e1d0-47a5-b9c8-60180b466954/audio_file.mov"/>
        <audio begin="900ms" dur="13000ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/72a42596-e1d0-47a5-b9c8-60180b466954/audio_file.mov"/>
      </seq>
      <seq>
        <video begin="900ms" dur="30000ms" src="/files/mediapackage/7b56bf47-8065-4244-96a0-412a759ccc3f/bf5ea647-b99b-4ec3-a10c-29445fb01eca/video_file.mov"/>
      </seq>
    </par>
  </body>
</smil>

The PartialImportWorkflowOperation can handle at most one par element that is used to describe to overall media duration using the attribute dur. The resulting tracks will be trimmed to this duration if necessary. In the example above, the overall media duration is set to 15 seconds.

The par element has one or two sequence sub elements seq, each describing a track that is to be built from its sub elements - the partial tracks. Each sequence (seq) has at least one sub element. Sub elements can be either video elements, audio elements or both video and audio elements. Each of those sub elements requires the attributes begin (position of partial track in milliseconds relative to start of overall media) and dur (duration of partial track in milliseconds) The audio elements are used to indicate that the media file referred to is an audio-only media file, whereas video elements can refer to either video-only or audio-video media files. The following combinations result in a defined behavior:

Supported Combinations of Video and Audio Elements

video	audio	resulting track
audio/video track	n/a	audio/video track
video-only track	n/a	video-only track
video-only track	audio-only track	audio/video track
n/a	audio-only track	audio-only track

All other combinations of video and audio elements result in unspecified behavior of the PartialImportWorkflowOperation.

Order of Video and Audio Elements

Within a sequence (seq), the video elements most occur in ascending order considering the values of their attributes begin. The same holds for audio elements. Note the video and audio elements are processed individually, so the order of occurrences of video and audio elements are independent from each other.

Important: The PartialImportWorkflowOperation will not process video or audio elements correctly if the order of appearance in the SMIL file is not correct.

Overlapping Partial Tracks

The behavior of overlapping partial tracks is unspecified, i.e. for a given element e (video or audio), the value of begin for the subsequent element (e+1) of the same type (video or audio) within the same sequence must be equal or greater than e.begin + e.dur, i.e. make sure that the following invariant holds: (e+1).begin >= e.begin + e.dur

Encoding Profiles

The PartialImportWorkflowOperation uses a number of encoding profiles to perform its processing. Some of the encoding profiles can be explicitly configured by the user, others are used implicitly in means of being hard-coded and are not supposed to be changed by the user.

Hard-coded Encoding Profiles

Encoding Profile	Description
import.preview	Extract the first frame of a given partial track
import.image-frame	Extract the last frame of a given partial track. Note that this profile is used to extract the exactly last frame of a partial track - not just a frame close to the last one. To make this work for video files with headers that don't contain the exact frame count, set accurate_frame_count to true in etc/org.opencastproject.inspection.ffmpeg.MediaInspectionServiceImpl.cfg
image-movie.work	Generate video partial tracks based on extracted images used to fill video gaps
import.silent	Generate silent audio tracks used to fill audio gaps

Configurable Encoding Profiles

Configuration Key	Description
concat-encoding-profile	Used to concatenate partial tracks into tracks
trim-encoding-profile	Used to trim the resulting concatenated single tracks if necessary
preencode-encoding-profile	Used to encode all source video tracks before any processing happens, to avoid errors with non-uniform input and prepare them for lossless concatenation. Must ensure same resolution, codec, framerate and samplerate. Must use the same framerate as the image-movie.work profile.
force-encoding-profile	Used to re-encode target tracks in case the file extension of a given target track is not included in required-extensions or the configuration key force-encoding is set to true

Missing Encoding Profiles

Some of the encoding profiles necessary for this operation are not included in Opencast per default, but the operation will not work without them (or with similar ones configured). If you want to use this operation. we recommend using the following encoding profiles by copy and pasting them in a .properties file in the etc/encoding folder of your installation.

# Generate silent audio tracks for filling gaps for partial import operation
profile.import.silent.name = Generate silent audio tracks for filling gaps
profile.import.silent.input = nothing
profile.import.silent.output = audio
profile.import.silent.suffix = -silent-audio.mp4
profile.import.silent.ffmpeg.command = -strict -2 -filter_complex aevalsrc=0:d=#{time} -c:a aac -b:a 8k -ar 44100 #{out.dir}/#{out.name}#{out.suffix}

# Extract last image for partial import operation
profile.import.image-frame.name = Extract last image
profile.import.image-frame.input = visual
profile.import.image-frame.output = image
profile.import.image-frame.suffix = -image.jpg
profile.import.image-frame.ffmpeg.command = -sseof -3 -i #{in.video.path} -update 1 -q:v 1 #{out.dir}/#{out.name}#{out.suffix}

# Extract image for partial import operation
profile.import.preview.name = Extract an image
profile.import.preview.input = visual
profile.import.preview.output = image
profile.import.preview.suffix = -image.jpg
profile.import.preview.ffmpeg.command = -ss #{time} -i #{in.video.path} -r 1 -frames:v 1 #{out.dir}/#{out.name}#{out.suffix}

# Trim a stream
#   This command will trim and input stream. Trimming will be fast, as no
#   re-encoding takes place. It will, however, not be frame accurate.
profile.trim.work.name = trim track
profile.trim.work.input = stream
profile.trim.work.output = visual
profile.trim.work.suffix = -trimmed.#{in.video.suffix}
profile.trim.work.ffmpeg.command = -ss #{trim.start} -i #{in.video.path} -t #{trim.duration} -c copy #{out.dir}/#{out.name}#{out.suffix}

# Used by Partial Import operation to encode tracks into equal formats
profile.encode.partial-import.name = editor
profile.encode.partial-import.input = audiovisual
profile.encode.partial-import.output = audiovisual
profile.encode.partial-import.suffix = -editor.mp4
profile.encode.partial-import.mimetype = video/mp4
profile.encode.partial-import.ffmpeg.command = -i #{in.video.path} \
  -filter:v crop=trunc(iw/2)*2:trunc(ih/2)*2,fps=25 -shortest -c:v libx264 -preset superfast -pix_fmt yuv420p -crf 18 -c:a aac -b:a 196k \
  #{out.dir}/#{out.name}#{out.suffix}

# Preencode
#   Used by partial import to make all videos have the same resolution,
#   codec, framerate and samplerate to allow for usage of the concat
#   demuxer.
#   MUST specify the same framerate as image-movie.work
profile.partial-import-preencode.name = partial-import-preencode
profile.partial-import-preencode.input = audiovisual
profile.partial-import-preencode.output = audiovisual
profile.partial-import-preencode.suffix = -preencoded.mp4
profile.partial-import-preencode.mimetype = video/mp4
profile.partial-import-preencode.ffmpeg.command = -i #{in.video.path} \
  -filter:v scale=1920:-2,fps=30 \
  -shortest -c:v libx264 -pix_fmt yuv420p \
  -c:a aac -b:a 196k \
  -ar 44100 \
  #{out.dir}/#{out.name}#{out.suffix}