AWS S3 Archive Configuration

This page documents the configuration for the AWS S3 components in the Opencast module asset-manager-storage-aws.

This configuration is only required on the admin node, and only if you are using Amazon S3 as an archive repository.

Amazon User Configuration

Configuration of Amazon users is beyond the scope of this documentation; instead, we suggest referring to Amazon's documentation. You will, however, require an Access Key ID and Secret Access Key. The user to which this key belongs requires the AmazonS3FullAccess permission, which can be granted using these instructions.

A free Amazon account will work for small-scale testing, but be aware that S3 archiving can cost you a lot of money very quickly. Keep track of how much data you are storing and how many requests you are making, and be sure to set alarms to notify you of cost overruns.
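
Granting that permission and creating the key can also be done with the AWS CLI. The following is only a sketch; the user name is a placeholder, and your organisation may prefer a more restrictive policy than AmazonS3FullAccess:

    # Attach the AmazonS3FullAccess managed policy to the Opencast user (user name is illustrative)
    aws iam attach-user-policy \
      --user-name opencast-archive \
      --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

    # Create the Access Key ID / Secret Access Key pair for that user
    aws iam create-access-key --user-name opencast-archive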

Amazon Service Configuration

For development and testing it is generally safe to allow the Opencast AWS S3 Archive service to create the S3 bucket for you. It will create the bucket per its configuration, with private-only access to the files, and no versioning.
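
For production use you may prefer to create and lock down the bucket yourself. As a rough sketch with the AWS CLI, using the bucket name and region from the examples below, this might look like:

    # Create the archive bucket (name and region are illustrative)
    aws s3api create-bucket --bucket example-org-archive --region us-east-1

    # Block all public access to the bucket's contents
    aws s3api put-public-access-block --bucket example-org-archive \
      --public-access-block-configuration \
      BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true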

Opencast Service Configuration

The Opencast AWS S3 Archive service configuration can be found in the org.opencastproject.assetmanager.aws.s3.AwsS3AssetStore.cfg configuration file.

| Key | Description | Default | Example |
|-----|-------------|---------|---------|
| org.opencastproject.assetmanager.aws.s3.enabled | Whether to enable this service | false | |
| org.opencastproject.assetmanager.aws.s3.region | The AWS region to set | | us-east-1 |
| org.opencastproject.assetmanager.aws.s3.bucket | The S3 bucket name | | example-org-archive |
| org.opencastproject.assetmanager.aws.s3.access.id | Your access ID | | 20 alphanumeric characters |
| org.opencastproject.assetmanager.aws.s3.secret.key | Your secret key | | 40 characters |
| org.opencastproject.assetmanager.aws.s3.endpoint | The endpoint to use | Default AWS S3 endpoint | https://s3.service.com |
| org.opencastproject.assetmanager.aws.s3.path.style | Whether to use path style | false (default AWS S3 style) | |
| org.opencastproject.assetmanager.aws.s3.max.connections | Number of max connections | 50 | |
| org.opencastproject.assetmanager.aws.s3.connection.timeout | Connection timeout in ms | 10000 | |
| org.opencastproject.assetmanager.aws.s3.max.retries | Number of max retries | 100 | |
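
As a reference, a minimal configuration enabling the store might look like this (the bucket name is illustrative and the credentials are Amazon's documented placeholder values, not real keys):

    # etc/org.opencastproject.assetmanager.aws.s3.AwsS3AssetStore.cfg
    org.opencastproject.assetmanager.aws.s3.enabled=true
    org.opencastproject.assetmanager.aws.s3.region=us-east-1
    org.opencastproject.assetmanager.aws.s3.bucket=example-org-archive
    org.opencastproject.assetmanager.aws.s3.access.id=AKIAIOSFODNN7EXAMPLE
    org.opencastproject.assetmanager.aws.s3.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY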

Using S3 Archiving

S3 archiving is done at the snapshot level, that is, per mediapackage ID and version. Because of the way the Asset Manager handles snapshots, all newly created snapshots are always local. Creating a snapshot of a mediapackage with non-local data will download all related snapshots for that mediapackage, which can incur significant costs. S3 archiving is meant as a cost-reduction and storage-expansion tool rather than hot storage where lots of reads and writes occur. Therefore, most adopters do not want to offload their recordings to S3 immediately (i.e., at the end of the default workflow)! Instead, we suggest using the TimedMediaArchiver, configured in /etc/org.opencastproject.assetmanager.impl.TimedMediaArchiver.cfg, to offload recordings after sufficient time has passed that further modification is unlikely.

If you do need to create an additional workflow, a substantially better approach than restoring snapshots is to use the ingest-download workflow operation handler to download the relevant file(s) to the local workspace, as sketched below. This dramatically speeds up snapshotting, and allows operations which require local files to work properly without having to restore everything and then re-archive it to S3.
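
For example, such a workflow might include an operation along these lines before any operation that needs the files locally. The configuration keys shown are illustrative; consult the ingest-download operation's documentation for the exact set it supports:

    <operation
      id="ingest-download"
      description="Downloading non-local assets to the local workspace">
      <configurations>
        <!-- illustrative configuration; check the operation's documentation for supported keys -->
        <configuration key="source-flavors">*/*</configuration>
        <configuration key="delete-external">false</configuration>
      </configurations>
    </operation>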

Manual S3 Archiving

Manually moving assets to and from S3 is done via a workflow operation handler added as part of a workflow. The workflow operation definition looks like this:

    <operation
      id="move-storage"
      description="Offloading to AWS S3">
      <configurations>
        <configuration key="target-storage">aws-s3</configuration>
      </configurations>
    </operation>

Assets in S3 remain accessible to Opencast; however, there may be cases where you wish to restore your content back to your local storage. This can be accomplished using the same workflow operation definition as above, changing the target-storage configuration value from aws-s3 to local-filesystem like so:

    <operation
      id="move-storage"
      description="Restoring from AWS S3">
      <configurations>
        <configuration key="target-storage">local-filesystem</configuration>
      </configurations>
    </operation>

S3 Storage Tiers

S3 supports storage tiering, which can offer significant cost savings in return for substantially increased access times. Opencast does not directly expose this functionality in the UI, but support is present in the back end. Both manual and Lifecycle-based storage tiering are supported, as are all tiers. Attempting to retrieve an asset will trigger a restore to the standard S3 tier, if appropriate, and then return the file contents. For more on cold storage, see below.
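
As a sketch of Lifecycle-based tiering, a bucket lifecycle rule such as the following (the 90-day threshold and bucket name are only examples) would transition archived assets to Glacier Flexible Retrieval, applied here with the AWS CLI:

    # lifecycle.json -- transition all objects to Glacier Flexible Retrieval after 90 days
    {
      "Rules": [
        {
          "ID": "archive-to-glacier",
          "Status": "Enabled",
          "Filter": {},
          "Transitions": [
            { "Days": 90, "StorageClass": "GLACIER" }
          ]
        }
      ]
    }

    # Apply the rule to the archive bucket
    aws s3api put-bucket-lifecycle-configuration \
      --bucket example-org-archive \
      --lifecycle-configuration file://lifecycle.json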

S3 Glacier Flexible Retrieval and Deep Archive

The Glacier Flexible Retrieval and Deep Archive storage classes (referred to as Cold Storage from here on) are supported; however, they have a significant drawback at this point: Opencast does not understand that these files are not immediately accessible. Attempting to process a workflow containing such assets will trigger a restore of the files, but will then likely fail the workflow after some time. This is because some Opencast configurations use HTTP(S) downloads to transfer files between processing nodes, and those transfers will time out when access times for the media files are measured in hours. Note that these failures will not harm your Opencast system, but they will cost you money because of the potentially wasted restores.

A better approach is to use the REST endpoints at /assets/aws/s3, e.g. http://stable.opencast.org/assets/aws/s3. Specifically, you probably want to use PUT glacier/{mediaPackageId}/assets, which enables temporary restoration of files from the Cold Storage tiers to standard S3. AWS will automatically remove the temporary copy after the specified duration, so choose a duration long enough for your workflow to complete.
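
As a sketch, such a restore request might be issued as follows. The host name and credentials are placeholders, and the endpoint also takes a restore-duration parameter whose exact name is documented in the endpoint's REST documentation:

    # Trigger a temporary restore of a mediapackage's assets from Cold Storage
    # (host and credentials are placeholders; add the duration parameter per the REST docs)
    curl --digest -u opencast_system_account:CHANGE_ME \
      -H "X-Requested-Auth: Digest" \
      -X PUT "https://admin.example.org/assets/aws/s3/glacier/{mediaPackageId}/assets"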

Permanently Restoring Content

In rare cases you may need your content restored to standard S3 on a more permanent basis. In this case you need to temporarily restore it as above, and then, once the restore is complete, use the POST {mediaPackageId}/assets endpoint to permanently move the asset. Note that the permanently restored asset may still be re-Glaciered by any active AWS Object Lifecycle rules.
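
Under the same assumptions as the temporary-restore sketch above (placeholder host and credentials; any additional parameters are listed in the endpoint's REST documentation), the permanent move might look like:

    # Permanently move the previously restored assets back to the standard S3 storage class
    curl --digest -u opencast_system_account:CHANGE_ME \
      -H "X-Requested-Auth: Digest" \
      -X POST "https://admin.example.org/assets/aws/s3/{mediaPackageId}/assets"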

Manually Changing Content Storage Tiers

While AWS Lifecycle rules are a much more scalable solution, there may be times when you wish to manually alter an asset's storage tier. In the same way that you can permanently restore an asset, you can also manually move assets between storage classes using the POST {mediaPackageId}/assets endpoint.