InfluxDB Statistics Provider

Architecture

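A complete setup consists of a source of raw data (for example, webserver logs), an adapter that converts this data into InfluxDB data points, InfluxDB itself, and Opencast, which reads the data from InfluxDB.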

For example, using Opencast's opencast-influxdb-adapter, your architecture would look like this:

Webserver Logs --> Adapter --> InfluxDB --> Opencast

Specifically, the Opencast bundle opencast-statistics-provider-influx is the component that connects to InfluxDB over HTTP(S), so the node hosting this bundle needs network access to InfluxDB.
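Because only the node running opencast-statistics-provider-influx talks to InfluxDB, it can help to check reachability from that node first. The following is a minimal Python sketch, assuming an InfluxDB 1.x-style HTTP API where GET /ping answers with 204 No Content; the function name influx_reachable is hypothetical and not part of Opencast.

```python
import urllib.error
import urllib.request


def influx_reachable(base_uri: str, timeout: float = 2.0) -> bool:
    """Return True if the InfluxDB /ping endpoint answers with 204 No Content."""
    url = base_uri.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 204
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, TLS error or non-2xx status
        return False
```

Run it on the Opencast node with the URI you intend to configure, e.g. influx_reachable("http://your-influx-host:8086"); a False result points to a network, firewall or URI problem rather than an Opencast configuration issue.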

Configuration

Before configuring Opencast, you should have a running InfluxDB instance and should decide how your data will be written to InfluxDB and what your InfluxDB database schema should look like. Specifically, think about retention policies, measurement names, field and tag names, and how much you want to downsample your data. If you don't have any data in InfluxDB yet but want to verify that your setup works, you can use the test data provided in the section Verifying Your Setup.

InfluxDB Access

Opencast needs to know how to connect to your InfluxDB instance. Edit the configuration file etc/org.opencastproject.statistics.provider.influx.StatisticsProviderInfluxService.cfg and fill in your InfluxDB URI, username, password, and database name.
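Before entering these values into the configuration file, you can check that the credentials work and that the database exists. This Python sketch assumes the InfluxDB 1.x HTTP API, which accepts credentials as the u and p query parameters (HTTP basic auth works as well) and returns SHOW DATABASES results as JSON. The helper names are hypothetical; this is not how Opencast itself connects.

```python
import json
import urllib.parse
import urllib.request


def show_databases_url(base_uri: str, username: str, password: str) -> str:
    """Build a /query URL that runs SHOW DATABASES with the given credentials."""
    params = urllib.parse.urlencode({"q": "SHOW DATABASES", "u": username, "p": password})
    return base_uri.rstrip("/") + "/query?" + params


def database_names(response_body: str) -> list:
    """Extract database names from a SHOW DATABASES JSON response."""
    names = []
    for result in json.loads(response_body).get("results", []):
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


def database_exists(base_uri: str, username: str, password: str, db_name: str) -> bool:
    """Ask InfluxDB for its databases and check that db_name is among them."""
    url = show_databases_url(base_uri, username, password)
    with urllib.request.urlopen(url, timeout=5) as response:
        return db_name in database_names(response.read().decode("utf-8"))
```

Note that credentials passed in the URL may end up in proxy or server logs, so prefer running such a check only on a trusted network.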

InfluxDB Statistics Provider Configuration Files

Each provider is configured in its own JSON file in etc/statistics.

Here is an example JSON configuration for a provider that generates charts showing the number of views of an episode:

etc/statistics/influx.views.episode.sum.json

{
  "id": "episode.views.sum.influx",
  "title": "STATISTICS.TITLE.VIEWS_SUM",
  "description": "STATISTICS.DESCRIPTION.VIEWS_SUM",
  "resourceType": "EPISODE",
  "sources": [{
    "measurement": "infinite.impressions_daily",
    "aggregation": "SUM",
    "aggregationVariable": "value",
    "resourceIdName": "episodeId",
    "resolutions": [
      "DAILY",
      "WEEKLY",
      "MONTHLY",
      "YEARLY"
    ]
  }],
  "type": "timeseries"
}
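To make the meaning of the source fields more concrete, the Python sketch below turns one entry of sources into an InfluxQL query. It assumes InfluxDB 1.x and reads measurement as retention policy plus measurement name ("infinite.impressions_daily" means retention policy infinite, measurement impressions_daily). It illustrates the mapping only and is not the query Opencast actually generates; build_query and load_sources are hypothetical names.

```python
import json

# InfluxQL GROUP BY time() accepts units such as d and w but has no month or
# year unit, so this sketch only maps the two fixed-length resolutions.
INTERVALS = {"DAILY": "1d", "WEEKLY": "1w"}


def load_sources(path: str) -> list:
    """Read the sources list from a provider JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["sources"]


def build_query(source: dict, resource_id: str, start: str, end: str, resolution: str) -> str:
    """Build an illustrative InfluxQL query for one provider source."""
    if resolution not in source["resolutions"]:
        raise ValueError(f"{resolution} is not configured for this source")
    if resolution not in INTERVALS:
        raise ValueError(f"{resolution} is not handled by this sketch")
    # Assumes the form "<retention policy>.<measurement>"
    retention_policy, measurement = source["measurement"].split(".", 1)
    # Escape backslashes and single quotes for an InfluxQL string literal
    escaped_id = resource_id.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f'SELECT {source["aggregation"]}("{source["aggregationVariable"]}") '
        f'FROM "{retention_policy}"."{measurement}" '
        f"WHERE \"{source['resourceIdName']}\" = '{escaped_id}' "
        f"AND time >= '{start}' AND time < '{end}' "
        f"GROUP BY time({INTERVALS[resolution]})"
    )
```

For the example file above, build_query(load_sources("etc/statistics/influx.views.episode.sum.json")[0], episode_id, "2019-04-01T00:00:00Z", "2019-05-01T00:00:00Z", "DAILY") yields one summed view count per day for that episode.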

Using the runningtotal provider

The runningtotal statistics provider is a special type of time series statistics provider. To illustrate what it can be used for, let’s assume we want to track the number of hours of video per organization (this is what the provider was originally designed for). We create a JSON file for the provider like this:

{
  "id": "organization.publishedhours.influx",
  "title": "STATISTICS.TITLE.PUBLISHEDHOURS",
  "description": "STATISTICS.DESCRIPTION.PUBLISHEDHOURS",
  "resourceType": "ORGANIZATION",
  "sources": [{
    "measurement": "infinite.publishedhours",
    "aggregation": "SUM",
    "aggregationVariable": "hours",
    "resourceIdName": "organizationId",
    "resolutions": [
      "DAILY",
      "WEEKLY",
      "MONTHLY",
      "YEARLY"
    ]
  }],
  "type": "runningtotal"
}

Note that published-hours entries can be negative, for example when a video is retracted.

When the runningtotal provider is asked to report, for example, the monthly hours of video for a specific year, it first sums all entries dated before the start of that year. Then, for each month, it adds the sum of that month's entries to the previous month's total, and so on for the remaining months.
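The computation described above can be sketched in Python for a monthly resolution. Entries are modeled here as hypothetical (date, hours) pairs rather than InfluxDB points, and the function name running_total_by_month is illustrative only.

```python
from collections import defaultdict
from datetime import date


def running_total_by_month(entries, year):
    """Return {month: running total} for `year` from (date, hours) entries.

    The starting value is the sum of all entries dated before 1 January of
    `year`; each month then adds the sum of its own entries to the previous
    total. Negative entries (retractions) reduce the total.
    """
    start = date(year, 1, 1)
    total = sum(hours for day, hours in entries if day < start)
    per_month = defaultdict(float)
    for day, hours in entries:
        if day.year == year:
            per_month[day.month] += hours
    totals = {}
    for month in range(1, 13):
        total += per_month[month]
        totals[month] = total
    return totals
```

For instance, with 10 hours published in 2018, 3.5 hours in January 2019 and a 2-hour retraction in March 2019, the 2019 totals are 13.5 for January and February and 11.5 from March onwards.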

To actually write these hours to the statistics database, add the statistics-writer workflow operation handler to your workflows. Specifically, add an entry such as the following somewhere in your publishing workflow:

  - id: statistics-writer
    fail-on-error: true
    exception-handler-workflow: partial-error
    description: Collect video statistics
    configurations:
      - flavor: presenter/video
      - retract: false
      - measurement-name: publishedhours
      - organization-resource-id-name: organizationId
      - length-field-name: hours
      - temporal-resolution: hours

To decrement the running total of hours when videos are retracted, use the operation in your retraction workflow with the retract property set to true. By default, or when retract is false, the running total is not decremented when a retraction occurs.
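For orientation, the Python sketch below shows what a single published-hours data point could look like in InfluxDB line protocol. It assumes that the operation writes one point per published or retracted video, tags it with the organization ID under the name given by organization-resource-id-name, stores the video length under length-field-name in the unit given by temporal-resolution, and negates the value on retraction. This is a hypothetical illustration, not the statistics-writer implementation; the unit mapping and function names are assumptions.

```python
import time

# Assumed mapping of temporal-resolution values to seconds per unit.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def escape_measurement(name: str) -> str:
    # Line protocol: commas and spaces in measurement names must be escaped.
    return name.replace(",", "\\,").replace(" ", "\\ ")


def escape_key_or_tag(value: str) -> str:
    # Line protocol: commas, equals signs and spaces in tag keys, tag values
    # and field keys must be escaped.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def duration_point(measurement, tag_name, organization_id, field_name,
                   duration_ms, resolution="hours", retract=False, timestamp_s=None):
    """Build one line-protocol point (seconds precision) for a video length."""
    value = duration_ms / 1000 / SECONDS_PER_UNIT[resolution]
    if retract:
        value = -value  # retractions decrement the running total
    ts = int(time.time()) if timestamp_s is None else timestamp_s
    return (f"{escape_measurement(measurement)},"
            f"{escape_key_or_tag(tag_name)}={escape_key_or_tag(organization_id)} "
            f"{escape_key_or_tag(field_name)}={value} {ts}")
```

With the configuration shown above, a 90-minute video in organization mh_default_org would correspond to a point like publishedhours,organizationId=mh_default_org hours=1.5 followed by a timestamp, and its retraction to the same point with hours=-1.5.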

Verifying Your Setup

To test your setup, you can load the following test data into InfluxDB and check whether Opencast displays all charts correctly. First, use the Opencast Admin UI to create a series and an event that belongs to that series. Second, copy the test data to a file called testdata.txt and edit it to match your InfluxDB database schema. Make sure you replace the episodeId, seriesId, and organizationId tag values with the correct identifiers of the test event and series you just created and of your organization. Also make sure that the tag names (e.g. episodeId) and the field name (value) match the ones you specified in the sources of your providers. Likewise, the database name, retention policy name, and measurement name have to match your configuration.

The InfluxDB test data could look like this:

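# DDL

CREATE DATABASE opencast
CREATE RETENTION POLICY infinite ON opencast DURATION INF REPLICATION 1

# DML

# CONTEXT-DATABASE: opencast
# CONTEXT-RETENTION-POLICY: infinite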

impressions_daily,episodeId=6d3004a3-a581-4fdd-9dab-d4ed02f125f8,seriesId=5b421e3c-56a5-4c9e-86cd-bedcfa739cfa,organizationId=mh_default_org value=1 1554468810
impressions_daily,episodeId=6d3004a3-a581-4fdd-9dab-d4ed02f125f8,seriesId=5b421e3c-56a5-4c9e-86cd-bedcfa739cfa,organizationId=mh_default_org value=1 1554555210
impressions_daily,episodeId=6d3004a3-a581-4fdd-9dab-d4ed02f125f8,seriesId=5b421e3c-56a5-4c9e-86cd-bedcfa739cfa,organizationId=mh_default_org value=1 1554641610
impressions_daily,episodeId=6d3004a3-a581-4fdd-9dab-d4ed02f125f8,seriesId=5b421e3c-56a5-4c9e-86cd-bedcfa739cfa,organizationId=mh_default_org value=1 1554728010
impressions_daily,episodeId=6d3004a3-a581-4fdd-9dab-d4ed02f125f8,seriesId=5b421e3c-56a5-4c9e-86cd-bedcfa739cfa,organizationId=mh_default_org value=1 1554814410
impressions_daily,episodeId=6d3004a3-a581-4fdd-9dab-d4ed02f125f8,seriesId=5b421e3c-56a5-4c9e-86cd-bedcfa739cfa,organizationId=mh_default_org value=1 1554900810

The format of this file is described in the InfluxDB documentation for importing data with the influx command line interface (the -import option).
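Instead of editing the identifiers by hand, you could generate the file. The Python sketch below reproduces the six daily points of the example (one day, i.e. 86400 seconds, apart, starting at 1554468810) for identifiers you pass in. It also creates a retention policy named infinite, because the example provider reads from infinite.impressions_daily; adjust database, retention_policy and measurement if your configuration differs. The function name and its defaults are assumptions for illustration.

```python
def make_test_data(episode_id, series_id, organization_id,
                   start_ts=1554468810, days=6, database="opencast",
                   retention_policy="infinite", measurement="impressions_daily"):
    """Return influx -import file contents with one view per day (seconds precision)."""
    lines = [
        "# DDL",
        f"CREATE DATABASE {database}",
        f"CREATE RETENTION POLICY {retention_policy} ON {database} DURATION INF REPLICATION 1",
        "",
        "# DML",
        f"# CONTEXT-DATABASE: {database}",
        f"# CONTEXT-RETENTION-POLICY: {retention_policy}",
        "",
    ]
    for day in range(days):
        timestamp = start_ts + day * 86400  # one point per day
        lines.append(
            f"{measurement},episodeId={episode_id},seriesId={series_id},"
            f"organizationId={organization_id} value=1 {timestamp}"
        )
    return "\n".join(lines) + "\n"
```

Write the result to testdata.txt, for example with open("testdata.txt", "w", encoding="utf-8").write(make_test_data(episode_id, series_id, "mh_default_org")), and import it as described next. Since the timestamps are in seconds, the import must use -precision=s.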

You can import the test data into InfluxDB using the following command:

influx -import -path=testdata.txt -precision=s -database=opencast

Once you have imported the test data, you should be able to see the charts you configured in the event or series details of your test event, or in Opencast's statistics section.