Install Across Multiple Servers
Note that this is not a comprehensive guide covering all possible ways to install Matterhorn. It is rather a guide to good practice, presenting set-ups that many adopters are running.
Step 1: Install Matterhorn
For a distributed set-up you essentially only need to install the right modules on the right nodes of the Matterhorn system. To make this less complicated, these modules are grouped into profiles which you can build and install directly.
If you want to build Matterhorn yourself, you can invoke the build process for certain modules by using Maven's -P option. For example, the following command will build the three profiles worker-standalone, serviceregistry and workspace (the profiles needed for a worker node):
mvn clean install -DdeployTo=/path/to/matterhorn/ \
-Pworker-standalone,serviceregistry,workspace
If you are using the Matterhorn RPM repository instead, you can do the same by installing the profile packages like this:
yum install opencast-matterhorn14-profile-worker-standalone \
opencast-matterhorn14-profile-serviceregistry \
opencast-matterhorn14-profile-workspace
To make things easier, the repository also contains a set of predefined distribution packages which will automatically install all dependencies for a given node type. For example, to install a Matterhorn worker node:
yum install opencast-matterhorn14-distribution-worker
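If you want to check afterwards which Matterhorn packages ended up on a node, you can list them with rpm (the glob matches the package names used above):
rpm -qa 'opencast-matterhorn14*'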
This is the general idea behind a distributed Matterhorn set-up. The following sections give examples of how you could distribute Matterhorn over a given set of machines and what you need to install on each of them. Be aware that these examples are not the only possible ways of setting up Matterhorn; they are, however, a good starting point.
Not specified in these examples is the location of the database and the storage server. You can place them either on one of the Matterhorn nodes or on dedicated machines; the latter will generally give you better performance.
All-In-One
This is the default set-up described in the basic installation guides. It works fine for testing purposes but should not be used in production. It is not distributed; it is listed here only to make the list of necessary profiles comprehensive. For an All-In-One system the following profiles need to be installed:
admin, dist, engage, worker, workspace, serviceregistry, directory-db
Maven build command:
mvn clean install -DdeployTo=/path/to/matterhorn/
RPM Repository installation:
yum install opencast-matterhorn14-distribution-default
Two-Server Set-up
This set-up is the minimum recommended for production use. It separates the distribution layer from the administrative and processing layer. This means that even if one server is under heavy load, e.g. while videos are being processed, distribution is not affected and users can still watch videos smoothly. Under heavy load, however, the administrative UI may become somewhat sluggish.
Necessary profiles to build:
admin-worker: admin,workspace,dist-stub,engage-stub,worker,serviceregistry
engage: engage-standalone,serviceregistry,dist-standalone,workspace
Maven build commands:
# admin-worker
mvn clean install -DdeployTo=/path/to/matterhorn/ \
-Padmin,workspace,dist-stub,engage-stub,worker,serviceregistry
# engage
mvn clean install -DdeployTo=/path/to/matterhorn/ \
-Pengage-standalone,serviceregistry,dist-standalone,workspace
RPM Repository installation:
# admin-worker
yum install opencast-matterhorn14-distribution-admin-worker
# engage
yum install opencast-matterhorn14-distribution-engage
Three (or more) Server Set-up
While in the last example we created one combined node for both the administrative tools and the workers, in this example we split it into dedicated admin and worker nodes. With this set-up it is easy to increase the system's performance simply by adding further worker nodes.
Necessary profiles to build:
admin: admin,workspace,dist-stub,engage-stub,worker-stub,serviceregistry
worker: serviceregistry,workspace,worker-standalone
engage: engage-standalone,serviceregistry,dist-standalone,workspace
Maven build commands:
# admin
mvn clean install -DdeployTo=/path/to/matterhorn/ \
-Padmin,workspace,dist-stub,engage-stub,worker-stub,serviceregistry
# worker
mvn clean install -DdeployTo=/path/to/matterhorn/ \
-Pserviceregistry,workspace,worker-standalone
# engage
mvn clean install -DdeployTo=/path/to/matterhorn/ \
-Pengage-standalone,serviceregistry,dist-standalone,workspace
RPM Repository installation:
# admin
yum install opencast-matterhorn14-distribution-admin
# worker
yum install opencast-matterhorn14-distribution-worker
# engage
yum install opencast-matterhorn14-distribution-engage
Step 2: Set-Up NFS Server
Though it is possible to run Matterhorn without shared storage, it is still a good idea to set one up, as hard links can then be used to link files instead of copying them and not everything has to be tunneled over HTTP.
Thus you should first set up your NFS server. The best solution is certainly a dedicated storage server. For smaller set-ups, however, the storage can also be put on one of the Matterhorn nodes, e.g. the admin node.
To do this, you first have to install and enable the NFS server:
yum install nfs-utils nfs-utils-lib
chkconfig --level 345 nfs on
service nfs start
You want one common user on all your systems so that file permissions do not become an issue. In preparation, it makes sense to manually create a matterhorn user and group with a common UID and GID:
groupadd -g 1234 matterhorn
useradd -g 1234 -u 1234 matterhorn
If the user or group ID 1234 is already in use, just pick another one, but make sure to use the same one on all your Matterhorn nodes.
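To verify that the IDs match across nodes, compare the output of id on each of them; with the values above it should report something like:
id matterhorn
uid=1234(matterhorn) gid=1234(matterhorn) groups=1234(matterhorn)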
Then create the directory to be shared and set its ownership to the newly created user and group:
mkdir -p /srv/matterhorn
chown matterhorn:matterhorn /srv/matterhorn
Next we actually share the storage directory. For this we need to edit the file /etc/exports and set:
/srv/matterhorn 131.173.172.190(rw,sync,no_subtree_check)
with 131.173.172.190 being the IP address of the other machine that should get access. Finally, we enable the share with:
exportfs -a
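You can check that the export is active, either locally with exportfs or from another machine with showmount (both part of nfs-utils; storageserver.example.com stands in for your storage server's host name):
exportfs -v
showmount -e storageserver.example.com
Note that showmount from a remote machine will only succeed once the firewall rules below are in place.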
Of course, you have to open the necessary ports in your firewall configuration. For iptables, appropriate rules could be, for example:
-A INPUT -m state --state NEW -p tcp -m multiport --dport 111,892,2049,32803 -j ACCEPT
-A INPUT -m state --state NEW -p udp -m multiport --dport 111,892,2049,32803 -j ACCEPT
You can set them by editing /etc/sysconfig/iptables and restarting the service afterwards.
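On RHEL/CentOS 6, for example, the restart would be:
service iptables restart
Note that mountd and lockd do not listen on fixed ports by default. The rules above assume that the ports have been pinned in /etc/sysconfig/nfs, for example like this (followed by a restart of the NFS service):
MOUNTD_PORT=892
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32803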
Now you have set up your storage server. What is still left to do is to mount the network storage on all other servers of the Matterhorn cluster except the capture agents. To do that, edit /etc/fstab on each server and add an entry to mount the network storage on startup:
storageserver.example.com:/srv/matterhorn /srv/matterhorn nfs rw,hard,intr,rsize=32768,wsize=32768 0 0
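The share is then mounted at boot. To mount it immediately without a reboot, create the mount point and mount it by hand (mount picks up the options from the /etc/fstab entry):
mkdir -p /srv/matterhorn
mount /srv/matterhorn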
Important: Do not use multiple NFS shares for different parts of the Matterhorn storage directory. Matterhorn will check whether hard links are possible in a distributed set-up, but the detection may fail if hard links are only possible between certain parts of the storage. This may lead to failures.
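If you want to check by hand whether hard links work on the mounted storage, a quick test like the following can help (the file names are arbitrary):
touch /srv/matterhorn/hardlink-test
ln /srv/matterhorn/hardlink-test /srv/matterhorn/hardlink-test.link
rm /srv/matterhorn/hardlink-test /srv/matterhorn/hardlink-test.link
If the ln command fails, hard links are not possible on that storage.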
Step 3: Set-Up the Database
First make sure to follow the regular database set-up.
Do not forget to also create the database user for the remote servers and grant it the necessary rights (a sketch follows the firewall rule below). Additionally, you need to configure your firewall:
-A INPUT -p tcp -s 131.173.172.190 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
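As a sketch of the corresponding database grant, assuming MySQL, a database and user both named matterhorn, and 131.173.172.190 as the connecting node (adjust names, host and password to your set-up):
GRANT ALL PRIVILEGES ON matterhorn.* TO 'matterhorn'@'131.173.172.190' IDENTIFIED BY 'matterhorn-password';
FLUSH PRIVILEGES;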
Step 4: Set-Up ActiveMQ
Since version 2, Opencast Matterhorn requires an Apache ActiveMQ message broker as a message relay for the administrative user interface. ActiveMQ can either run on its own machine or on one of the existing Matterhorn nodes (usually the admin node).
ActiveMQ 5.10 or above should work. ActiveMQ 5.6 will not work. Versions in between are untested.
Installation
- If you use the Matterhorn RPM repository, simply install the activemq-dist package.
- If you are running RHEL, CentOS or Fedora, you can use the ActiveMQ-dist Copr RPM repository.
- You can download binary distributions from the Apache ActiveMQ website.
Configuration
Essentially, you need to point all your Matterhorn nodes to your message broker. For more information about the configuration, have a look at the Message Broker Set-Up Guide.
Do not forget that ActiveMQ uses TCP port 61616 (in the default configuration) for communication, which you might have to allow in your firewall.
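For iptables, a rule analogous to the database rule above could look like this, with 131.173.172.190 again standing in for a Matterhorn node that needs to reach the broker:
-A INPUT -p tcp -s 131.173.172.190 --dport 61616 -m state --state NEW -j ACCEPT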
Step 5: Configure Matterhorn
You have already set up and configured your database and message broker in the previous steps, but there is some more configuration to do. First of all, follow the Basic Configuration guide, which tells you how to set the login credentials, etc. After that, continue with the following steps:
config.properties
Set the server URL to the public URL of each server (admin URL on the admin node, worker URL on the worker node, engage URL on the engage node, …). This may be either the node's IP address or, preferably, its domain name:
org.opencastproject.server.url=http://<URL>:8080
Set the location of the shared storage directory:
org.opencastproject.storage.dir=/srv/matterhorn
Define that the file repository shall access all files locally:
org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}
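As an illustration, the relevant config.properties lines on a worker node might look like this, with worker.example.com as an assumed host name:
org.opencastproject.server.url=http://worker.example.com:8080
org.opencastproject.storage.dir=/srv/matterhorn
org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}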
load/org.opencastproject.organization-mh_default_org.cfg
Set the base URL of the server hosting the administrative tools. Again use a domain name instead of an IP address if possible:
org.opencastproject.admin.ui.url=http://<ADMIN-URL>:8080
Set the base URL of the server hosting the engage tools:
org.opencastproject.engage.ui.url=http://<ENGAGE-URL>:8080
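For example, with admin.example.com and engage.example.com as assumed host names:
org.opencastproject.admin.ui.url=http://admin.example.com:8080
org.opencastproject.engage.ui.url=http://engage.example.com:8080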
services/org.opencastproject.serviceregistry.impl.ServiceRegistryJpaImpl.properties
To ensure that jobs are not dispatched by non-admin nodes, you may also want to set the following on all nodes except the admin node:
dispatchinterval=0