You are viewing the RapidMiner Legacy documentation for version 9.8.

Installation guide

See the deployment documentation to learn how to deploy RapidMiner as a High Availability cluster.

The recommended method is to use Kubernetes. The documentation below is provided in case you prefer a non-Kubernetes solution.

This guide walks you through installing RapidMiner Server as a High Availability cluster in a Linux environment. It covers a first-time installation of RapidMiner Server High Availability, with no existing data.

Terminology

In this guide we'll use the following terminology:

  • Installation directory – The directory where you installed RapidMiner Server on a node.
  • Shared home directory – The RapidMiner Server home directory that is accessible to all nodes in the cluster via the same path.

Test RapidMiner Server High Availability installation

Be sure to test your RapidMiner Server High Availability installation thoroughly before deploying to production.

  • Set up and test RapidMiner Server High Availability in your staging environment before deploying to a production environment.
  • Test RapidMiner Server High Availability with identical data (repositories, users, extensions) to your production instance.

Accessing a RapidMiner Server High Availability installation

When the installation is complete, the URL of RapidMiner Server will be the URL of the load balancer, so the DNS name used for RapidMiner Server should resolve to the load balancer machine. The remaining machines do not need to be publicly accessible to your users.

Provision the shared database, shared filesystem, and ActiveMQ broker

Provision the shared database

Set up the shared database server and make sure that your database allows enough concurrent connections. With many RapidMiner Server nodes connecting to the same database, the default connection limit can be exceeded quickly. For PostgreSQL, for example, the default limit is 100 connections. To increase the limit, edit the postgresql.conf file and increase the value of max_connections, then restart PostgreSQL.
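
As a minimal sketch of the postgresql.conf edit described above, the following shell helper rewrites max_connections in a given file. The function name set_max_connections and the value 300 in the usage comment are illustrative assumptions, not part of RapidMiner or PostgreSQL, and the location of postgresql.conf depends on your installation.

```shell
# Sketch: set max_connections in a postgresql.conf file.
# set_max_connections is a hypothetical helper, not a PostgreSQL tool.
set_max_connections() {
  local conf_file="$1" new_limit="$2"
  local pattern='^[[:space:]]*#?[[:space:]]*max_connections[[:space:]]*=.*'
  if grep -Eq "$pattern" "$conf_file"; then
    # Rewrite every matching line (commented out or not); keep a .bak copy.
    sed -E -i.bak "s/${pattern}/max_connections = ${new_limit}/" "$conf_file"
  else
    # The file has no such line yet: append one.
    printf 'max_connections = %s\n' "$new_limit" >> "$conf_file"
  fi
}

# Example call (path and value are placeholders):
#   set_max_connections /path/to/postgresql.conf 300
```

PostgreSQL only picks up the new limit after a restart. A suitable value depends on the number of nodes and their connection pool sizes, which this guide does not specify.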

Provision the shared filesystem

Set up the shared NFS filesystem and make sure RapidMiner Server nodes can access it and have full read and write permissions.

Provision ActiveMQ broker

Although the RapidMiner Server cluster will function with a single instance of ActiveMQ, we highly recommend clustering it as well, because high availability depends on each component being highly available. You don't want ActiveMQ to be the single point of failure. For the sake of completeness, both a single-node setup and a clustered setup are outlined below.

Single node ActiveMQ setup

  • Download and install ActiveMQ.

Currently, only ActiveMQ version 5.14.5 has been tested and is officially supported, but feel free to test newer 5.x versions.

If you're using GNU/Linux, ActiveMQ packages are typically provided by your distribution. You can install them with your package manager and start the application through your init system (for example SysV init or systemd).

  • Configure the ActiveMQ broker user that will be used by RapidMiner Server and the Job Agents:

    • Open <activemq-conf-dir>/users.properties and add a new broker user and password (e.g., the user "brokerUser" with password "brokerP4ssw0rd"):

      admin=admin
      brokerUser=brokerP4ssw0rd
      
    • Open <activemq-conf-dir>/groups.properties and add the new user to the users group:

      admins=admin
      users=brokerUser
      
  • Write down the new user's credentials. They are needed to configure the connection from RapidMiner Server and from the Job Agents to the broker.

  • Start ActiveMQ.
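
The two file edits above can be scripted when several brokers need the same credentials. The following is a sketch only: add_broker_user is a hypothetical helper name, and it assumes both files already exist in the ActiveMQ configuration directory and end with a newline.

```shell
# Sketch: register a broker user in users.properties and groups.properties.
# add_broker_user is a hypothetical helper, not an ActiveMQ command.
add_broker_user() {
  local conf_dir="$1" user="$2" password="$3"
  local users_file="$conf_dir/users.properties"
  local groups_file="$conf_dir/groups.properties"

  # users.properties holds one user=password pair per line.
  grep -q "^${user}=" "$users_file" ||
    printf '%s=%s\n' "$user" "$password" >> "$users_file"

  # groups.properties holds group=comma,separated,members.
  if grep -q '^users=' "$groups_file"; then
    if ! grep -Eq "^users=(.*,)?${user}(,.*)?\$" "$groups_file"; then
      # Append to a non-empty member list, or fill an empty one.
      sed -E -i.bak \
        -e "s/^users=(.+)\$/users=\1,${user}/" \
        -e "s/^users=\$/users=${user}/" "$groups_file"
    fi
  else
    printf 'users=%s\n' "$user" >> "$groups_file"
  fi
}

# Example call (directory is a placeholder):
#   add_broker_user <activemq-conf-dir> brokerUser brokerP4ssw0rd
```

The helper is idempotent, so running it again with the same user leaves both files unchanged.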

Clustered ActiveMQ setup

  • Download and install ActiveMQ on all machines that will serve as ActiveMQ instances.
  • Configure the instances as a cluster. To do so, follow any of the clustering setups described in the ActiveMQ documentation.
    • We advise using the Shared File System Master Slave setup, as your clustered setup already has a shared file system for the RapidMiner Server home directory.
    • Make sure that all instances share the same broker user credentials (see "Single node ActiveMQ setup" for how to set up credentials).
  • Start all instances.
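
Clients address a clustered broker through a single failover URI that lists every instance, which is the form used in the execution.properties and agent.properties examples later in this guide. As a small sketch, the helper below assembles such a URI from a list of hosts. The name build_failover_uri is hypothetical, and the sketch assumes every instance listens on the default port 61616.

```shell
# Sketch: build an ActiveMQ failover URI from a list of broker hosts.
# build_failover_uri is a hypothetical helper; port 61616 is assumed for all hosts.
build_failover_uri() {
  [ "$#" -gt 0 ] || return 1   # at least one host is required
  local port=61616 uri="" host
  for host in "$@"; do
    # Join entries with commas, without a leading comma.
    uri="${uri:+$uri,}tcp://${host}:${port}"
  done
  printf 'failover:(%s)\n' "$uri"
}

# Example call with the addresses used later in this guide:
#   build_failover_uri 172.31.21.116 172.31.21.112
```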

Prepare a headless installation

To install RapidMiner Server on the nodes, we will use the headless installation option. A detailed description is given on the headless installation documentation page. Here is a short overview of how to prepare the headless installation:

  1. Download the RapidMiner Server installer on a machine with a UI
  2. Start the installer and choose the "Install RapidMiner Server on a headless machine" option
  3. Go through the installer steps and use configuration values appropriate for the clustered setup of RapidMiner Server
    1. Use the reachable hostname or IP address of the load balancer (<load_balancer_address>) as the server host name
    2. Make sure to disable bundled Job Agents
    3. Do not enable the Radoop proxy
  4. Finally, generate the installation XML file and store it on your disk. This file will be used to install RapidMiner Server on the nodes.

Prepare the first RapidMiner Server node

  1. Provision the infrastructure of the first RapidMiner Server node. You can automate this by using a configuration management tool such as Chef or Puppet or by spinning up identical virtual machine snapshots.
  2. Make sure the filesystem of your RapidMiner Server node supports UTF-8. If not, add the following statements to the /etc/environment configuration file:

    LC_ALL=en_US.UTF-8
    LANG=en_US.UTF-8
    
  3. Mount the shared home directory.

    • For example, let's assume your RapidMiner Server home directory is /var/rapidminer/application-data/rapidminer-server/ and your shared home directory is available as an NFS export called rapidminer-san:/rapidminer-server-home. Add the following line to /etc/fstab on each cluster node:

      rapidminer-san:/rapidminer-server-home /var/rapidminer/application-data/rapidminer-server/ nfs lookupcache=pos,noatime,intr,rsize=32768,wsize=32768 0 0
      
    • Then mount it:

      sudo mkdir -p /var/rapidminer/application-data/rapidminer-server/
      sudo mount -a
      
  4. Make sure all nodes have synchronized clocks and identical timezone configuration. Here are some examples for how to do this:

    • Red Hat Enterprise Linux or CentOS:

      sudo yum install ntp
      sudo service ntpd start
      sudo tzselect
      
    • Ubuntu:

      sudo apt-get install ntp
      sudo service ntp start
      sudo dpkg-reconfigure tzdata
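
Before installing, it can help to spot-check the requirements from steps 2 and 3 on each node. The sketch below is one possible way to do that, not an official RapidMiner check: check_shared_home and is_utf8_locale are hypothetical helper names. The first verifies that a directory accepts a write, read-back, and delete; the second tests whether a locale string names a UTF-8 charset.

```shell
# Sketch: preflight checks for a cluster node (hypothetical helpers).

# Returns 0 if the directory accepts a write, a read-back, and a delete.
check_shared_home() {
  local dir="$1" probe
  probe="$dir/.ha-write-probe-$$"
  { printf 'probe\n' > "$probe"; } 2>/dev/null || return 1
  if [ "$(cat "$probe" 2>/dev/null)" != "probe" ]; then
    rm -f "$probe"
    return 1
  fi
  rm -f "$probe"
}

# Returns 0 if a locale string such as en_US.UTF-8 names a UTF-8 charset.
is_utf8_locale() {
  case "$1" in
    *.UTF-8|*.utf8|*.utf-8|*.UTF8) return 0 ;;
    *) return 1 ;;
  esac
}

# Example calls:
#   check_shared_home /var/rapidminer/application-data/rapidminer-server/
#   is_utf8_locale "$LANG"
```

Run the checks as the user that will run RapidMiner Server, since write permissions on an NFS export can differ between users.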
      

Install RapidMiner Server on the first node

Once the infrastructure for the first RapidMiner Server node is available and meets all the node requirements, you can start installing RapidMiner Server.

Install RapidMiner Server

  1. Download the RapidMiner Server installer and extract it
  2. Upload the headless installation XML file to the node
  3. Run the headless installation:

    cd <rapidminer-server-installer>
    ./bin/rapidminer-server-installer <file_name>.xml
    

Adapt configuration

After the installation has finished, you need to adjust a few settings to configure RapidMiner Server for High Availability.

  1. First adapt the execution.properties configuration file to enable the cluster mode. The file can be found in the <shared home>/configuration/ folder.

    1. Enable clustered mode for RapidMiner Server:

      rapidminer.server.isClustered = true
      
    2. Configure the load balancer URL as the RapidMiner Server URL:

      rapidminer.server.protocol = http
      rapidminer.server.host = <load_balancer_address>
      rapidminer.server.port = <port>
      
    3. Disable the embedded ActiveMQ broker and point to the external broker like this:

      jobservice.queue.activemq.embeddedBroker.enabled = false
      jobservice.queue.activemq.uri = failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616)
      jobservice.queue.activemq.username = brokerUser
      jobservice.queue.activemq.password = brokerP4ssw0rd
      
  2. Next, update the scheduler.properties configuration file to enable a clustered scheduler. The file is located in the same folder as the execution.properties file. Add the following lines:

    org.quartz.jobStore.isClustered = true
    org.quartz.jobStore.clusterCheckinInterval = 10000
    
  3. Edit the standalone.conf file located in the <install directory>/bin/ folder.

    1. Look for

      JAVA_OPTS="$JAVA_OPTS -Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log"
      

      and change it to a new log folder that matches the instance name. For example:

      JAVA_OPTS="$JAVA_OPTS -Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log/instance1"
      
    2. Also, add a new line that points the Execution Backend to the localhost right next to the other JAVA_OPTS lines. For example:

      JAVA_OPTS="$JAVA_OPTS -Dexecution-backend-url=http://localhost:8080/executions"
      
  4. Add the RapidMiner Server node to the load balancer

  5. Start the first RapidMiner Server node
  6. Open the web UI of RapidMiner Server at http(s)://<load_balancer_address>:<port> and log in as admin
  7. Make sure everything works fine (e.g. extensions are loaded, server logs can be inspected, etc.)
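
The property changes in steps 1 and 2 can also be applied from a script. The following is a minimal sketch under stated assumptions: set_property is a hypothetical helper that replaces an existing "key = value" line or appends one if the key is missing, and it assumes values contain no backslashes (awk would interpret them as escapes).

```shell
# Sketch: set "key = value" in a .properties file (hypothetical helper).
set_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -v k="$key" -v v="$value" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      # Match the exact key followed by optional blanks and "=".
      if (index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/) {
        print k " = " v
        found = 1
        next
      }
      print
    }
    END { if (!found) print k " = " v }
  ' "$file" > "$tmp" &&
    cat "$tmp" > "$file" &&   # overwrite in place to keep owner and mode
    rm -f "$tmp"
}

# Example calls (the <shared home> path is a placeholder):
#   set_property "<shared home>/configuration/execution.properties" rapidminer.server.isClustered true
#   set_property "<shared home>/configuration/scheduler.properties" org.quartz.jobStore.isClustered true
```

Commented-out lines are left alone, so a key that only appears in a comment is appended as a new line at the end of the file.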

Install additional RapidMiner Server nodes

Once the first RapidMiner Server node is up and running, you can add more nodes to the cluster. There are two ways you can add more nodes: either manually or with a snapshot of the first node. Both are described below. The manual option requires a little more effort though.

Add nodes manually

To add nodes manually:

  1. Provision the infrastructure for additional nodes, and then repeat the headless installation steps described in the section above.
  2. You do not need to adapt the whole configuration again. However, each RapidMiner Server headless installation overwrites the shared configuration folder of the initial installation and moves the previous content to a backup folder. Go to the <shared home> folder and restore the backup configuration every time a headless installation has finished. The backup folder name contains the version and a timestamp, so it will differ from the one shown here. For example:

      cd <rapidminer-server-installer>
      ./bin/rapidminer-server-installer <file_name>.xml
    
      ###
      # wait for installation to finish
      ###
    
      cd /var/rapidminer/application-data/rapidminer-server/
    
      # delete the newly created configuration and restore the initial one
      rm -rf configuration/
      mv configuration_backup_9.1.0_2018-11-08_14-40-42/ configuration/
    
  3. Configure a new log folder in the file <install directory>/bin/standalone.conf, as described in the section above.

  4. Once the installation is finished and the initial configuration is restored, you can make the new node available as an endpoint by adding its IP address and port 8080 to the load balancer.
  5. Start the new RapidMiner Server node
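
Because the backup folder name changes with every installation, the restore in step 2 can be scripted by picking the newest backup. The sketch below assumes that backup folders follow the configuration_backup_<version>_<timestamp> pattern shown above and that, for a single version, alphabetical order matches chronological order. restore_latest_config_backup is a hypothetical helper name.

```shell
# Sketch: restore the newest configuration backup in the shared home directory.
# restore_latest_config_backup is a hypothetical helper, not a RapidMiner tool.
restore_latest_config_backup() {
  local shared_home="$1" dir latest=""
  # Globs expand in sorted order, so the last match is the newest backup.
  for dir in "$shared_home"/configuration_backup_*/; do
    [ -d "$dir" ] && latest="${dir%/}"
  done
  if [ -z "$latest" ]; then
    echo "no configuration backup found in $shared_home" >&2
    return 1
  fi
  # Same effect as the manual steps: drop the new config, restore the backup.
  rm -rf "$shared_home/configuration"
  mv "$latest" "$shared_home/configuration"
}

# Example call:
#   restore_latest_config_backup /var/rapidminer/application-data/rapidminer-server
```

If no backup folder exists, the helper returns an error and leaves the current configuration folder untouched.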

Add nodes from snapshot

If you are running RapidMiner Server in a virtual infrastructure or in the Cloud, we recommend creating a snapshot of the initial node, then adding new nodes from the snapshot.

To do so:

  1. Shut down RapidMiner Server on the initial node
  2. Create a snapshot of the virtual instance
  3. Restart the initial RapidMiner Server node once the snapshot has been created
  4. Create a new node from the newly created snapshot
  5. SSH to the new cluster node and configure a new log folder in the <install directory>/bin/standalone.conf file as described in the section above.
  6. Add the new node to the load balancer
  7. Start the new RapidMiner Server node
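
Whichever way a node is added, it needs its own log folder in standalone.conf. As a sketch, the helper below rewrites the -Djboss.server.log.dir option so that it points to log/<instance name>, whether or not an instance folder was configured before (as is the case on a node created from a snapshot). set_instance_log_dir is a hypothetical helper name, and the sketch assumes the option has the form shown in the "Adapt configuration" section.

```shell
# Sketch: point -Djboss.server.log.dir at an instance-specific log folder.
# set_instance_log_dir is a hypothetical helper, not a RapidMiner tool.
set_instance_log_dir() {
  local conf="$1" instance="$2"
  # Group 1 is the option up to ".../log"; an optional existing
  # "/<old instance>" suffix is replaced. A .bak copy of the file is kept.
  sed -E -i.bak \
    's|(-Djboss\.server\.log\.dir=\$RAPIDMINER_SERVER_HOME/log)(/[^"[:space:]]*)?|\1/'"$instance"'|' \
    "$conf"
}

# Example call (the <install directory> path is a placeholder):
#   set_instance_log_dir "<install directory>/bin/standalone.conf" instance2
```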

Install Job Agents

Each Job Agent should be installed on a dedicated machine. You can download the Job Agent ZIP file from RapidMiner Server's web interface, or you can call the REST API. We recommend the second approach, because you don't have to upload the ZIP file via SSH to your dedicated Job Agent machine. Using the second approach, proceed as follows:

  1. SSH to the machine on which the Job Agent will run.
  2. To download the Job Agent ZIP file:

    1. Obtain a token (the value of the idToken field in the response) for a user who is permitted to access the Job Agent download route, e.g. the admin user:

      curl -u admin:PASSWORD http(s)://<load_balancer_address>:<port>/api/rest/tokenservice
      
    2. Download the ZIP for a queue QUEUENAME. The default queue is named DEFAULT. Be aware that names are case sensitive.

      curl -H "Authorization: Bearer TOKEN_FROM_REQUEST_ABOVE" http(s)://<load_balancer_address>:<port>/executions/queues/QUEUENAME/agent --output /path/to/save/location/JobAgent.zip
      
  3. Unzip the ZIP file to your preferred location. For example:

     unzip /path/to/save/location/JobAgent.zip -d /path/to/extract/location
    
  4. Adjust properties in the home/config/agent.properties file to your needs. The ActiveMQ broker URI should point to your ActiveMQ cluster which you've already configured in the execution.properties file of the shared RapidMiner Server home directory. The uri property represents a set of available ActiveMQ instances with their default port 61616. For example:

    jobagent.queue.activemq.uri = failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616)
    jobagent.queue.activemq.username = brokerUser
    jobagent.queue.activemq.password = brokerP4ssw0rd
    
  5. (Optional) Add extensions or JDBC drivers.

  6. Start the Job Agent.
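
The token request and the download in step 2 can be chained in a script. The sketch below covers only the part that runs without a server: pulling the idToken value out of the token service response. extract_id_token is a hypothetical helper, and it assumes the response is a JSON object in which idToken is a plain string; a JSON-aware tool is more robust if one is installed.

```shell
# Sketch: read a token service response on stdin and print the idToken value.
# extract_id_token is a hypothetical helper; it assumes "idToken" is a JSON string.
extract_id_token() {
  sed -n 's/.*"idToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example use with the requests from step 2 (placeholders as in the text above):
#   TOKEN="$(curl -s -u admin:PASSWORD https://<load_balancer_address>:<port>/api/rest/tokenservice | extract_id_token)"
#   curl -H "Authorization: Bearer $TOKEN" \
#     https://<load_balancer_address>:<port>/executions/queues/DEFAULT/agent \
#     --output /path/to/save/location/JobAgent.zip
```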

Congratulations!

That's it! RapidMiner Server is accessible in High Availability mode from a URL like this: http(s)://<load_balancer_address>:<port>