Job Agents

On start, Job Agents spawn a configurable number of Job Containers as separate system processes. Job Agents are responsible for redirecting incoming jobs from their assigned queue to the locally spawned Job Containers via REST communication. The spawned Job Containers serve solely as execution units for Altair RapidMiner processes. Job Containers are kept alive as long as the Job Agent application runs and shut down automatically when the managing Job Agent shuts down.

This page outlines how to configure a Job Agent. Please refer to the architecture page for an overview of the Job Agent and Job Container structure.

Configuration

Agent Properties

You can alter the configuration of the Job Agent by setting the corresponding environment variables. The variables are described in the Job Agent Settings table. Besides settings like the number of spawned Job Containers or the maximum amount of memory used per Job Container, it is also possible to configure more complex behavior, as outlined in the following sub-sections.

Container ports

When Job Containers are spawned as separate system processes during the Job Agent's startup, they are bound to system ports. This is necessary because the Job Agent communicates with them via REST endpoints, e.g. to redirect jobs or to retrieve a job's latest status. The Job Agent uses successive ports beginning from the defined starting port. The last port is determined by the number of configured Job Containers: if 10000 is defined as the starting port and four Job Containers are to be spawned in total, the Job Agent binds the ports 10000, 10001, 10002 and 10003. Job Containers only listen locally and cannot be reached from anywhere other than localhost/127.0.0.1.
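The port arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rule "starting port plus one port per Job Container"; the function name container_ports is hypothetical and not part of the Job Agent.

```python
def container_ports(start_port: int, container_count: int) -> list:
    """Return the successive ports, one per Job Container, beginning at start_port."""
    if container_count < 1:
        raise ValueError("at least one Job Container is required")
    last_port = start_port + container_count - 1
    if start_port < 1 or last_port > 65535:
        raise ValueError("port range must stay within 1..65535")
    return list(range(start_port, last_port + 1))


# Starting port 10000 with four Job Containers, as in the example above.
print(container_ports(10000, 4))  # [10000, 10001, 10002, 10003]
```

A check like this can help to verify up front that the whole range is free and does not collide with other services on the same machine.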

Container restart policies

By default, Job Containers run indefinitely and do not restart after a job has been executed. With this behavior, it's possible to execute large numbers of jobs nearly instantly. A possible downside is that jobs might affect each other when run sequentially. To overcome this, it is possible to assign restart policies to Job Containers. Supported restart policies are: run indefinitely, terminate after a configurable number of executed processes, and restart on a regular basis via a cron expression. When a restart is invoked, the currently active job execution is finished before the Job Containers are restarted. To change this behavior, set the JOBAGENT_RESTART_TIMEOUT environment variable. The Job Agent will then forcibly kill the Job Containers once the execution time exceeds this timeout, regardless of whether a job is still running.

Container caching for Projects

When a process from a Project is executed by a Job Container, the Job Container will first download the corresponding project files in order to use them during execution. After the process finishes, those temporary working files are deleted. If process execution changes project files, they are automatically added as a new Snapshot.

Because Job Containers need to download project files before execution, process execution can take noticeably longer for large projects. To reduce this initial download time, each Job Container caches already downloaded Projects by applying a caching strategy. This behavior can be adjusted by changing the value of the JOBAGENT_CONTAINER_REPOSITORY_CACHING_STRATEGY variable. By default, a Job Container keeps two Projects in cache and replaces the least recently used one when a new Project needs to be downloaded. For more details about the available strategies and how they can be configured to fit your needs, please have a look at the descriptions in the Job Agent Settings.
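To illustrate the default behavior described above (two cached Projects, least recently used one replaced), here is a minimal Python sketch of such a cache. It is not the Job Container's actual implementation; the class ProjectCache and the fake_download helper are hypothetical names used only for this example.

```python
from collections import OrderedDict


class ProjectCache:
    """Keeps at most `capacity` projects and evicts the least recently used one."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._projects = OrderedDict()

    def get(self, name, download):
        if name in self._projects:
            # Cache hit: mark the project as most recently used.
            self._projects.move_to_end(name)
            return self._projects[name]
        # Cache miss: download the project and store it.
        files = download(name)
        self._projects[name] = files
        if len(self._projects) > self.capacity:
            # Drop the least recently used project (the oldest entry).
            self._projects.popitem(last=False)
        return files

    def cached(self):
        return list(self._projects)


downloads = []


def fake_download(name):
    # Stand-in for the real project download; records each call.
    downloads.append(name)
    return f"files of {name}"


cache = ProjectCache(capacity=2)
for project in ["A", "B", "A", "C", "B"]:
    cache.get(project, fake_download)

print(downloads)       # ['A', 'B', 'C', 'B']
print(cache.cached())  # ['C', 'B']
```

In this run, Project B is downloaded twice because it was the least recently used entry when C arrived. Alternating between more Projects than the cache can hold therefore removes most of the benefit.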

Graceful Job Agent shutdown

By default, Job Agents wait for all job executions to finish before shutting down. To avoid waiting indefinitely, set the Job Agent's JOBAGENT_SHUTDOWN_TIMEOUT environment variable.

Container Properties

You can add properties to a Job Container depending on when you need them to be available. In general, Job Containers read their properties at two different points in time:

  1. on start and
  2. when a job is going to be executed.

On start

When a new Job Container has been spawned by a Job Agent, the execution context defined in {homeDir}/config/rapidminer/.RapidMiner is copied to the Job Container so that it can use it during execution. You can place your own configuration files into this directory if your extensions need them.

You can also use the central resource management to synchronize the execution context from the Server home folder.

It is also possible to add properties during Job Container Studio initialization. This is particularly useful if you need to provide extension properties which are already required during Job Container start, e.g. when operators are registered. To set these properties, use the JOBAGENT_CONTAINER_JVM_CUSTOM_OPTIONS environment variable, which is described in Container JVM arguments:

  • Use jobagent.container.init-with-properties.enabled to enable or disable this feature (disabled by default)
  • Use jobagent.container.init-with-properties.location to set the absolute location of a properties file (defaults to rapidminer-init.properties in the {homeDir}/config/rapidminer/ folder)

These properties files are not synchronized automatically and might need to be adapted for each Job Agent instance you have deployed.

On queuing a new job

When you submit a job to a queue, it is picked up by the Job Agent responsible for this queue and then forwarded via REST to a Job Container managed by this Job Agent. Whenever this happens, the Job Agent reads the properties file {homeDir}/config/rapidminer/rapidminer.properties and passes its contents along with the job, so that the Job Container can use them as system properties. They are therefore also exposed to extensions during execution. Remember that properties are overwritten for each new job: changing the file between executions results in different property values being propagated to the Job Container for different jobs.

This file can also be used to provide custom properties (e.g. extension properties) for a Job Container, as long as they are not already required during Job Container start.
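The per-job behavior described in this section can be illustrated with a small Python sketch. It assumes a simplified key=value properties format and uses hypothetical names (read_properties, build_job, my.extension.mode); the real Job Agent may parse and forward the file differently.

```python
import tempfile
from pathlib import Path


def read_properties(path):
    """Parse simple key=value lines; blank lines and # or ! comments are skipped."""
    props = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def build_job(job_id, properties_file):
    # The file is read when the job is forwarded, so each job gets its own snapshot.
    return {"id": job_id, "properties": read_properties(properties_file)}


with tempfile.TemporaryDirectory() as tmp:
    props_file = Path(tmp) / "rapidminer.properties"

    props_file.write_text("my.extension.mode=fast\n", encoding="utf-8")
    first = build_job(1, props_file)

    # Changing the file between executions affects only later jobs.
    props_file.write_text("my.extension.mode=safe\n", encoding="utf-8")
    second = build_job(2, props_file)

print(first["properties"])   # {'my.extension.mode': 'fast'}
print(second["properties"])  # {'my.extension.mode': 'safe'}
```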

Container JVM arguments

Job Containers are started by their Job Agent with a default set of JVM arguments, e.g. -XX:+UseG1GC.

To add arguments which will be passed on to the Job Container, set the JOBAGENT_CONTAINER_JVM_CUSTOM_OPTIONS environment variable and specify the new properties in a form similar to JOBAGENT_CONTAINER_JVM_CUSTOM_OPTIONS = -Dnew.property=new -Danother.property=another. This passes -Dnew.property=new -Danother.property=another to each Job Container spawned by a Job Agent.

Please note that the entire value of JOBAGENT_CONTAINER_JVM_CUSTOM_OPTIONS is passed on to the Job Container start arguments. Any error in this value might prevent the Job Container from spawning correctly.
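Since a malformed value can prevent Job Containers from starting, it can help to generate the value instead of typing it by hand. The following Python sketch builds a space-separated list of -D options from a dictionary. The helper name custom_options is hypothetical, and rejecting whitespace is a simplifying assumption of this sketch, not a documented restriction.

```python
def custom_options(system_properties):
    """Build a space-separated string of -Dkey=value options."""
    parts = []
    for key, value in system_properties.items():
        token = f"-D{key}={value}"
        if any(ch.isspace() for ch in token):
            # Quoting rules are not covered here, so whitespace is rejected.
            raise ValueError(f"whitespace is not supported in this sketch: {token!r}")
        parts.append(token)
    return " ".join(parts)


value = custom_options({"new.property": "new", "another.property": "another"})
print(value)  # -Dnew.property=new -Danother.property=another
```

The resulting string is what you would assign to JOBAGENT_CONTAINER_JVM_CUSTOM_OPTIONS.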

If necessary, it's also possible to override all default JVM arguments, although we strongly advise against it. In certain use cases this might still be feasible and necessary. To override them, set the JOBAGENT_CONTAINER_JVM_PROPERTIES environment variable and define something similar to JOBAGENT_CONTAINER_JVM_PROPERTIES = Dtest.property1=test1,Dtest.property2=test2. Ensure that any default argument you still need is present. All default JVM arguments are printed in the agent.log when the Job Agent starts.

Please note that, for JOBAGENT_CONTAINER_JVM_PROPERTIES, the properties have no leading hyphens and are separated by commas.
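The format difference between the two variables is easy to get wrong, so here is a minimal Python sketch that converts ordinary JVM arguments into the comma-separated form without leading hyphens. The function name jvm_properties_value is hypothetical, and the sketch assumes that exactly one leading hyphen is dropped from each argument.

```python
def jvm_properties_value(jvm_args):
    """Convert arguments like -Dkey=value into the comma-separated form."""
    entries = []
    for arg in jvm_args:
        if not arg.startswith("-"):
            raise ValueError(f"expected a leading hyphen: {arg!r}")
        entry = arg[1:]  # drop exactly one leading hyphen
        if "," in entry:
            raise ValueError(f"commas would break the separator: {arg!r}")
        entries.append(entry)
    return ",".join(entries)


print(jvm_properties_value(["-Dtest.property1=test1", "-Dtest.property2=test2"]))
# Dtest.property1=test1,Dtest.property2=test2
```

Because this variable replaces all defaults, the input list should start from the default arguments printed in the agent.log, with your changes applied on top.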

Container proxy usage

Proxy usage can be configured inside the Job Container using the rapidminer.properties file inside the $jaHomeDir/config/rapidminer folder. Property names are the same as those used to configure a proxy in Altair AI Studio, as described in Altair AI Studio Proxy.

Extract the necessary values from your .RapidMiner/rapidminer-studio-settings.cfg configuration file, for example:

rapidminer.proxy.mode=Manual proxy configuration
rapidminer.proxy.https.proxyHost=myproxy.domain.tld
rapidminer.proxy.https.proxyPort=8443

The Job Container will pick those values up and use them throughout execution.

If your proxy requires authentication, e.g. basic auth, proceed as you would in Altair AI Studio by using the Studio Wallet to manage your passwords. Afterwards, copy the .RapidMiner/encryption and .RapidMiner/credentials.xml files into the execution context folder {homeDir}/config/rapidminer/.RapidMiner of the Job Container. By default, the necessary files are picked up during Job Container start. The default symmetric key used for that is expected to be {homeDir}/config/rapidminer/.RapidMiner/encryption/symmetric/default-local-context.rmek.
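The copy step described above can be scripted. The following Python sketch copies the encryption folder and credentials.xml from a Studio .RapidMiner folder into an execution context folder. The function name copy_wallet_files is hypothetical, and the demonstration uses throwaway directories with placeholder file contents instead of real installation paths.

```python
import shutil
import tempfile
from pathlib import Path


def copy_wallet_files(studio_dir, context_dir):
    """Copy the encryption folder and credentials.xml into the execution context."""
    context_dir.mkdir(parents=True, exist_ok=True)
    shutil.copytree(studio_dir / "encryption", context_dir / "encryption",
                    dirs_exist_ok=True)
    shutil.copy2(studio_dir / "credentials.xml", context_dir / "credentials.xml")


# Demonstration with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    studio = Path(tmp) / "studio-home" / ".RapidMiner"
    key = studio / "encryption" / "symmetric" / "default-local-context.rmek"
    key.parent.mkdir(parents=True)
    key.write_bytes(b"placeholder key")
    (studio / "credentials.xml").write_text("<credentials/>", encoding="utf-8")

    context = Path(tmp) / "job-agent-home" / "config" / "rapidminer" / ".RapidMiner"
    copy_wallet_files(studio, context)

    key_copied = (context / "encryption" / "symmetric"
                  / "default-local-context.rmek").is_file()
    credentials_copied = (context / "credentials.xml").is_file()

print(key_copied, credentials_copied)  # True True
```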

Resources

To enable correct execution of processes, the Job Agent uses various external resources like extensions, custom Java libraries, and RapidMiner Server licenses. These resources are stored within the {homeDir}/resources/ folder of the Job Agent and are populated automatically by the synchronization service.