You are viewing the RapidMiner Deployment documentation for version 9.8.
Hadoop connectivity template
This template is very similar to the basic production template. Use it instead when you deploy RapidMiner processes that work with big data on a Hadoop cluster through RapidMiner Radoop. It adds the Radoop Proxy component, which simplifies network configuration when the Hadoop cluster is behind a firewall.
Use it to deploy RapidMiner AI Hub on Kubernetes, with the following components:
- 1 RapidMiner AI Hub instance
- 3 RapidMiner Job Agents
- Postgres database
- Platform Admin
- Radoop Proxy
- 1 Keycloak instance
For a detailed description of every Docker image, see the image reference.
System requirements
Minimum recommended hardware configuration
The amount of memory needed depends heavily on how much data RapidMiner AI Hub will process. If most or all of the data is processed in the Hadoop environment using Radoop, 16GB is enough for the Server. If non-Radoop processes will also run on the Server, increase the memory to 32GB or more, depending on the size of the user data.
Each virtual or physical machine should have at least:
- Quad-core CPU
- 16GB RAM
- More than 20GB free disk space
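The sizing rules above can be expressed as a small pre-flight check. This is a minimal sketch that assumes you already know each machine's core count, RAM and free disk. The `Node` class and both function names are hypothetical and are not part of any RapidMiner tooling. The 32GB figure is only the starting point the text gives for nodes that run non-Radoop processes on the Server.

```python
"""Check a machine against the minimum hardware listed above (illustrative sketch)."""
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical description of one virtual or physical machine.
    cores: int
    ram_gb: int
    free_disk_gb: int


def recommended_server_ram_gb(runs_non_radoop_processes: bool) -> int:
    # 16GB suffices when (almost) all data is processed in Hadoop via Radoop;
    # otherwise start from 32GB and grow with the size of the user data.
    return 32 if runs_non_radoop_processes else 16


def check_node(node: Node, runs_non_radoop_processes: bool = False) -> list[str]:
    """Return human-readable problems; an empty list means the node qualifies."""
    problems = []
    if node.cores < 4:
        problems.append(f"needs at least 4 cores, has {node.cores}")
    needed_ram = recommended_server_ram_gb(runs_non_radoop_processes)
    if node.ram_gb < needed_ram:
        problems.append(f"needs at least {needed_ram}GB RAM, has {node.ram_gb}GB")
    if node.free_disk_gb <= 20:
        problems.append(f"needs more than 20GB free disk, has {node.free_disk_gb}GB")
    return problems


if __name__ == "__main__":
    print(check_node(Node(cores=4, ram_gb=16, free_disk_gb=50)))
    print(check_node(Node(cores=4, ram_gb=16, free_disk_gb=50), runs_non_radoop_processes=True))
```

For a machine that hosts RapidMiner Server and runs non-Radoop processes, pass `runs_non_radoop_processes=True`. A 16GB node is then reported as undersized.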
Instructions
The provided Docker images are ready to deploy to any Kubernetes cluster.
Review the configuration below and adapt it to your environment and requirements.
The following guide requires a running Kubernetes cluster.
The RapidMiner Platform is supported on the following Kubernetes services:
- Amazon Elastic Kubernetes Service (Amazon EKS)
- Azure Kubernetes Service (AKS)
- Minikube (please read the notices about Minikube)
- MicroK8s
Volumes
The PersistentVolumeClaims below provide block storage so that the RapidMiner Platform components keep their data permanently, independent of the container life cycle.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rm-postgresql-pvc
  labels:
    app: rm-postgresql-svc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pem-uploaded-pvc
  labels:
    app: pem-uploaded-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rm-server-home-pvc
  labels:
    app: rm-server-svc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapidminer-uploaded-pvc
  labels:
    app: rapidminer-uploaded-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100M
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: radoop-proxy-pvc
  labels:
    app: radoop-proxy
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
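Before choosing a storage class, it can help to total the storage these claims request. The sketch below parses only the Kubernetes quantity suffixes used here and their siblings. Binary suffixes (`Gi`) are powers of 2 and decimal suffixes (`M`) are powers of 10, so `100M` means 100,000,000 bytes, not 100 MiB. The `PVC_REQUESTS` dict is copied from the manifests above, and the helper name is hypothetical.

```python
"""Sum the storage requested by the PersistentVolumeClaims above (sketch)."""
import re

# Binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes from Kubernetes
# resource quantities; only the simple integer forms are handled here.
_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "": 1,
}
_QUANTITY = re.compile(r"^(\d+)(Ki|Mi|Gi|Ti|k|M|G|T|)$")


def to_bytes(quantity: str) -> int:
    match = _QUANTITY.match(quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _FACTORS[suffix]


# Storage requests copied from the Volumes manifest.
PVC_REQUESTS = {
    "rm-postgresql-pvc": "2Gi",
    "pem-uploaded-pvc": "1Gi",
    "rm-server-home-pvc": "10Gi",
    "rapidminer-uploaded-pvc": "100M",
    "radoop-proxy-pvc": "1Gi",
}

if __name__ == "__main__":
    total = sum(to_bytes(q) for q in PVC_REQUESTS.values())
    print(f"{total} bytes (~{total / 2**30:.1f} GiB)")
```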
Services
Services are essential parts of the RapidMiner Platform: the containers/pods use them to reach each other by stable names.
```yaml
kind: Service
apiVersion: v1
metadata:
  name: rapidminer-server-amq-svc
  labels:
    app: rapidminer-server-amq-svc
    role: server
spec:
  ports:
    - port: 5672
      targetPort: amq
  selector:
    app: rm-server-svc
    role: server
---
kind: Service
apiVersion: v1
metadata:
  name: rm-proxy-svc
  labels:
    app: rm-proxy-svc
    role: proxy
spec:
  ports:
    - name: proxyhttp
      protocol: TCP
      port: 80
      targetPort: proxyhttp
    - name: proxyhttps
      protocol: TCP
      port: 443
      targetPort: proxyhttps
  selector:
    app: rm-proxy-svc
    role: proxy
  type: LoadBalancer
---
kind: Service
apiVersion: v1
metadata:
  name: postgres-svc
  labels:
    app: rm-postgresql-svc
spec:
  ports:
    - port: 5432
      targetPort: postgresport
  selector:
    app: rm-postgresql-svc
---
kind: Service
apiVersion: v1
metadata:
  name: rm-server-svc
  labels:
    app: rm-server-svc
    role: server
spec:
  ports:
    - port: 8080
      targetPort: rmswebui
  selector:
    app: rm-server-svc
    role: server
---
kind: Service
apiVersion: v1
metadata:
  name: pem-webui-svc
  labels:
    app: pem-webui-cron
    role: pem
spec:
  ports:
    - name: pem-webuiport
      port: 82
      protocol: TCP
      targetPort: pem-webuiport
  selector:
    app: rm-proxy-svc
    role: proxy
---
kind: Service
apiVersion: v1
metadata:
  name: radoop-proxy-svc
  labels:
    app: radoop-proxy
    role: radoop-proxy
spec:
  ports:
    - name: radoop-proxy-port
      port: 1081
      protocol: TCP
      targetPort: radoop-proxy-port
  selector:
    app: radoop-proxy-svc
    role: proxy
```
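A Service only routes traffic to pods whose labels contain every key/value pair of its `selector`. The labels on the Service's own `metadata` play no part in this. The sketch below copies the selectors and the pod-template labels from this template into plain dicts and reports any Service that would select no pod. It avoids a YAML parser so that it stays standard-library only. The dict contents are transcribed by hand, and the function names are hypothetical.

```python
"""Check that every Service selector matches the labels of some pod (sketch)."""

# Selectors copied from the Services manifest.
SERVICE_SELECTORS = {
    "rapidminer-server-amq-svc": {"app": "rm-server-svc", "role": "server"},
    "rm-proxy-svc": {"app": "rm-proxy-svc", "role": "proxy"},
    "postgres-svc": {"app": "rm-postgresql-svc"},
    "rm-server-svc": {"app": "rm-server-svc", "role": "server"},
    "pem-webui-svc": {"app": "rm-proxy-svc", "role": "proxy"},
    "radoop-proxy-svc": {"app": "radoop-proxy-svc", "role": "proxy"},
}

# Pod labels copied from the Pod and Deployment manifests in this template.
POD_LABELS = {
    "rm-postgresql-svc": {"app": "rm-postgresql-svc"},
    "rm-server-svc": {"app": "rm-server-svc", "role": "server"},
    "rm-server-job-agent-svc": {"app": "rm-server-job-agent-svc", "role": "execution"},
    "rm-proxy-svc": {"app": "rm-proxy-svc", "role": "proxy"},
    "radoop-proxy-svc": {"app": "radoop-proxy-svc", "role": "proxy"},
}


def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    # Equality-based selection: every selector pair must appear in the labels.
    return all(labels.get(key) == value for key, value in selector.items())


def unmatched_services() -> list[str]:
    return [
        service
        for service, selector in SERVICE_SELECTORS.items()
        if not any(selects(selector, labels) for labels in POD_LABELS.values())
    ]


if __name__ == "__main__":
    print(unmatched_services() or "every Service selects at least one pod")
```

Note that `pem-webui-svc` deliberately selects the proxy pod, because the `pem-webui` container runs inside that pod.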
Database
The PostgreSQL database is used by RapidMiner Server.
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: rm-postgresql-svc
  labels:
    app: rm-postgresql-svc
spec:
  containers:
    - name: rm-postgresql-svc
      image: postgres:9.6
      ports:
        - name: postgresport
          containerPort: 5432
      env:
        - name: POSTGRES_DB
          value: rmsdb
        - name: POSTGRES_USER
          value: rmsdbuser
        - name: POSTGRES_PASSWORD
          value: rmsdbpassword
      volumeMounts:
        - name: pgvolume
          mountPath: /var/lib/postgresql/data
          subPath: postgres
  volumes:
    - name: pgvolume
      persistentVolumeClaim:
        claimName: rm-postgresql-pvc
```
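To check database connectivity from inside the cluster, for example with a `psql` client in a debug pod, you can assemble a connection URI in the standard libpq `postgresql://user:password@host:port/dbname` form. This is only a troubleshooting aid. RapidMiner Server itself is configured through the separate `DBHOST`, `DBSCHEMA`, `DBUSER` and `DBPASS` variables shown in the next manifest. The helper name is hypothetical, and the credentials are this template's sample values, which you should replace.

```python
"""Build a libpq-style connection URI from the database settings above (sketch)."""
from urllib.parse import quote


def postgres_uri(user: str, password: str, host: str, port: int, dbname: str) -> str:
    # Percent-encode the parts that may contain reserved characters such as '@' or ':'.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


if __name__ == "__main__":
    # POSTGRES_* values from the Pod above; host and port from the postgres-svc Service.
    print(postgres_uri("rmsdbuser", "rmsdbpassword", "postgres-svc", 5432, "rmsdb"))
```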
RapidMiner Server
The main component of the RapidMiner Platform.
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: rm-server-svc
  labels:
    app: rm-server-svc
    role: server
spec:
  hostname: rm-server-svc
  containers:
    - name: rapidminer-server
      image: rapidminer/rapidminer-server:9.6.0
      ports:
        - name: rmswebui
          containerPort: 8080
        - name: amq
          containerPort: 5672
      env:
        - name: JOBSERVICE_QUEUE_ACTIVEMQ_USERNAME
          value: amq-user
        - name: JOBSERVICE_QUEUE_ACTIVEMQ_PASSWORD
          value: amq-pass
        - name: JOBSERVICE_AUTH_SECRET
          value: c29tZS1hdXRoLXNlY3JldAo=
        - name: DBHOST
          value: postgres-svc
        - name: DBSCHEMA
          value: rmsdb
        - name: DBUSER
          value: rmsdbuser
        - name: DBPASS
          value: rmsdbpassword
      volumeMounts:
        - name: rm-server-home-pvc
          mountPath: /persistent-rapidminer-home
          subPath: rapidminer-home
  volumes:
    - name: rm-server-home-pvc
      persistentVolumeClaim:
        claimName: rm-server-home-pvc
```
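The sample `JOBSERVICE_AUTH_SECRET` is simply the base64 encoding of the text `some-auth-secret` followed by a newline, so it should not be used in production. The sketch below generates a random replacement. It assumes, as the sample value suggests, that any base64-encoded string is accepted. The same value must also be set as `JOBAGENT_AUTH_SECRET` on every Job Agent. The function name and the 32-byte length are illustrative choices.

```python
"""Generate a shared auth secret for JOBSERVICE_AUTH_SECRET / JOBAGENT_AUTH_SECRET (sketch)."""
import base64
import secrets


def new_auth_secret(num_bytes: int = 32) -> str:
    # Random bytes from the OS CSPRNG, base64-encoded so the value is safe in YAML/env vars.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Decoding the template's sample value reveals the placeholder text.
    print(base64.b64decode("c29tZS1hdXRoLXNlY3JldAo="))
    print(new_auth_secret())
```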
Job-Agent
The workers that perform the computation tasks.
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: rm-server-job-agent-svc
  labels:
    app: rm-server-job-agent-svc
    role: execution
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rm-server-job-agent-svc
  template:
    metadata:
      labels:
        app: rm-server-job-agent-svc
        role: execution
    spec:
      containers:
        - name: rm-server-job-agent-svc
          image: rapidminer/rapidminer-execution-jobagent:9.6.0
          env:
            # Must match the name of the Service that exposes RapidMiner Server.
            - name: RAPIDMINER_SERVER_HOST
              value: rm-server-svc
            - name: RAPIDMINER_SERVER_PORT
              value: '8080'
            - name: JOBAGENT_QUEUE_ACTIVEMQ_URI
              value: "failover:(tcp://rapidminer-server-amq-svc:5672)"
            - name: JOBAGENT_QUEUE_ACTIVEMQ_USERNAME
              value: amq-user
            - name: JOBAGENT_QUEUE_ACTIVEMQ_PASSWORD
              value: amq-pass
            - name: JOBAGENT_AUTH_SECRET
              value: c29tZS1hdXRoLXNlY3JldAo=
            - name: RAPIDMINER_JOBAGENT_OPTS
              value: "-Djobagent.python.registryBaseUrl=http://pem-webui-svc:82/"
```
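`JOBAGENT_QUEUE_ACTIVEMQ_URI` uses ActiveMQ's failover transport syntax, `failover:(uri1,uri2,...)`. This template lists a single broker: the `rapidminer-server-amq-svc` Service, which forwards to the Server's `amq` port. The sketch builds such a value from host/port pairs. The function name is hypothetical, and transport options such as reconnect delays are deliberately left out.

```python
"""Build an ActiveMQ failover URI like the one in JOBAGENT_QUEUE_ACTIVEMQ_URI (sketch)."""


def failover_uri(brokers: list[tuple[str, int]]) -> str:
    # The failover transport wraps one or more broker URIs in failover:(...).
    if not brokers:
        raise ValueError("at least one broker is required")
    inner = ",".join(f"tcp://{host}:{port}" for host, port in brokers)
    return f"failover:({inner})"


if __name__ == "__main__":
    print(failover_uri([("rapidminer-server-amq-svc", 5672)]))
```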
RapidMiner Proxy & Python Environment Manager
The proxy component handles all incoming HTTP(S) traffic to the platform. The Python Environment Manager (PEM) component manages the Python packages available to the Job Agents. Real-Time Scoring (RTS) is designed for fast scoring use cases via web services. These three platform pieces must run in a single pod in Kubernetes, because the proxy must read the certificates generated by the pem-cron and rts-cron containers.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rm-proxy-svc
  labels:
    app: rm-proxy-svc
    role: proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rm-proxy-svc
  template:
    metadata:
      labels:
        app: rm-proxy-svc
        role: proxy
    spec:
      containers:
        - name: rm-proxy-svc
          image: rapidminer/rapidminer-proxy:9.6.0
          imagePullPolicy: Always
          env:
            - name: RMSERVER_BACKEND
              value: "http://rm-server-svc:8080"
            - name: GRAFANA_BACKEND
              value: "http://rm-grafana-svc:3000"
            - name: GRAFANA_URL_SUFFIX
              value: "/grafana"
            - name: PEM_BACKEND
              value: "http://pem-webui-svc:82/"
            - name: PEM_URL_SUFFIX
              value: "/pem"
            - name: RTS_WEBUI_BACKEND
              value: "http://rts-webui-svc:81/"
            - name: RTS_WEBUI_URL_SUFFIX
              value: "/rts-admin"
            - name: RTS_SCORING_BACKEND
              value: "http://rts-agent-svc:8090/"
            - name: RTS_SCORING_URL_SUFFIX
              value: "/rts"
            - name: HTTPS_CRT_PATH
              value: "/rapidminer/uploaded/certs/validated_cert.crt"
            - name: HTTPS_KEY_PATH
              value: "/rapidminer/uploaded/certs/validated_cert.key"
            - name: HTTPS_DH_PATH
              value: "/rapidminer/uploaded/certs/dhparam.pem"
            - name: DEBUG_CONF_INIT
              value: "true"
          ports:
            - name: proxyhttp
              containerPort: 80
            - name: proxyhttps
              containerPort: 443
          volumeMounts:
            - name: pem-uploaded-pvc
              mountPath: /rapidminer/pem/uploaded/
            - name: rts-uploaded-pvc
              mountPath: /rapidminer/rts/uploaded/
        - name: pem-webui
          image: rapidminer/python-environment-manager-webui:9.6.0
          imagePullPolicy: Always
          ports:
            - name: pem-webuiport
              containerPort: 82
          volumeMounts:
            - name: pem-uploaded-pvc
              mountPath: /var/www/html/uploaded
        - name: pem-cron
          image: rapidminer/python-environment-manager-cron:9.6.0
          imagePullPolicy: Always
          volumeMounts:
            - name: pem-uploaded-pvc
              mountPath: /rapidminer/uploaded
        - name: rts-cron
          image: rapidminer/rapidminer-real-time-scoring-cron:9.6.0
          resources:
            requests:
              memory: "100M"
              cpu: "0.5"
            limits:
              memory: "200M"
              cpu: "0.5"
          volumeMounts:
            - name: rts-uploaded-pvc
              mountPath: /rapidminer/uploaded/
            - name: rts-licenses-pvc
              mountPath: /rapidminer/rts_home/licenses/
        - name: real-time-scoring-webui
          image: rapidminer/rapidminer-real-time-scoring-webui:9.6.0
          ports:
            - name: rts-webuiport
              containerPort: 81
          resources:
            requests:
              memory: "200M"
              cpu: "0.5"
            limits:
              memory: "500M"
              cpu: "0.5"
          volumeMounts:
            - name: rts-uploaded-pvc
              mountPath: /var/www/html/uploaded
            - name: rts-licenses-pvc
              mountPath:
      volumes:
        - name: pem-uploaded-pvc
          persistentVolumeClaim:
            claimName: pem-uploaded-pvc
        - name: rts-uploaded-pvc
          persistentVolumeClaim:
            claimName: rts-uploaded-pvc
        - name: rts-licenses-pvc
          persistentVolumeClaim:
            claimName: rts-licenses-pvc
```
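A pod whose `claimName` refers to a PersistentVolumeClaim that does not exist cannot be scheduled. It is therefore worth cross-checking the claims defined in the Volumes section against those the workloads mount. As written, the proxy Deployment mounts `rts-uploaded-pvc` and `rts-licenses-pvc`, which the Volumes section does not define. Meanwhile `rapidminer-uploaded-pvc` is defined but not mounted by any workload. The sketch below makes that comparison with hand-copied name sets. The function names are hypothetical.

```python
"""Cross-check PVCs defined in Volumes against claims mounted by workloads (sketch)."""

# PVC names defined in the Volumes manifest.
DEFINED_CLAIMS = {
    "rm-postgresql-pvc",
    "pem-uploaded-pvc",
    "rm-server-home-pvc",
    "rapidminer-uploaded-pvc",
    "radoop-proxy-pvc",
}

# claimName values referenced by the Pod and Deployment manifests in this template.
REFERENCED_CLAIMS = {
    "rm-postgresql-svc": {"rm-postgresql-pvc"},
    "rm-server-svc": {"rm-server-home-pvc"},
    "rm-proxy-svc": {"pem-uploaded-pvc", "rts-uploaded-pvc", "rts-licenses-pvc"},
    "radoop-proxy-svc": {"radoop-proxy-pvc"},
}


def missing_claims() -> dict[str, set[str]]:
    # Claims a workload mounts but no PVC manifest creates.
    result = {}
    for workload, claims in REFERENCED_CLAIMS.items():
        missing = claims - DEFINED_CLAIMS
        if missing:
            result[workload] = missing
    return result


def unused_claims() -> set[str]:
    # PVCs that are created but never mounted.
    used = set().union(*REFERENCED_CLAIMS.values())
    return DEFINED_CLAIMS - used


if __name__ == "__main__":
    print("missing:", missing_claims())
    print("unused:", unused_claims())
```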
Radoop Proxy
Radoop Proxy lets you tunnel all Radoop connections through a single machine on the edge of your secured Hadoop cluster. SCHEME is set to http because the proxy reaches RapidMiner Server over the internal cluster network.
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: radoop-proxy-svc
  labels:
    app: radoop-proxy-svc
    role: proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: radoop-proxy-svc
  template:
    metadata:
      labels:
        app: radoop-proxy-svc
        role: proxy
    spec:
      containers:
        - name: radoop-proxy-svc
          image: rapidminer/radoop-proxy:1.2.1
          ports:
            - name: radoop-proxy-port
              containerPort: 1081
          env:
            # Must match the name of the Service that exposes RapidMiner Server.
            - name: SERVERHOST
              value: rm-server-svc
            - name: SERVERPORT
              value: '8080'
            - name: SCHEME
              value: http
            - name: AUTHENTICATION
              value: server
          volumeMounts:
            - name: radoop-proxy-pvc
              mountPath: /keystore
      volumes:
        - name: radoop-proxy-pvc
          persistentVolumeClaim:
            claimName: radoop-proxy-pvc
```
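After deployment, a quick sanity check is to confirm that `radoop-proxy-svc` accepts TCP connections on port 1081. The Service declares no `type`, so it defaults to ClusterIP and is reachable only from inside the cluster. Run the check from a pod there. A successful handshake only proves that something is listening. It does not verify the Radoop Proxy protocol or its authentication. The function name is hypothetical.

```python
"""Check that a TCP endpoint such as radoop-proxy-svc:1081 accepts connections (sketch)."""
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # True if a TCP handshake succeeds within the timeout; name-resolution
    # failures and refused or timed-out connections all return False.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From a pod inside the cluster, `port_open("radoop-proxy-svc", 1081)` should return `True` once the Deployment is running.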