Python Extension Loader (PEL)

Introduction

The Python Extension Loader (PEL) is a Java extension that manages the Python extensions built using the altair-aitools-devkit package. The PEL seamlessly integrates the Python extensions into the Altair AI Tools ecosystem.

What PEL does

  • Loads operator definitions from Python extensions (names, parameters, ports, etc.)
  • Manages the execution of Python code using the appropriate Python distribution
  • Handles data serialization between Java and Python
  • Makes Python extensions appear and behave just like native AI Studio operators in your workflows

Where to place Python Extensions

By default, PEL looks for Python extension .zip files in one of these locations:

  • ~/.RapidMiner/extensions/python-extensions
  • ~/.AltairRapidMiner/AI Studio/shared/extensions/python-extensions

To specify a different location, use one of these methods:

  1. In Altair AI Studio settings: Set pysdk.pel.extLoc to your extensions directory
  2. System property: Set pysdk.pel.extLoc to your extensions directory
  3. Environment variable: Set PYSDK_PEL_EXT_LOC to your extensions directory

When properly configured, you'll see this log message:

INFO: Looking for Python extensions in /path/to/py-extensions

In case of an error, you will see this in your log:

Failed to load Python extensions
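
The two messages above can be checked programmatically, for example in a smoke test after startup. The following is a minimal sketch, not part of PEL: the function name is hypothetical, and because the surrounding log layout (timestamps, logger names) is not specified here, the code only searches each line for the documented message texts.

```python
import re

# Message texts are copied from the documentation above. The rest of the log
# line (timestamp, logger name) is unknown, so we search within each line.
LOOKUP_PATTERN = re.compile(r"INFO: Looking for Python extensions in (.+)")
FAILURE_TEXT = "Failed to load Python extensions"


def check_extension_loading(log_text):
    """Return (lookup_dir, failed) based on the two documented messages.

    lookup_dir is the last directory reported by the lookup message, or None
    if the message never appeared. failed is True if the failure text occurs.
    """
    lookup_dir = None
    failed = False
    for line in log_text.splitlines():
        match = LOOKUP_PATTERN.search(line)
        if match:
            lookup_dir = match.group(1).strip()
        if FAILURE_TEXT in line:
            failed = True
    return lookup_dir, failed
```

A result of (None, False) suggests that PEL did not log anything about extension lookup, which may mean the log level or the log file is not the one you expected.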

Python Distribution Management

A Python distribution consists of a key string and an optional version string. If no version is provided, latest is used. Python distributions can be made available to Python extensions in several ways, described below.

The distribution mode and the installation process are chosen based on what is available, in the order below.

Flowchart of distribution lookup process
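
As a rough sketch, the lookup order described in the four numbered sections below can be read as the following decision function. The enum, function name, and parameter names are hypothetical and reflect only one reading of the listed requirements; PEL's actual implementation may differ.

```python
from enum import Enum


class Mode(Enum):
    MINIFORGE_ONLINE = 1   # 1. Environment creation via Miniforge (online)
    MINIFORGE_OFFLINE = 2  # 2. Environment creation via Miniforge (offline)
    ARCHIVE = 3            # 3. Extractable distributions (via archive file)
    LOCAL = 4              # 4. Local distributions (already on disk)


def choose_mode(miniforge_disabled, has_env_yml, air_gapped,
                offline_installer_present, archive_present):
    """Pick the first mode whose requirements are met, in documented order."""
    # Both Miniforge modes need Miniforge enabled and an env.yml file.
    miniforge_usable = not miniforge_disabled and has_env_yml
    if miniforge_usable and not air_gapped:
        return Mode.MINIFORGE_ONLINE
    # Reaching this check with miniforge_usable means air-gapped mode is set.
    if miniforge_usable and offline_installer_present:
        return Mode.MINIFORGE_OFFLINE
    if archive_present:
        return Mode.ARCHIVE
    # Last resort: the distribution is expected to be ready on disk.
    return Mode.LOCAL
```

For example, an air-gapped system with an env.yml file but no offline installer in the lookup directory falls through to the archive mode, or to a local distribution if no matching archive exists.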

1. Environment creation via Miniforge (online)

Requirements:

  • Miniforge has not been disabled via the setting below
  • A conda environment definition file env.yml is present in the root directory of the Python extension

Description:

This is the default mode. It creates the environment from the env.yml file via Miniforge on demand, using the conda-forge channel, unless the environment has already been created by an earlier start or by another Python extension.

The Miniforge installer will be downloaded and verified automatically when needed (but can also be provided via the resource location). Currently supported Miniforge installers:

  • Miniforge3-24.3.0-0-Windows-x86_64.exe
  • Miniforge3-24.3.0-0-MacOSX-arm64.sh
  • Miniforge3-24.3.0-0-MacOSX-x86_64.sh
  • Miniforge3-24.3.0-0-Linux-aarch64.sh
  • Miniforge3-24.3.0-0-Linux-ppc64le.sh
  • Miniforge3-24.3.0-0-Linux-x86_64.sh
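
If you want to provide the installer yourself via the resource location, you need the file that matches your platform. The sketch below derives the expected file name from the list above. The function name and the mapping from Python's platform values to installer names are assumptions for illustration; how PEL itself selects the installer is not documented here.

```python
import platform

VERSION = "24.3.0-0"

# (operating system, architecture) pairs and suffixes taken from the list above.
SUPPORTED = {
    ("Windows", "x86_64"): "exe",
    ("MacOSX", "arm64"): "sh",
    ("MacOSX", "x86_64"): "sh",
    ("Linux", "aarch64"): "sh",
    ("Linux", "ppc64le"): "sh",
    ("Linux", "x86_64"): "sh",
}

# Translate platform.system() / platform.machine() values to installer naming.
SYSTEM_NAMES = {"Windows": "Windows", "Darwin": "MacOSX", "Linux": "Linux"}
MACHINE_ALIASES = {"AMD64": "x86_64"}  # 64-bit Windows reports AMD64


def installer_name(system=None, machine=None):
    """Return the supported installer file name, or None if unsupported."""
    system = system or platform.system()
    machine = machine or platform.machine()
    os_name = SYSTEM_NAMES.get(system)
    arch = MACHINE_ALIASES.get(machine, machine)
    suffix = SUPPORTED.get((os_name, arch))
    if suffix is None:
        return None
    return f"Miniforge3-{VERSION}-{os_name}-{arch}.{suffix}"
```

Calling installer_name() without arguments uses the values of the machine running the script; passing explicit values lets you look up the file name for a different target system.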

To disable the use of Miniforge, you have three options; the first one to be specified is used:

  1. Use the Settings with the key pysdk.pel.disableMiniforge set to true
  2. Use a system property with the key pysdk.pel.disableMiniforge set to true
  3. Use an environment variable with the key PYSDK_PEL_DISABLE_MINIFORGE set to true

Look for the following log line to indicate you set this up correctly:

INFO: Disabled Python distribution Miniforge usage

Sometimes the default Miniforge installation location is not desirable, for example because of Conda bugs with whitespace or special characters (see https://github.com/conda/conda/issues/10239). In those cases, you can change the directory where Miniforge is installed and looked up.

Note: The Windows installer is notoriously finicky and does not tolerate long paths (about 60 characters or more), special characters, or whitespace. The default path is chosen with care, so modify it only if it does not work!

To change the installation directory of Miniforge, you have three options; the first one to be specified is used:

  1. Use the Settings with the key pysdk.pel.miniforgeDir set to the directory where Miniforge should be installed to / looked up in
  2. Use a system property with the key pysdk.pel.miniforgeDir set to the directory where Miniforge should be installed to / looked up in
  3. Use an environment variable with the key PYSDK_PEL_MINIFORGE_DIR set to the directory where Miniforge should be installed to / looked up in

Look for the following log line to indicate you set this up correctly:

INFO: Set Miniforge directory to /path/to/miniforge
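
Before pointing pysdk.pel.miniforgeDir at a custom directory, it can help to check the path against the constraints mentioned in the note on the Windows installer. This is a minimal sketch with a hypothetical function name. The 60-character limit is the approximate figure from the note, and the set of "safe" characters is an assumption, since the documentation does not list which special characters cause problems.

```python
import re

MAX_LENGTH = 60  # the note above reports trouble at roughly 60 or more characters

# Assumption: letters, digits, and common path separators are treated as safe.
SAFE_CHAR = re.compile(r"[A-Za-z0-9_.:\\/-]")


def miniforge_dir_warnings(path):
    """Return a list of human-readable warnings for a candidate directory."""
    warnings = []
    if len(path) >= MAX_LENGTH:
        warnings.append(
            f"path has {len(path)} characters (limit assumed at {MAX_LENGTH})"
        )
    if any(ch.isspace() for ch in path):
        warnings.append("path contains whitespace")
    special = sorted(
        {ch for ch in path if not ch.isspace() and not SAFE_CHAR.fullmatch(ch)}
    )
    if special:
        warnings.append("path contains special characters: " + "".join(special))
    return warnings
```

An empty list means the path passed these heuristic checks; it does not guarantee that the installer will accept it.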

2. Environment creation via Miniforge (offline)

Requirements:

  • Miniforge has not been disabled via the setting above
  • A conda environment definition file env.yml is present in the root directory of the Python extension
  • Air-gapped mode is enforced via the setting below
  • Offline Miniforge distribution installer is present in the lookup directory

Description:

This will create the environment via Miniforge as described above, but use an offline disk-based channel for doing so. This is needed for air-gapped systems which have no access to the internet.

To set air-gapped mode, you have three options; the first one to be specified is used:

  1. Use the Settings with the key pysdk.pel.airGapped set to true
  2. Use a system property with the key pysdk.pel.airGapped set to true
  3. Use an environment variable with the key PYSDK_PEL_AIR_GAPPED set to true

Look for the following log line to indicate you set this up correctly:

INFO: Set Python distribution to air-gapped mode

The lookup directory for the archive defaults to

  • ~/.RapidMiner/internal cache/temp or
  • ~/.AltairRapidMiner/AI Studio/{version}/internal cache/temp,

but can be changed as follows. There are three options; the first one to be specified is used:

  1. Use the Settings with the key pysdk.pel.resourceLoc set to the directory where the Python distribution archives are located
  2. Use a system property with the key pysdk.pel.resourceLoc set to the directory where the Python distribution archives are located
  3. Use an environment variable with the key PYSDK_PEL_RESOURCE_LOC set to the directory where the Python distribution archives are located

Look for the following log line to indicate you set this up correctly:

INFO: Set Python resource lookup directory to /path/to/py-resource-loc

3. Extractable Distributions (via archive file)

Requirements:

  • Miniforge has been disabled via the setting above
  • OR No conda environment definition file env.yml is present in the root directory of the Python extension
  • OR No Offline Miniforge distribution installer is present in the lookup directory
  • AND The distribution archive is present in the lookup directory (see details below)

Description:

This mode looks for an archive file (tar, tar.gz, or zip) for the requested Python distribution. The archive file name depends on whether a version is provided. The format is pydist-%s-%s-%s.%s, where the four elements are constructed as follows:

  • The key of the distribution
  • The version if one is provided, otherwise nothing
  • Either windows or macos or linux
  • A file suffix, e.g. zip or tar.gz

Examples of a full name are pydist-custom-1.0.0-windows.zip and pydist-other-0.1.0-macos.tar.gz.
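
The naming scheme above can be sketched as a small helper. The function name is hypothetical. For a distribution without a version, the documentation only says that the version element is empty; this sketch assumes that the separating hyphen is dropped as well, which you should verify against an actual archive name.

```python
OS_NAMES = ("windows", "macos", "linux")
SUFFIXES = ("tar", "tar.gz", "zip")


def archive_name(key, os_name, suffix, version=None):
    """Build the archive file name for a Python distribution."""
    if os_name not in OS_NAMES:
        raise ValueError(f"os_name must be one of {OS_NAMES}")
    if suffix not in SUFFIXES:
        raise ValueError(f"suffix must be one of {SUFFIXES}")
    parts = ["pydist", key]
    if version:
        parts.append(version)
    # Assumption: without a version, no empty segment or extra hyphen remains.
    parts.append(os_name)
    return "-".join(parts) + "." + suffix
```

With a version, archive_name("custom", "windows", "zip", "1.0.0") reproduces the first example name above.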

This will install the contents of the archive on disk and verify its contents on each Studio start.

The extraction folder used for the contents of the archive is the lookup directory for the distributions and can also be changed via settings (see below).

4. Local Distributions (already pre-existing on disk)

Requirements:

  • Miniforge has been disabled via the setting above
  • OR No conda environment definition file env.yml is present in the root directory of the Python extension
  • OR No Offline Miniforge distribution installer is present in the lookup directory
  • AND No matching distribution archive is present in the lookup directory

Description:

This is both the option usually used by headless execution agents (job agent, scoring agent, etc.) and the last resort if none of the previous options worked. It expects that the required Python distribution is already located on disk and ready to use as-is.

The lookup directory for the distributions defaults to

  • ~/.RapidMiner/internal cache/py-dists or
  • ~/.AltairRapidMiner/AI Studio/{version}/internal cache/py-dists,

but can be changed as follows. There are three options; the first one to be specified is used:

  1. Use the Settings with the key pysdk.pel.distLoc set to the parent directory where the Python distributions are located
  2. Use a system property with the key pysdk.pel.distLoc set to the parent directory where the Python distributions are located
  3. Use an environment variable with the key PYSDK_PEL_DIST_LOC set to the parent directory where the Python distributions are located

Look for the following log line to indicate you set this up correctly:

INFO: Set Python distribution lookup directory to /path/to/py-dists

The full path for a distribution is then constructed in the following way:

Let's assume the folder specified above via the settings is /path/py-dists to keep the example simple. The name of each folder within that path is simply in the form of key-version of a distribution. In case of a non-versioned distribution, it takes the form of just key.

Examples for full paths might therefore look like:

/path/py-dists/other-1.0.0
/path/py-dists/custom

Within each of those, the fully usable Python distribution for the correct OS is expected to be ready for use.
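
The folder naming rule above can be expressed in a few lines. This is an illustrative sketch with a hypothetical function name; it uses POSIX-style paths to match the examples and does not check whether the folder actually exists.

```python
from pathlib import PurePosixPath


def distribution_dir(dist_loc, key, version=None):
    """Return the expected folder of a distribution below the lookup directory."""
    # Versioned distributions use key-version, non-versioned ones just key.
    folder = f"{key}-{version}" if version else key
    return PurePosixPath(dist_loc) / folder
```

For instance, distribution_dir("/path/py-dists", "other", "1.0.0") yields the first example path shown above.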

External Configuration Overview

PEL behavior can be customized, to allow it to work in many different scenarios and environments, from a regular, internet-facing laptop using AI Studio, all the way to an air-gapped system running a headless RTSA.

The table below contains all the currently available settings. Each setting can be set in three different ways, and the first one to be specified is used:

  • via Java with the Settings mechanism (note: using context Settings.CONTEXT_ALTAIR_LIB)
  • via system properties
  • via environment variables

| Description | Settings Key | System Property | Environment Variable |
| --- | --- | --- | --- |
| Determines that no distribution mechanism requiring an online connection is used. If set, Python distributions must be provided differently: via disk-based conda channel for Miniforge, via distribution archive, or ready to use on disk. | pysdk.pel.airGapped | pysdk.pel.airGapped | PYSDK_PEL_AIR_GAPPED |
| The directory where Miniforge is located / should be installed in. With this setting you can use an existing Miniforge installation. Not used if Miniforge is disabled. | pysdk.pel.miniforgeDir | pysdk.pel.miniforgeDir | PYSDK_PEL_MINIFORGE_DIR |
| Disables the use of Miniforge altogether. Python distributions must either be provided via archive or ready to use on disk. | pysdk.pel.disableMiniforge | pysdk.pel.disableMiniforge | PYSDK_PEL_DISABLE_MINIFORGE |
| Disables detailed validation of Python distributions via Miniforge. This is meant for use cases where a central validation is expected to have taken place, like an AI Hub environment which shares the conda environments via a read-only mount. A minimal validation (folder with correct name exists in Miniforge/env subfolder) still takes place. | pysdk.pel.miniforgeDisableEnvValidation | pysdk.pel.miniforgeDisableEnvValidation | PYSDK_PEL_MINIFORGE_DISABLE_ENV_VALIDATION |
| Determines that Python distributions should not be registered. This is used by AI Hub to register the Python extensions (and their operators), but since AI Hub itself does not execute processes, it does not need any Python distributions installed. | pysdk.pel.skipPythonDistRegistration | pysdk.pel.skipPythonDistRegistration | PYSDK_PEL_SKIP_DIST_REGISTRATION |
| Disables the default async loading of Python extensions and distributions during startup and instead forces the initial loading to happen synchronously during Plugin init. This is used, for example, by AI Hub web API agents, which need this to know how long to wait before declaring themselves ready to run processes. | pysdk.pel.forceSynchronousLoading | pysdk.pel.forceSynchronousLoading | PYSDK_PEL_FORCE_SYNCHRONOUS_LOADING |
| The directory used as the parent directory for all Python distribution lookups for the referenced distributions. Subfolders are treated as Python distributions and expected to be named as {key}-{version}. | pysdk.pel.distLoc | pysdk.pel.distLoc | PYSDK_PEL_DIST_LOC |
| The directory where any required resources are looked up. The provided directory is used as the lookup directory for: the Miniforge installer in case it is needed; the Miniforge offline disk-based channel archive to create environments on air-gapped systems; all Python distribution archives for the referenced distributions in case archive mode is needed. | pysdk.pel.resourceLoc | pysdk.pel.resourceLoc | PYSDK_PEL_RESOURCE_LOC |
| Determines the lookup location where Python extensions are searched in. | pysdk.pel.extLoc | pysdk.pel.extLoc | PYSDK_PEL_EXT_LOC |
| The debug mode to use. Will trigger additional logging. One of: NONE (no additional logging); OPERATOR (operators log additional information at INFO level, as opposed to FINE); ALL (Python script logs at INFO level, as opposed to FINE; also logs additional information). | pysdk.debugMode | pysdk.debugMode | PYSDK_DEBUG_MODE |
| The lower bound of the port range to use for the Python servers. | pysdk.pel.server.minPort | pysdk.pel.server.minPort | PYSDK_PEL_SERVER_MIN_PORT |
| The upper bound of the port range to use for the Python servers. | pysdk.pel.server.maxPort | pysdk.pel.server.maxPort | PYSDK_PEL_SERVER_MAX_PORT |
| Time in seconds the server is kept alive in idle state before shutting down. | pysdk.pel.server.idleShutdown | pysdk.pel.server.idleShutdown | PYSDK_PEL_SERVER_IDLE_SHUTDOWN |
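
The lookup across the three mechanisms can be illustrated with a short sketch. It reads "the first one to be specified is used" as: Settings first, then system properties, then environment variables. Plain dictionaries stand in for the Java Settings mechanism and the JVM system properties, and the function name is hypothetical. The environment variable names are listed explicitly because they cannot always be derived mechanically from the key (compare pysdk.pel.skipPythonDistRegistration and PYSDK_PEL_SKIP_DIST_REGISTRATION).

```python
# Settings key -> environment variable name, copied from the table above.
ENV_NAMES = {
    "pysdk.pel.airGapped": "PYSDK_PEL_AIR_GAPPED",
    "pysdk.pel.miniforgeDir": "PYSDK_PEL_MINIFORGE_DIR",
    "pysdk.pel.disableMiniforge": "PYSDK_PEL_DISABLE_MINIFORGE",
    "pysdk.pel.miniforgeDisableEnvValidation":
        "PYSDK_PEL_MINIFORGE_DISABLE_ENV_VALIDATION",
    "pysdk.pel.skipPythonDistRegistration": "PYSDK_PEL_SKIP_DIST_REGISTRATION",
    "pysdk.pel.forceSynchronousLoading": "PYSDK_PEL_FORCE_SYNCHRONOUS_LOADING",
    "pysdk.pel.distLoc": "PYSDK_PEL_DIST_LOC",
    "pysdk.pel.resourceLoc": "PYSDK_PEL_RESOURCE_LOC",
    "pysdk.pel.extLoc": "PYSDK_PEL_EXT_LOC",
    "pysdk.debugMode": "PYSDK_DEBUG_MODE",
    "pysdk.pel.server.minPort": "PYSDK_PEL_SERVER_MIN_PORT",
    "pysdk.pel.server.maxPort": "PYSDK_PEL_SERVER_MAX_PORT",
    "pysdk.pel.server.idleShutdown": "PYSDK_PEL_SERVER_IDLE_SHUTDOWN",
}


def resolve_setting(key, settings, system_properties, environ):
    """Return (value, source) for the first mechanism that specifies the key.

    settings and system_properties are dicts keyed by the settings key;
    environ is a dict keyed by environment variable name (e.g. os.environ).
    Returns (None, None) if no mechanism specifies a value.
    """
    candidates = (
        ("settings", settings, key),
        ("system property", system_properties, key),
        ("environment variable", environ, ENV_NAMES[key]),
    )
    for source, values, name in candidates:
        value = values.get(name)
        if value is not None:
            return value, source
    return None, None
```

Returning the source alongside the value makes it easier to explain why a particular directory or flag is in effect when several mechanisms are configured at once.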

In the next section, we will dig deeper into custom data objects.