Python Extension Loader (PEL)
Introduction
The Python Extension Loader (PEL) is a Java extension that manages Python extensions built using the altair-aitools-devkit package. PEL integrates these Python extensions into the Altair AI Tools ecosystem.
What PEL does
- Loads operator definitions from Python extensions (names, parameters, ports, etc.)
- Manages the execution of Python code using the appropriate Python distribution
- Handles data serialization between Java and Python
- Makes Python extensions appear and behave just like native AI Studio operators in your workflows
Where to place Python Extensions
By default, PEL looks for Python extension .zip files in one of these locations:
- ~/.RapidMiner/extensions/python-extensions
- ~/.AltairRapidMiner/AI Studio/shared/extensions/python-extensions
To specify a different location, use one of these methods:
- In Altair AI Studio settings: Set pysdk.pel.extLoc to your extensions directory
- System property: Set pysdk.pel.extLoc to your extensions directory
- Environment variable: Set PYSDK_PEL_EXT_LOC to your extensions directory
When properly configured, you'll see this log message:
INFO: Looking for Python extensions in /path/to/py-extensions
In case of an error, you will see this in your log:
Failed to load Python extensions
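To check which .zip files PEL has a chance of finding, the lookup can be approximated in a few lines of Python. This is only a sketch: it honors the PYSDK_PEL_EXT_LOC environment variable and otherwise falls back to the two default directories, but it cannot see Studio settings or Java system properties, and the order in which the defaults are listed is not a confirmed lookup order. The function names are made up for illustration.

```python
import os
from pathlib import Path

# Default locations named in this section (order here is illustrative only).
DEFAULT_EXT_DIRS = [
    Path.home() / ".RapidMiner" / "extensions" / "python-extensions",
    Path.home() / ".AltairRapidMiner" / "AI Studio" / "shared"
    / "extensions" / "python-extensions",
]


def candidate_extension_dirs(environ=None):
    """Return the env var override if set, otherwise the default directories."""
    environ = os.environ if environ is None else environ
    override = environ.get("PYSDK_PEL_EXT_LOC")
    if override:
        return [Path(override).expanduser()]
    return list(DEFAULT_EXT_DIRS)


def find_extension_zips(dirs):
    """List .zip files directly inside the given directories; skip missing ones."""
    found = []
    for directory in dirs:
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.zip")))
    return found
```

Calling `find_extension_zips(candidate_extension_dirs())` returns the archives in the directories this sketch considers; an empty result suggests the extension was placed somewhere else.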
Python Distribution Management
A Python distribution consists of a key string and an optional version string. If no version is provided, latest is used.
Python distributions can be made available to Python extensions in the ways described below. The distribution mode and the installation process are chosen based on what is available, in the order below.
1. Environment creation via Miniforge (online)
Requirements:
- Miniforge has not been disabled via the setting below
- A conda environment definition file env.yml is present in the root directory of the Python extension
Description:
This is the default mode. The environment is created on demand from the env.yml file via Miniforge, using the conda-forge channel, unless it has already been created by an earlier start or by another Python extension.
The Miniforge installer will be downloaded and verified automatically when needed (but can also be provided via the resource location). Currently supported Miniforge installers:
- Miniforge3-24.3.0-0-Windows-x86_64.exe
- Miniforge3-24.3.0-0-MacOSX-arm64.sh
- Miniforge3-24.3.0-0-MacOSX-x86_64.sh
- Miniforge3-24.3.0-0-Linux-aarch64.sh
- Miniforge3-24.3.0-0-Linux-ppc64le.sh
- Miniforge3-24.3.0-0-Linux-x86_64.sh
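If you want to pre-download the installer, the following sketch picks the matching file name from the list above for the current machine. The mapping from `platform.system()` and `platform.machine()` values to installer names is an assumption about how the list lines up with what Python reports; it is not taken from PEL itself, and `installer_for` is a hypothetical helper.

```python
import platform

# File names copied from the list of supported installers above.
SUPPORTED_INSTALLERS = {
    ("Windows", "x86_64"): "Miniforge3-24.3.0-0-Windows-x86_64.exe",
    ("Darwin", "arm64"): "Miniforge3-24.3.0-0-MacOSX-arm64.sh",
    ("Darwin", "x86_64"): "Miniforge3-24.3.0-0-MacOSX-x86_64.sh",
    ("Linux", "aarch64"): "Miniforge3-24.3.0-0-Linux-aarch64.sh",
    ("Linux", "ppc64le"): "Miniforge3-24.3.0-0-Linux-ppc64le.sh",
    ("Linux", "x86_64"): "Miniforge3-24.3.0-0-Linux-x86_64.sh",
}


def installer_for(system=None, machine=None):
    """Return the installer file name for a platform, or None if unsupported."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Windows reports 64-bit Intel/AMD hardware as "AMD64".
    if machine.upper() == "AMD64":
        machine = "x86_64"
    return SUPPORTED_INSTALLERS.get((system, machine))
```

Calling `installer_for()` without arguments inspects the current machine; a result of `None` means the platform is not in the supported list.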
To disable the use of Miniforge, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.disableMiniforge set to true
- Use a system property with the key pysdk.pel.disableMiniforge set to true
- Use an environment variable with the key PYSDK_PEL_DISABLE_MINIFORGE set to true
Look for the following log line to indicate you set this up correctly:
INFO: Disabled Python distribution Miniforge usage
Sometimes the default Miniforge installation location is not desirable, for example because of Conda bugs with whitespace or special characters in paths (see https://github.com/conda/conda/issues/10239). In those cases, you can change the directory in which Miniforge is installed and looked up.
Note: The Windows installer is notoriously finicky and does not handle long paths (roughly 60 or more characters), special characters, or whitespace. The default path is chosen with care, so modify it only if it does not work!
To change the installation directory of Miniforge, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.miniforgeDir set to the directory where Miniforge should be installed to / looked up in
- Use a system property with the key pysdk.pel.miniforgeDir set to the directory where Miniforge should be installed to / looked up in
- Use an environment variable with the key PYSDK_PEL_MINIFORGE_DIR set to the directory where Miniforge should be installed to / looked up in
Look for the following log line to indicate you set this up correctly:
INFO: Set Miniforge directory to /path/to/miniforge
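Before pointing PEL at a custom Miniforge directory, it can help to check the path against the pitfalls mentioned above. The sketch below is a rough pre-flight check, not something PEL performs: the 60-character threshold comes from the note above, while the set of characters treated as "safe" is an assumption chosen for illustration.

```python
import string

# Assumption: everything outside this set (and whitespace) counts as "special".
_SAFE_CHARS = set(string.ascii_letters + string.digits + "_-.:/\\")
_MAX_LENGTH = 60  # approximate limit mentioned for the Windows installer


def miniforge_dir_warnings(path):
    """Return human-readable warnings for a candidate Miniforge directory."""
    warnings = []
    if len(path) >= _MAX_LENGTH:
        warnings.append(f"path is {len(path)} characters long")
    if any(ch.isspace() for ch in path):
        warnings.append("path contains whitespace")
    odd = sorted({ch for ch in path if not ch.isspace() and ch not in _SAFE_CHARS})
    if odd:
        warnings.append("path contains special characters: " + "".join(odd))
    return warnings
```

An empty list means none of the known pitfalls were detected; it does not guarantee that the installer will accept the path.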
2. Environment creation via Miniforge (offline)
Requirements:
- Miniforge has not been disabled via the setting above
- A conda environment definition file env.yml is present in the root directory of the Python extension
- Air-gapped mode is enforced via the setting below
- Offline Miniforge distribution installer is present in the lookup directory
Description:
This will create the environment via Miniforge as described above, but use an offline disk-based channel for doing so. This is needed for air-gapped systems which have no access to the internet.
To set air-gapped mode, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.airGapped set to true
- Use a system property with the key pysdk.pel.airGapped set to true
- Use an environment variable with the key PYSDK_PEL_AIR_GAPPED set to true
Look for the following log line to indicate you set this up correctly:
INFO: Set Python distribution to air-gapped mode
The lookup directory for the archive defaults to
- ~/.RapidMiner/internal cache/temp or
- ~/.AltairRapidMiner/AI Studio/{version}/internal cache/temp,
but can be changed as follows. There are three options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.resourceLoc set to the directory where the Python distribution archives are located
- Use a system property with the key pysdk.pel.resourceLoc set to the directory where the Python distribution archives are located
- Use an environment variable with the key PYSDK_PEL_RESOURCE_LOC set to the directory where the Python distribution archives are located
Look for the following log line to indicate you set this up correctly:
INFO: Set Python resource lookup directory to /path/to/py-resource-loc
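When the environment-variable route is used, both air-gapped settings are typically set together for the process that starts Studio or an agent. The helper below sketches that by building an environment mapping; `air_gapped_environment` is a hypothetical name, and how the process is actually launched depends on your installation.

```python
import os


def air_gapped_environment(resource_dir, base=None):
    """Return a copy of the environment with the air-gapped settings applied."""
    env = dict(os.environ if base is None else base)
    env["PYSDK_PEL_AIR_GAPPED"] = "true"
    env["PYSDK_PEL_RESOURCE_LOC"] = str(resource_dir)
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` together with your own launch command. If the same keys are also set via Settings or system properties, those may take precedence, since the first option to be specified is used.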
3. Extractable Distributions (via archive file)
Requirements:
- Miniforge has been disabled via the setting above
- OR No conda environment definition file env.yml is present in the root directory of the Python extension
- OR No Offline Miniforge distribution installer is present in the lookup directory
- AND The distribution archive is present in the lookup directory (see details below)
Description:
This will look for an archive file (tar, tar.gz, or zip) for the requested Python distribution.
The archive file name changes depending on whether a version is provided or not.
The format is pydist-%s-%s-%s.%s, where the 4 elements are constructed as follows:
- The key of the distribution
- If a version is provided: version, and nothing if no version is provided
- Either windows or macos or linux
- A file suffix, e.g. zip or tar.gz
Examples for a full name might be pydist-custom-1.0.0-windows.zip or pydist-other-0.1.0-macos.tar.gz
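The naming scheme can be sketched as a small helper. It reproduces the two example names above; how the name looks when no version is given is an assumption here (the version element and its separating dash are both dropped), and `archive_name` is a hypothetical function, not part of any Altair API.

```python
_OS_NAMES = ("windows", "macos", "linux")


def archive_name(key, os_name, suffix, version=None):
    """Build a distribution archive file name following the pydist-... scheme."""
    if os_name not in _OS_NAMES:
        raise ValueError(f"os_name must be one of {_OS_NAMES}, got {os_name!r}")
    parts = ["pydist", key]
    if version:
        # Assumption: without a version, the element and its dash are omitted.
        parts.append(version)
    parts.append(os_name)
    return "-".join(parts) + "." + suffix
```

For example, `archive_name("custom", "windows", "zip", version="1.0.0")` yields `pydist-custom-1.0.0-windows.zip`.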
This will install the contents of the archive on disk and verify its contents on each Studio start.
The extraction folder used for the contents of the archive is the lookup directory for the distributions and can also be changed via settings (see below).
4. Local Distributions (already pre-existing on disk)
Requirements:
- Miniforge has been disabled via the setting above
- OR No conda environment definition file env.yml is present in the root directory of the Python extension
- OR No Offline Miniforge distribution installer is present in the lookup directory
- AND No matching distribution archive is present in the lookup directory
Description:
This is both the option usually used by headless execution agents (such as the job agent or the scoring agent) and the last resort if none of the previous options worked. It expects that the required Python distribution is already located on disk and ready to use as-is.
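The requirements of the four modes can be read as a single decision procedure. The sketch below encodes one interpretation of those requirement lists; it is not PEL's actual implementation, and the enum and parameter names are invented for illustration. In particular, it assumes that the online Miniforge mode applies only when air-gapped mode is not set.

```python
from enum import Enum


class Mode(Enum):
    MINIFORGE_ONLINE = 1
    MINIFORGE_OFFLINE = 2
    ARCHIVE = 3
    LOCAL = 4


def choose_mode(*, miniforge_disabled, has_env_yml, air_gapped,
                offline_installer_present, archive_present):
    """Pick a distribution mode from what is available, in the documented order."""
    if not miniforge_disabled and has_env_yml:
        if not air_gapped:
            return Mode.MINIFORGE_ONLINE
        if offline_installer_present:
            return Mode.MINIFORGE_OFFLINE
    # Miniforge is unusable here: fall back to an archive, then to a local dist.
    if archive_present:
        return Mode.ARCHIVE
    return Mode.LOCAL
```

Under this reading, a headless agent with Miniforge disabled and no archive in the lookup directory ends up in `Mode.LOCAL`, which matches the description above.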
The lookup directory for the distributions defaults to
- ~/.RapidMiner/internal cache/py-dists or
- ~/.AltairRapidMiner/AI Studio/{version}/internal cache/py-dists,
but can be changed as follows. There are three options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.distLoc set to the parent directory where the Python distributions are located
- Use a system property with the key pysdk.pel.distLoc set to the parent directory where the Python distributions are located
- Use an environment variable with the key PYSDK_PEL_DIST_LOC set to the parent directory where the Python distributions are located
Look for the following log line to indicate you set this up correctly:
INFO: Set Python distribution lookup directory to /path/to/py-dists
The full path for a distribution is then constructed in the following way:
Let's assume the folder specified above via the settings is /path/py-dists to keep the example simple. The name of each folder within that path is simply in the form of key-version of a distribution. In case of a non-versioned distribution, it takes the form of just key.
Examples for full paths might therefore look like:
/path/py-dists/other-1.0.0
/path/py-dists/custom
Within each of those, the fully usable Python distribution for the correct OS is expected to be ready for use.
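The folder naming rule above is simple enough to express directly. The following sketch builds the expected directory for a distribution so that you can verify it exists before starting an agent; `distribution_dir` is a hypothetical helper, not part of PEL.

```python
from pathlib import Path


def distribution_dir(dist_loc, key, version=None):
    """Return the expected folder of a distribution below the lookup directory."""
    folder = f"{key}-{version}" if version else key
    return Path(dist_loc) / folder
```

With the example directory used above, `distribution_dir("/path/py-dists", "other", "1.0.0")` points to `/path/py-dists/other-1.0.0`, and calling `.is_dir()` on the result tells you whether the folder is present.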
External Configuration Overview
PEL behavior can be customized, to allow it to work in many different scenarios and environments, from a regular, internet-facing laptop using AI Studio, all the way to an air-gapped system running a headless RTSA.
The table below contains all the currently available settings. Each setting can be set in 3 different ways, and the first one to be specified is used:
- via Java with the Settings mechanism (note: using context Settings.CONTEXT_ALTAIR_LIB)
- via system properties
- via environment variables
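The "first one to be specified is used" rule can be illustrated with a small model. This sketch reads the rule as a fixed precedence in the order listed (Settings, then system properties, then environment variables), which is an interpretation rather than confirmed behavior. Plain dictionaries stand in for the Java-side mechanisms, which Python cannot read directly.

```python
def resolve_setting(key, env_name, settings, system_properties, environ):
    """Return the first non-empty value found, checking sources in listed order."""
    sources = (
        (settings, key),            # stand-in for the Java Settings mechanism
        (system_properties, key),   # stand-in for Java system properties
        (environ, env_name),        # environment variables
    )
    for source, name in sources:
        value = source.get(name)
        if value:
            return value
    return None
```

For instance, if a system property sets pysdk.pel.airGapped to false while the environment variable PYSDK_PEL_AIR_GAPPED is true, this model returns false, because the system property is consulted first.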
Description | Settings Key | System Property | Environment Variable |
---|---|---|---|
Determines that no distribution mechanism requiring an online connection is used. If set, Python distributions must be provided differently: | pysdk.pel.airGapped | pysdk.pel.airGapped | PYSDK_PEL_AIR_GAPPED |
The directory where Miniforge is located / should be installed in. With this setting you can use an existing Miniforge installation. Not used if Miniforge is disabled. | pysdk.pel.miniforgeDir | pysdk.pel.miniforgeDir | PYSDK_PEL_MINIFORGE_DIR |
Disables the use of Miniforge altogether. Python distributions must either be provided via archive or ready to use on disk. | pysdk.pel.disableMiniforge | pysdk.pel.disableMiniforge | PYSDK_PEL_DISABLE_MINIFORGE |
Disables detailed validation of Python distributions via Miniforge. This is meant for use-cases where a central validation is expected to have taken place, like an AI Hub environment which shares the conda environments via a read-only mount. A minimal validation (folder with correct name exists in Miniforge/env subfolder) is still taking place. | pysdk.pel.miniforgeDisableEnvValidation | pysdk.pel.miniforgeDisableEnvValidation | PYSDK_PEL_MINIFORGE_DISABLE_ENV_VALIDATION |
Determines that Python distributions should not be registered. This is used by AI Hub to register the Python extensions (and their operators), but since AI Hub itself does not execute processes, it does not need any Python distributions installed. | pysdk.pel.skipPythonDistRegistration | pysdk.pel.skipPythonDistRegistration | PYSDK_PEL_SKIP_DIST_REGISTRATION |
Disables the default async loading of Python extensions & distributions during startup and instead forces the initial loading to happen synchronously during Plugin init. This is e.g. used by AI Hub web API agents, which need this to know how long to wait before declaring themselves ready to run processes. | pysdk.pel.forceSynchronousLoading | pysdk.pel.forceSynchronousLoading | PYSDK_PEL_FORCE_SYNCHRONOUS_LOADING |
The directory is used as the parent directory for all Python distribution lookups for the referenced distributions. Subfolders are treated as Python distributions and expected to be named as {key}-{version}. | pysdk.pel.distLoc | pysdk.pel.distLoc | PYSDK_PEL_DIST_LOC |
The directory where any required resources are looked up. | pysdk.pel.resourceLoc | pysdk.pel.resourceLoc | PYSDK_PEL_RESOURCE_LOC |
Determines the lookup location where Python extensions are searched in | pysdk.pel.extLoc | pysdk.pel.extLoc | PYSDK_PEL_EXT_LOC |
The debug mode to use. Will trigger additional logging. One of: | pysdk.debugMode | pysdk.debugMode | PYSDK_DEBUG_MODE |
The lower bound of the port range to use for the python servers. | pysdk.pel.server.minPort | pysdk.pel.server.minPort | PYSDK_PEL_SERVER_MIN_PORT |
The upper bound of the port range to use for the python servers. | pysdk.pel.server.maxPort | pysdk.pel.server.maxPort | PYSDK_PEL_SERVER_MAX_PORT |
Time in seconds the server is kept alive in idle state before shutting down. | pysdk.pel.server.idleShutdown | pysdk.pel.server.idleShutdown | PYSDK_PEL_SERVER_IDLE_SHUTDOWN |
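When debugging a deployment, it is often useful to see which of these settings are currently supplied through the environment. The sketch below copies the key and environment-variable names from the table and reports the ones that are set. It only inspects environment variables; values set through Settings or system properties are invisible to it, and `configured_via_environment` is a hypothetical helper.

```python
import os

# Settings key -> environment variable, as listed in the table above.
ENV_VARS = {
    "pysdk.pel.airGapped": "PYSDK_PEL_AIR_GAPPED",
    "pysdk.pel.miniforgeDir": "PYSDK_PEL_MINIFORGE_DIR",
    "pysdk.pel.disableMiniforge": "PYSDK_PEL_DISABLE_MINIFORGE",
    "pysdk.pel.miniforgeDisableEnvValidation":
        "PYSDK_PEL_MINIFORGE_DISABLE_ENV_VALIDATION",
    "pysdk.pel.skipPythonDistRegistration": "PYSDK_PEL_SKIP_DIST_REGISTRATION",
    "pysdk.pel.forceSynchronousLoading": "PYSDK_PEL_FORCE_SYNCHRONOUS_LOADING",
    "pysdk.pel.distLoc": "PYSDK_PEL_DIST_LOC",
    "pysdk.pel.resourceLoc": "PYSDK_PEL_RESOURCE_LOC",
    "pysdk.pel.extLoc": "PYSDK_PEL_EXT_LOC",
    "pysdk.debugMode": "PYSDK_DEBUG_MODE",
    "pysdk.pel.server.minPort": "PYSDK_PEL_SERVER_MIN_PORT",
    "pysdk.pel.server.maxPort": "PYSDK_PEL_SERVER_MAX_PORT",
    "pysdk.pel.server.idleShutdown": "PYSDK_PEL_SERVER_IDLE_SHUTDOWN",
}


def configured_via_environment(environ=None):
    """Return {settings key: value} for every PEL variable present in environ."""
    environ = os.environ if environ is None else environ
    return {key: environ[name] for key, name in ENV_VARS.items() if name in environ}
```

Printing the result of `configured_via_environment()` at the start of a launch script gives a quick record of the environment-based configuration in effect.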
In the next section, we will dig deeper into custom data objects.