Python Extension Loader
An Altair AI Suite extension that can add Python-based AI Suite extensions to the platform. Part of the Altair AI Suite Python SDK.
The Altair AI Suite Python SDK consists of two parts:
- Python AI Tools Devkit (PYSDK): Used to create a Python extension (a .zip file) out of your Python package, as well as the environment definition to be used by PEL.
- Python Extension Loader (PEL): Loads the created Python extensions into the AI Suite to allow workflows to use the defined operators.
The PEL loads the operator declarations (name, parameters, ports, etc.) defined in the respective Python extension. When such an operator is run, the PEL delegates execution of the function behind the operator to the required Python distribution and adds the Python extension .zip file as a library. The distribution is then called with a wrapper script and an extensive argument JSON describing how the library should be called.
The distribution is usually managed by Miniforge, so the user does not need to take any action: the required Python environment is installed and managed automatically. Inputs and outputs are either part of the wrapper argument JSON, or are serialized to and from Java.
In other words, the PEL registers new operators that can be added to workflows and are indistinguishable from Java-backed operators.
They can be used in local or AI Hub setups as long as the indicated Python distribution is available. The distribution is indicated by the environment entry in the Python extension .toml file and, potentially, by the bundled conda env.yml file.
Python Extension Lookup
To change the location in which Python extensions are looked up, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.extLoc set to the directory where all Python extensions are located
- Use a system property with the key pysdk.pel.extLoc set to the directory where all Python extensions are located
- Use an environment variable with the key PYSDK_PEL_EXT_LOC set to the directory where all Python extensions are located
Look for the following log line to indicate you set this up correctly:
INFO: Looking for Python extensions in /path/to/py-extensions
In case of an error, you will see this in your log:
Failed to load Python extensions
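The precedence among the three options above can be sketched in Python. This is only an illustration of one reading of "the first one to be specified is used" (Settings, then system property, then environment variable); the function name resolve_option is hypothetical, the three sources are modeled as plain dictionaries, and treating an empty value as "not specified" is an assumption.

```python
from typing import Mapping, Optional


def resolve_option(
    key: str,
    env_name: str,
    settings: Mapping[str, str],
    system_properties: Mapping[str, str],
    environment: Mapping[str, str],
) -> Optional[str]:
    """Return the first specified value: Settings, system property, environment variable."""
    for source, name in (
        (settings, key),
        (system_properties, key),
        (environment, env_name),
    ):
        value = source.get(name)
        if value:  # assumption: missing and empty values both count as "not specified"
            return value
    return None


# Hypothetical inputs: only the environment variable is set here.
location = resolve_option(
    "pysdk.pel.extLoc",
    "PYSDK_PEL_EXT_LOC",
    settings={},
    system_properties={},
    environment={"PYSDK_PEL_EXT_LOC": "/path/to/py-extensions"},
)
print(location)
```

The same lookup shape applies to every other setting in this document; only the key and the environment variable name change.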
Python Distribution Lookup
There are four different approaches:
- Environment creation via Miniforge (online)
- Environment creation via Miniforge (offline)
- Extractable Distributions (via archive file)
- Local Distributions (already pre-existing on disk)
1. Environment creation via Miniforge (online)
Requirements:
- Miniforge has not been disabled via the setting below
- A conda environment definition file env.yml is present in the root directory of the Python extension
Description:
This is the default mode. This will create the environment based on the .yml file via Miniforge on demand (if it has not already been created by an earlier start / another Python extension) via the conda-forge channel.
The Miniforge installer will be downloaded and verified automatically when needed (but can also be provided via the resource location). Currently supported Miniforge installers:
- Miniforge3-24.3.0-0-Windows-x86_64.exe
- Miniforge3-24.3.0-0-MacOSX-arm64.sh
- Miniforge3-24.3.0-0-MacOSX-x86_64.sh
- Miniforge3-24.3.0-0-Linux-aarch64.sh
- Miniforge3-24.3.0-0-Linux-ppc64le.sh
- Miniforge3-24.3.0-0-Linux-x86_64.sh
To disable the use of Miniforge, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.disableMiniforge set to true
- Use a system property with the key pysdk.pel.disableMiniforge set to true
- Use an environment variable with the key PYSDK_PEL_DISABLE_MINIFORGE set to true
Look for the following log line to indicate you set this up correctly:
INFO: Disabled Python distribution Miniforge usage
Sometimes the default Miniforge installation location is not desirable, e.g. due to Conda bugs with whitespace or special characters (see https://github.com/conda/conda/issues/10239). In those cases, you can change the location where Miniforge is installed to / is looked up in.
Note: The Windows installer is notoriously finicky, and does not like long-ish paths (~60 or more chars) or special characters or whitespaces at all. The default path is chosen with care, so modify it only if it does not work!
To change the installation directory of Miniforge, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.miniforgeDir set to the directory where Miniforge should be installed to / looked up in
- Use a system property with the key pysdk.pel.miniforgeDir set to the directory where Miniforge should be installed to / looked up in
- Use an environment variable with the key PYSDK_PEL_MINIFORGE_DIR set to the directory where Miniforge should be installed to / looked up in
Look for the following log line to indicate you set this up correctly:
INFO: Set Miniforge directory to /path/to/miniforge
2. Environment creation via Miniforge (offline)
Requirements:
- Miniforge has not been disabled via the setting above
- A conda environment definition file env.yml is present in the root directory of the Python extension
- Air-gapped mode is enforced via the setting below
- Offline Miniforge distribution installer is present in the lookup directory
Description:
This will create the environment via Miniforge as described above, but use an offline disk-based channel for doing so. This is needed for air-gapped systems which have no access to the internet.
To set air-gapped mode, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.airGapped set to true
- Use a system property with the key pysdk.pel.airGapped set to true
- Use an environment variable with the key PYSDK_PEL_AIR_GAPPED set to true
Look for the following log line to indicate you set this up correctly:
INFO: Set Python distribution to air-gapped mode
The lookup directory for the archive defaults to
- ~/.RapidMiner/internal cache/temp, or
- ~/.AltairRapidMiner/AI Studio/{version}/internal cache/temp,
but can be changed via the setting below:
To set the location in which the archives are looked up, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.resourceLoc set to the directory where the Python distribution archives are located
- Use a system property with the key pysdk.pel.resourceLoc set to the directory where the Python distribution archives are located
- Use an environment variable with the key PYSDK_PEL_RESOURCE_LOC set to the directory where the Python distribution archives are located
Look for the following log line to indicate you set this up correctly:
INFO: Set Python resource lookup directory to /path/to/py-resource-loc
3. Extractable Distributions (via archive file)
Requirements:
- Miniforge has been disabled via the setting above
- OR No conda environment definition file env.yml is present in the root directory of the Python extension
- OR No Offline Miniforge distribution installer is present in the lookup directory
- AND The distribution archive is present in the lookup directory (see details below)
Description:
This will look for an archive file (tar, tar.gz or zip) for the requested Python distribution.
The archive file name changes depending on whether a version is provided or not.
The format is pydist-%s-%s-%s.%s, where the 4 elements are constructed as follows:
- The key of the distribution
- The version if one is provided, and nothing if no version is provided
- Either windows or macos or linux
- A file suffix, e.g. zip or tar.gz
Examples for a full name might be
- pydist-custom-1.0.0-windows.zip or
- pydist-other-0.1.0-macos.tar.gz
This will install the contents of the archive on disk and verify its contents on each Studio start.
The extraction folder used for the contents of the archive is the lookup directory for the distributions and can also be changed via settings (see below).
4. Local Distributions (already pre-existing on disk)
Requirements:
- Miniforge has been disabled via the setting above
- OR No conda environment definition file env.yml is present in the root directory of the Python extension
- OR No Offline Miniforge distribution installer is present in the lookup directory
- AND No matching distribution archive is present in the lookup directory
Description:
This is both the option usually used by headless execution agents like the job agent, scoring agent, etc., and the last resort if none of the previous options worked. It expects that the required Python distribution is already located on disk and ready to use as-is.
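Read together, the requirement lists of the four approaches suggest a fallback order. The Python sketch below is one possible interpretation of those lists, not the PEL's actual decision logic; the function choose_approach, the Approach enum and all argument names are hypothetical.

```python
from enum import Enum


class Approach(Enum):
    MINIFORGE_ONLINE = 1
    MINIFORGE_OFFLINE = 2
    ARCHIVE = 3
    LOCAL = 4


def choose_approach(
    miniforge_disabled: bool,
    has_env_yml: bool,
    air_gapped: bool,
    has_offline_installer: bool,
    has_archive: bool,
) -> Approach:
    """Pick a distribution approach (assumed reading of the requirement lists)."""
    if not miniforge_disabled and has_env_yml:
        if not air_gapped:
            return Approach.MINIFORGE_ONLINE
        if has_offline_installer:
            return Approach.MINIFORGE_OFFLINE
    # Miniforge is unusable at this point: fall back to an archive, then to disk.
    if has_archive:
        return Approach.ARCHIVE
    return Approach.LOCAL


# Default setup: env.yml bundled, nothing disabled, internet available.
print(choose_approach(False, True, False, False, False).name)  # MINIFORGE_ONLINE
```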
The lookup directory for the distributions defaults to
- ~/.RapidMiner/internal cache/py-dists or
- ~/.AltairRapidMiner/AI Studio/{version}/internal cache/py-dists,
but can be changed via the setting below:
To set the location in which the distributions are looked up, you have 3 options; the first one to be specified is used:
- Use the Settings with the key pysdk.pel.distLoc set to the parent directory where the Python distributions are located
- Use a system property with the key pysdk.pel.distLoc set to the parent directory where the Python distributions are located
- Use an environment variable with the key PYSDK_PEL_DIST_LOC set to the parent directory where the Python distributions are located
Look for the following log line to indicate you set this up correctly:
INFO: Set Python distribution lookup directory to /path/to/py-dists
The full path for a distribution is then constructed in the following way:
Let's assume the folder specified above via the settings is /path/py-dists to keep the example simple. The name of each folder within that path is simply in the form of key-version of a distribution. In case of a non-versioned distribution, it takes the form of just key.
Examples for full paths might therefore look like:
- /path/py-dists/other-1.0.0
- /path/py-dists/custom
Within each of those, the fully usable Python distribution for the correct OS is expected to be ready for use.
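The path construction described above can be sketched as follows. The function name distribution_dir is hypothetical, and PurePosixPath is used only so that the example prints the same result on every operating system.

```python
from pathlib import PurePosixPath
from typing import Optional


def distribution_dir(dist_loc: str, key: str, version: Optional[str] = None) -> PurePosixPath:
    """Return the expected folder of a distribution below the lookup directory."""
    folder = f"{key}-{version}" if version else key
    return PurePosixPath(dist_loc) / folder


print(distribution_dir("/path/py-dists", "other", "1.0.0"))  # /path/py-dists/other-1.0.0
print(distribution_dir("/path/py-dists", "custom"))          # /path/py-dists/custom
```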
External Configuration Overview
PEL behavior can be customized, to allow it to work in many different scenarios and environments, from a regular, internet-facing laptop using AI Studio, all the way to an air-gapped system running a headless RTSA.
The table below contains all the currently available settings. Each setting can be set in 3 different ways, and the first one to be specified is used:
- via Java with the Settings mechanism (note: using context Settings.CONTEXT_ALTAIR_LIB)
- via system properties
- via environment variables
Description | Settings Key | System Property | Environment Variable |
---|---|---|---|
Determines that no distribution mechanism requiring an online connection is used. If set, Python distributions must be provided differently: | pysdk.pel.airGapped | pysdk.pel.airGapped | PYSDK_PEL_AIR_GAPPED |
Disables the use of Miniforge altogether. Python distributions must either be provided via archive or ready to use on disk. | pysdk.pel.disableMiniforge | pysdk.pel.disableMiniforge | PYSDK_PEL_DISABLE_MINIFORGE |
The directory where Miniforge is located / should be installed in. With this setting you can use an existing Miniforge installation. Not used if Miniforge is disabled. | pysdk.pel.miniforgeDir | pysdk.pel.miniforgeDir | PYSDK_PEL_MINIFORGE_DIR |
The directory used as the parent directory for all Python distribution lookups for the referenced distributions. Subfolders are treated as Python distributions and expected to be named as {key}-{version}. | pysdk.pel.distLoc | pysdk.pel.distLoc | PYSDK_PEL_DIST_LOC |
The directory where any required resources are looked up. | pysdk.pel.resourceLoc | pysdk.pel.resourceLoc | PYSDK_PEL_RESOURCE_LOC |
Determines the location in which Python extensions are looked up. | pysdk.pel.extLoc | pysdk.pel.extLoc | PYSDK_PEL_EXT_LOC |
The debug mode to use. Will trigger additional logging. One of: | pysdk.debugMode | pysdk.debugMode | PYSDK_DEBUG_MODE |
The lower bound of the port range to use for the Python servers. | pysdk.pel.server.minPort | pysdk.pel.server.minPort | PYSDK_PEL_SERVER_MIN_PORT |
The upper bound of the port range to use for the Python servers. | pysdk.pel.server.maxPort | pysdk.pel.server.maxPort | PYSDK_PEL_SERVER_MAX_PORT |
Time in seconds the server is kept alive in idle state before shutting down. | pysdk.pel.server.idleShutdown | pysdk.pel.server.idleShutdown | PYSDK_PEL_SERVER_IDLE_SHUTDOWN |
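The environment variable names in the table appear to follow a regular pattern: the settings key with camelCase boundaries and dots turned into underscores, then upper-cased. The sketch below reproduces that observed pattern in Python. It is an observation about the listed names, not a documented rule, and the function env_var_name is hypothetical.

```python
import re


def env_var_name(key: str) -> str:
    """Derive the environment variable name from a settings key (observed pattern)."""
    # Insert "_" at each lower-to-upper camelCase boundary, then replace the dots.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key)
    return snake.replace(".", "_").upper()


for key in ("pysdk.pel.airGapped", "pysdk.debugMode", "pysdk.pel.server.minPort"):
    print(key, "->", env_var_name(key))
```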
extension.json
The extension.json file is created by the Altair AI Tools Devkit and added to the created Python extension .zip file. It contains all the information required by the PEL to create a "real" AI Suite extension out of it. This encompasses a description of each operator, as well as information on the required Python environment.
JSON Format Example
Here is an example of such a JSON file. Below it, you will find an accurate description of each supported element, as well as potential additional options.
{
  "extension": {
    "name": "Python Samples",
    "namespace": "pysa",
    "version": "0.1.0",
    "license": "Apache License, Version 2.0",
    "environment": "bundled:0.1.3",
    "module": "samples",
    "dependencies": [
      {
        "name": "pytensors",
        "min_version": "0.3.0",
        "module": "tensors"
      }
    ],
    "sdk_version": "0.1.0"
  },
  "operators": {
    "hello_world": {
      "name": "Hello World",
      "implementation": "hello_world",
      "parameters": [
        {
          "type": "string",
          "name": "name",
          "description": "the name to greet (optional)",
          "categories": null,
          "default": null,
          "optional": true
        },
        {
          "type": "integer",
          "name": "times",
          "description": "the number of greetings to generate",
          "categories": null,
          "default": 3,
          "conditions": [
            [
              "!=",
              "name",
              "stranger"
            ]
          ],
          "optional": false
        }
      ],
      "inputs": [],
      "outputs": [
        {
          "name": "result1",
          "type": "table"
        }
      ],
      "icon": "message.png"
    },
    "scale": {
      "name": "Normalize",
      "implementation": "scale",
      "parameters": [
        {
          "type": "category",
          "name": "method",
          "description": "the normalization method",
          "categories": [
            "MAX_ABSOLUTE",
            "MIN_MAX",
            "STANDARD"
          ],
          "default": "MIN_MAX",
          "optional": false
        },
        {
          "type": "string",
          "name": "selected_col",
          "description": "Selected column of the input dataframe",
          "categories": null,
          "default": null,
          "optional": false,
          "annotations": [
            ["SelectedColumnAnnotation", "data"]
          ]
        }
      ],
      "inputs": [
        {
          "name": "data",
          "type": "table",
          "optional": false
        }
      ],
      "outputs": [
        {
          "name": "normalized",
          "type": "table"
        }
      ]
    }
  }
}
Extension Block
The extension block contains some meta information required by the RapidMiner extension format, as well as some information on the Python script.
- name: The name of the extension. Will, for example, appear as the name of the sub-folder in the Extensions section of the available RapidMiner operators.
- namespace: The namespace of the extension. Will be used in the process XML as operator key prefix, just like any other RapidMiner extension. Will later be required to be unique, but is currently not validated for uniqueness among the installed Python extensions.
- license: The name of the license under which the Python extension is released. Must match the actual contents of the LICENSE file in the root level. Note: If you have 3rd-party package dependencies, add their licenses under licenses/package_name.license_name.license files.
- version: The version of the Python extension. Will later be used to determine which Python extension to load if multiple versions of a single one are present, but is currently not being used.
- environment: The key and optional version of the Python distribution required to run the contained Python code. Format is either key:version or key. Environments can either be created on demand or are expected to be located on disk already. Which mode is used depends on the (optional) settings set during PEL startup. By default, the envs are built using Miniforge. The available options are:
  - Miniforge builds the environment on-demand from the conda-forge channel
  - Miniforge builds the environment on-demand from an offline disk-based channel (for air-gapped systems)
  - Distribution archives containing the entire distribution can be extracted
  - The distribution is already on disk and ready to be used
- module: The name of the Python module from which the referenced script functions can be imported.
- dependencies: List of Python extension dependencies which the extension may use functionality from.
  - name: Namespace of the Python extension (dependency)
  - min_version: Required minimum version of the Python extension (dependency)
  - module: Same as the previously described module, just for the dependency extension. Required by PEW
- sdk_version: Version number of the Python SDK that was used to build the Python extension.
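To make the two accepted forms of the environment entry concrete, here is a small Python sketch that splits such a value into key and optional version. The function parse_environment is hypothetical; how the PEL itself parses the entry is not described here.

```python
from typing import Optional, Tuple


def parse_environment(value: str) -> Tuple[str, Optional[str]]:
    """Split an environment entry of the form key:version or key."""
    key, separator, version = value.partition(":")
    return key, (version if separator else None)


print(parse_environment("bundled:0.1.3"))  # ('bundled', '0.1.3')
print(parse_environment("custom"))         # ('custom', None)
```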
Operators Block
The operators block contains a dictionary of all the operators contained in this Python extension. Each of these is referenced by its key.
These keys must be unique to be used as a RapidMiner operator key, which is the reason for having a dictionary here.
Each operator contains the following:
- name: The name of the operator as it should appear to the user.
- implementation: The name of the Python function that contains the code backing this operator. Note that the required inputs of this function are split into the parameters and inputs elements, depending on their type.
- icon: Optional. The name of one of the icons provided by AI Studio. If not set, a default icon will be used.
- tags: Optional. Contains an array of tags for the operator. Tags are used when searching for operators. By default, no tags are present.
- synopsis: Optional. A short 1-2 sentence synopsis of what this operator is doing.
- description: Optional. A longer text describing in more detail what this operator is doing.
- parameters: An array containing the parameters of the operator, which are used for getting settings into the script function as arguments. Each element within must consist of the following:
  - name: The name of the parameter, will be used as the key internally.
  - description: The description of the parameter, to inform the user what this parameter is for.
  - type: The type of the parameter. Depending on the type, some of the following elements can become optional. Currently supported types are:
    - integer: An int value will be provided as Python function input.
    - real: A float value will be provided as Python function input.
    - boolean: A bool value will be provided as Python function input.
    - string: A str value will be provided as Python function input.
    - category: An Enum constant will be provided as Python function input. Must also provide the categories with Enum values available for selection.
  - categories: Optional. Only used if the Python function requires an Enum. Lists the selectable values in an array.
  - default: Optional. A pre-selected default value for the parameter. Must conform to the type of the parameter.
  - optional: If true, the parameter can be ignored and left empty. If false, providing a proper value to the parameter will be enforced.
  - annotations: Optional. Array of parameter annotations. Each annotation itself is an array with a variable number of elements. The first element is the annotation type, the remaining are the parameters. List of supported annotation types:
    - SelectedColumnAnnotation: Represents a selected column of an input dataframe. Parameters:
      - Name of the input port it refers to (must be a dataframe).
    - TextParameterAnnotation: Represents longer texts. Parameters:
      - Text type (for syntax highlighting). Supported values: PLAIN (default), JSON, XML, HTML, SQL and PYTHON.
    - ConditionalAnnotation: Represents operator parameter conditions (supported types: boolean, string and category). Parameters:
      - Operator string. Must be == or != for (un-)equal string conditions.
      - Name of the operator parameter the condition references. The type of the referenced parameter determines the type of condition.
      - Value which the referenced parameter will be compared to. Must be either true or false for boolean conditions. Can be null, to check if a parameter is set.
- inputs: An array containing the input ports of the operator, which are used for getting data into the script function as arguments. Each element within must consist of the following:
  - name: The name of the Python script function argument this input is for
  - type: The input type, currently supported values are:
    - table for data tables which get converted to a Pandas DataFrame
    - file for binary files which get converted to a Path object
  - optional: If true, the presence of actual input data at this port will not be enforced. If false, the absence of data will be flagged as an error.
- outputs: An array containing the output ports of the operator, which are used for getting data from the script result back to the process. Each element within must consist of the following:
  - name: The name of the Python script function return value of this output. Used for identification if multiple output values are present.
  - type: The output type, currently supported values are:
    - table for Pandas DataFrame which will get converted to a RapidMiner data table
    - file for Path objects which will get converted to binary files
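The parameter rules listed above (mandatory fields, supported types, categories for category parameters, defaults that conform to the type) can be checked with a short Python sketch such as the one below. It is a hypothetical helper for sanity-checking a parameter entry from extension.json, not the validation the PEL performs. Accepting integer defaults for real parameters and rejecting booleans as numbers are assumptions of this sketch.

```python
from typing import Any, Dict, List

SUPPORTED_TYPES = ("integer", "real", "boolean", "string", "category")
REQUIRED_FIELDS = ("name", "description", "type", "optional")


def _default_matches(param_type: str, default: Any, categories: List[str]) -> bool:
    """Check whether a default value conforms to the declared parameter type."""
    if param_type == "integer":
        return isinstance(default, int) and not isinstance(default, bool)
    if param_type == "real":
        return isinstance(default, (int, float)) and not isinstance(default, bool)
    if param_type == "boolean":
        return isinstance(default, bool)
    if param_type == "string":
        return isinstance(default, str)
    return default in categories  # category: default must be a selectable value


def check_parameter(param: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one parameter definition."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in param]
    param_type = param.get("type")
    if param_type is None:
        return problems
    if param_type not in SUPPORTED_TYPES:
        return problems + [f"unsupported type: {param_type}"]
    categories = param.get("categories") or []
    if param_type == "category" and not categories:
        problems.append("category parameter needs categories")
    default = param.get("default")
    if default is not None and not _default_matches(param_type, default, categories):
        problems.append("default does not conform to type")
    return problems


# The "times" parameter from the JSON example, minus its conditions.
times = {"type": "integer", "name": "times",
         "description": "the number of greetings to generate",
         "categories": None, "default": 3, "optional": False}
print(check_parameter(times))  # []
```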