Extension Configuration JSON Format

Introduction

When developing extensions for Altair AI Tools using the Python SDK, your extension's configuration is bundled as a JSON file. This file defines metadata, operators, parameters, and data objects that link your Python code to the Altair AI Tools operators.

Important: This JSON file is automatically generated by the Python SDK and should be treated as read-only. Manual modifications may cause the extension to malfunction or fail to load properly.

JSON Format Example

Below is an example JSON configuration for a sample extension:

{
    "extension": {
        "name": "Python Samples",
        "namespace": "pysa",
        "version": "0.1.0",
        "license": "Apache License, Version 2.0",
        "environment": "bundled:0.1.3",
        "module": "samples",
        "dependencies": [
            {
                "name": "pytensors",
                "min_version": "0.3.0",
                "module": "tensors"
            }
        ],
        "sdk_version": "0.1.0"
    },
    "operators": {
        "hello_world": {
            "name": "Hello World",
            "implementation": "hello_world",
            "parameters": [
                {
                    "type": "string",
                    "name": "name",
                    "description": "the name to greet (optional)",
                    "categories": null,
                    "default": null,
                    "optional": true
                },
                {
                    "type": "integer",
                    "name": "times",
                    "description": "the number of greetings to generate",
                    "categories": null,
                    "default": 3,
                    "conditions": [
                        [
                            "!=",
                            "name",
                            "stranger"
                        ]
                    ],
                    "optional": false
                }
            ],
            "inputs": [],
            "outputs": [
                {
                    "name": "result1",
                    "type": "table"
                }
            ],
            "icon": "message.png"
        },
        "scale": {
            "name": "Normalize",
            "implementation": "scale",
            "parameters": [
                {
                    "type": "category",
                    "name": "method",
                    "description": "the normalization method",
                    "categories": [
                        "MAX_ABSOLUTE",
                        "MIN_MAX",
                        "STANDARD"
                    ],
                    "default": "MIN_MAX",
                    "optional": false
                },
                {
                    "type": "string",
                    "name": "selected_col",
                    "description": "Selected column of the input dataframe",
                    "categories": null,
                    "default": null,
                    "optional": false,
                    "annotations": [
                        ["SelectedColumnAnnotation", "data"]
                    ]
                }
            ],
            "inputs": [
                {
                    "name": "data",
                    "type": "table",
                    "optional": false
                }
            ],
            "outputs": [
                {
                    "name": "normalized",
                    "type": "table"
                }
            ]
        }
    }
}

Configuration Structure

The JSON file consists of two main sections: the extension block containing the metadata and the operators block defining the extension's functionality.
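
Because the file is read-only, the most common interaction with it is inspection. The following is a minimal sketch of reading both sections with Python's standard json module. It embeds a trimmed copy of the sample configuration so that it is self-contained; the helper name summarize is hypothetical and not part of the Python SDK.

```python
import json

# Trimmed copy of the sample configuration. A real file would be read from
# disk; its location inside the extension bundle is not covered here.
CONFIG_TEXT = """
{
    "extension": {"name": "Python Samples", "namespace": "pysa", "version": "0.1.0"},
    "operators": {
        "hello_world": {
            "name": "Hello World",
            "implementation": "hello_world",
            "parameters": [
                {"type": "string", "name": "name", "optional": true},
                {"type": "integer", "name": "times", "default": 3, "optional": false}
            ],
            "inputs": [],
            "outputs": [{"name": "result1", "type": "table"}]
        }
    }
}
"""


def summarize(config: dict) -> list[str]:
    """Return one line for the extension block, then one line per operator."""
    ext = config["extension"]
    lines = [f"{ext['name']} ({ext['namespace']}) v{ext['version']}"]
    for key, op in config["operators"].items():
        params = ", ".join(p["name"] for p in op.get("parameters", []))
        lines.append(f"{key}: {op['name']} -> {op['implementation']}({params})")
    return lines


config = json.loads(CONFIG_TEXT)
summary = summarize(config)
print("\n".join(summary))
```

The sketch only reads the configuration and never writes it back, in line with the read-only nature of the generated file.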

Extension Block

The extension block contains the metadata required by the Altair AI Tools extension format, along with information about the Python code.

  • name: The name of the extension. It appears, for example, as the name of the sub-folder in the Extensions section of the available RapidMiner operators.
  • namespace: The namespace of the extension. It is used as the operator key prefix in the process XML, just like for any other RapidMiner extension. Uniqueness will be required in the future, but it is currently not validated among the installed Python extensions.
  • license: The name of the license under which the Python extension is released. Must match the actual contents of the LICENSE file at the root level. Note: If you have third-party package dependencies, add their licenses as licenses/package_name.license_name.license files.
  • version: The version of the Python extension. It is currently unused, but will later determine which Python extension to load if multiple versions of the same extension are present.
  • environment: The key and optional version of the Python distribution required to run the contained Python code. The format is either key:version or key. Environments are either created on demand or expected to already be on disk. Which mode is used depends on the (optional) settings provided during PEL startup. By default, environments are built using Miniforge.

    The available options are:

      • Miniforge builds the environment on-demand from the conda-forge channel
      • Miniforge builds the environment on-demand from an offline disk-based channel (for air-gapped systems)
      • Distribution archives containing the entire distribution can be extracted
      • The distribution is already on disk and ready to be used

  • module: The name of the Python module from which the referenced script functions can be imported.

  • dependencies: A list of Python extensions whose functionality this extension may use.

    • name: Namespace of the Python extension (dependency)
    • min_version: Required minimum version of the Python extension (dependency)
    • module: Same as the module element described above, but for the dependency extension. Required by PEW.
  • sdk_version: Version number of the Python SDK that was used to build the Python extension.
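
The environment format (key:version or key) and the min_version element of a dependency can be illustrated with a short sketch. The function names below are hypothetical, and the version comparison assumes purely numeric, dot-separated versions such as 0.3.0. How the tooling itself compares versions is not described here, so treat this as one plausible reading rather than the actual behavior.

```python
from typing import Optional, Tuple


def parse_environment(value: str) -> Tuple[str, Optional[str]]:
    """Split an environment string of the form 'key:version' or 'key'."""
    key, sep, version = value.partition(":")
    # When no ':' is present, there is no version part.
    return key, (version if sep else None)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '0.3.0' into (0, 3, 0). Assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def satisfies_min_version(installed: str, min_version: str) -> bool:
    """Check whether an installed dependency version meets min_version."""
    return version_tuple(installed) >= version_tuple(min_version)


print(parse_environment("bundled:0.1.3"))
print(satisfies_min_version("0.10.0", "0.3.0"))
```

Comparing integer tuples instead of raw strings avoids ranking 0.10.0 below 0.3.0, which a plain string comparison would do.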

Operators Block

The operators block contains a dictionary of all the operators in this Python extension, each referenced by its key. The keys must be unique so that they can serve as RapidMiner operator keys, which is why a dictionary is used here.

Each operator contains the following:

  • name: The name of the operator as it should appear to the user
  • implementation: The name of the Python function that contains the code backing this operator. Note that the required inputs of this function are split into the parameters and inputs elements, depending on their type.
  • icon: Optional. The name of one of the icons provided by AI Studio. If not set, a default icon will be used.
  • tags: Optional. Contains an array of tags for the operator. Tags are used when searching for operators. By default, no tags are present.
  • synopsis: Optional. A short synopsis (one or two sentences) of what this operator does.
  • description: Optional. A longer text describing in more detail what this operator does.
  • parameters: An array containing the parameters of the operator, which are used for getting settings into the script function as arguments. Each element within must consist of the following:

    • name: The name of the parameter. It is used as the key internally.
    • description: The description of the parameter, to inform the user what this parameter is for.
    • type: The type of the parameter. Depending on the type, some of the following elements can become optional. Currently supported types are:

      • integer: An int value will be provided as Python function input.
      • real: A float value will be provided as Python function input.
      • boolean: A bool value will be provided as Python function input.
      • string: A str value will be provided as Python function input.
      • category: An Enum constant will be provided as Python function input. Must also provide the categories with Enum values available for selection.
    • categories: Optional. Only used if the Python function requires an Enum. Lists the selectable values in an array.
    • default: Optional. A pre-selected default value for the parameter. Must conform to the type of the parameter.
    • optional: If true, the parameter can be ignored and left empty. If false, providing a proper value for the parameter is enforced.
    • annotations: Optional. An array of parameter annotations. Each annotation is itself an array with a variable number of elements: the first element is the annotation type, and the remaining elements are its parameters. Supported annotation types:

      • SelectedColumnAnnotation: Represents a selected column of an input dataframe. Parameters:

        1. Name of the input port it refers to (must be a dataframe).
      • TextParameterAnnotation: Represents longer texts. Parameters:

        1. Text type (for syntax highlighting). Supported values: PLAIN (default), JSON, XML, HTML, SQL and PYTHON.
      • ConditionalAnnotation: Represents operator parameter conditions (supported types: boolean, string and category). Parameters:

        1. Operator string. Must be == or != for (un-)equal string conditions.
        2. Name of the operator parameter the condition references. The type of the referenced parameter determines the type of condition.
        3. Value to which the referenced parameter is compared. Must be either true or false for boolean conditions. Can be null to check whether a parameter is set.
  • inputs: An array containing the input ports of the operator, which are used for getting data into the script function as arguments. Each element within must consist of the following:

    • name: The name of the Python script function argument this input is for
    • type: The input type, currently supported values are:
      • table for data tables, which get converted to a pandas DataFrame
      • file for binary files, which get converted to a Path object
    • optional: If true, the presence of actual input data at this port is not enforced. If false, the absence of data is flagged as an error.
  • outputs: An array containing the output ports of the operator, which are used for getting data from the script result back to the process. Each element within must consist of the following:

    • name: The name of the Python script function return value of this output. Used for identification if multiple output values are present.
    • type: The output type, currently supported values are:

      • table for pandas DataFrame objects, which get converted to a RapidMiner data table
      • file for Path objects, which get converted to binary files
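
The parameter rules above (supported types, categories for category parameters, defaults that conform to the type) can be checked mechanically. The following is a minimal sketch of such a check for inspection purposes. The function check_parameter and the mapping JSON_DEFAULT_TYPES are hypothetical names, not part of the Python SDK, and accepting an integer default for a real parameter is an assumption.

```python
# Python types expected for the JSON "default" of each documented parameter
# type. A "category" parameter reaches the function as an Enum constant, but
# in the JSON file its default is one of the category strings.
JSON_DEFAULT_TYPES = {
    "integer": int,
    "real": (int, float),  # assumption: an integer default is fine for real
    "boolean": bool,
    "string": str,
    "category": str,
}


def check_parameter(param: dict) -> list:
    """Return a list of problems found in one parameter entry (empty if none)."""
    ptype = param.get("type")
    if ptype not in JSON_DEFAULT_TYPES:
        return [f"unsupported type: {ptype!r}"]

    problems = []
    categories = param.get("categories")
    if ptype == "category" and not categories:
        problems.append("category parameter needs a non-empty 'categories' array")

    default = param.get("default")
    if default is not None:
        expected = JSON_DEFAULT_TYPES[ptype]
        # bool is a subclass of int in Python, so reject it for other types.
        is_bool_mismatch = isinstance(default, bool) and ptype != "boolean"
        if is_bool_mismatch or not isinstance(default, expected):
            problems.append(f"default {default!r} does not match type {ptype!r}")
        elif ptype == "category" and categories and default not in categories:
            problems.append(f"default {default!r} is not one of the categories")
    return problems


method = {
    "type": "category",
    "name": "method",
    "categories": ["MAX_ABSOLUTE", "MIN_MAX", "STANDARD"],
    "default": "MIN_MAX",
    "optional": False,
}
print(check_parameter(method))
```

A check like this is meant for reading a generated file, for example in a test suite. It is not a reason to edit the file by hand.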