Create an extension with basic operators

In this tutorial, you will build a Python extension with basic set of operators. By the end of the tutorial, you will be able to do the following.

  • Create different operators with different parameter types
  • Create different input and output ports for the operators
  • Log messages in the Log console of Altair AI Studio

To start with the tutorial, you will need to set up the conda environment and install Altair AI Studio 2025.1 with the Altair Units license.

Set up a new conda environment

Let's start by creating a new conda environment that you will use in the development machine.

conda create -n sample-environment python=3.12
conda activate sample-environment
conda install altair-aitools-devkit -c conda-forge

Create a new project

Next, create a blank project for the extension.

mkdir python-extension-sample
cd python-extension-sample
create-extension
  • The create-extension command will ask to create a blank or a sample extension. Enter blank
  • Enter the name of the extension: python-extension-sample
  • Enter the version number: let's keep it as default, hit enter.
  • Enter the author name: You can enter your name or choose blank.

Once created, open the project in VS Code (or your favorite IDE).

Configure extension.toml

By default, the extension.toml file will be available with all the details. Open the file and you should see the following info

[extension]
name = "python-extension-sample"
version = "0.0.1"
authors = [""]
namespace = "pysa"
module =  "python-extension-sample"
environment = "sample-environment"

Next, let's start adding some basic operators.

Create operators

  • First, create a new directory called basic_operators under the python-extension-sample folder.

  • Create a new file parameter_types.py under the basic_operators folder.

Next, we will add some Python functions.

Boolean parameter

To add a boolean parameter, you can define a function with the bool datatype in the argument.

  • Add a new function called check_bool and create a new parameter input1 with the bool datatype.
  • Add DataFrame as the return type.

from pandas import DataFrame

def check_boolean(input1: bool) -> DataFrame:
    """This Operator checks if the the given value is an boolean

    Args:
        input1 (bool): true/false boolean value from the user

    Returns:
        DataFrame: input to studio O/P - dataframe with true if boolean is presented as the input
    """
    return DataFrame([isinstance(input1, bool)])

Build the extension

Before adding new operators, lets check if we can build the extension and see the operator in Altair AI Studio.

  1. Add a new __init__.py file under the python-extension-samples folder.

  2. Import the functions from the basic_operators.py file.

     from .basic_operators import *
    
  3. Add the operator definition in the extension.toml file with group_key as basic.parameter_types.

     [operators.check_boolean]
     group_key = "basic.parameter_types"
    
  4. Open the terminal and build the extension using the build-extension command.

     build-extension -o "~\.AltairRapidMiner\AI Studio\shared\extensions\python-extensions" .
    
  5. Once built, open Altair AI Studio and from the menu, go to Extensions -> About Installed Python Extensions.

  6. Wait for the Python Samples extension to be Ready.

  7. Now, add the operator to the canvas, run the process, and see the output.

    alt_text

Let's continue adding more operators.

Numeric parameters

We will add functions that defines an integer and a float parameter.

  • Add a new function called check_integer and an argument input1 with int datatype.
  • Add DataFrame as the return type.
def check_integer(input1: int) -> DataFrame:
    """This Operator checks if the the given value is an integer

    Args:
        input1 (int): any integer value from the user

    Returns:
        DataFrame: input to studio O/P - dataframe with true if integer is presented as the input
    """
    return DataFrame([isinstance(input1, int)])
  • Similarly, add a new function to check_float to check if the parameter is float.
  • Add DataFrame as the return type.
def check_float(input1: float) -> DataFrame:
    """This Operator checks if the the given value is an float

    Args:
        input1 (float): any float value from the user

    Returns:
        DataFrame: input to studio O/P - dataframe with true if float is presented as the input
    """
    return DataFrame([isinstance(input1, float)])

Note: You can build extension after adding functions. Follow the steps mentioned in the previous section.

String parameter

  • Add a new function with the argument of string type.
def check_string(input1: str) -> DataFrame:
    """This Operator checks if the the given value is an string

    Args:
        input1 (str): any string value from the user

    Returns:
        DataFrame: input to studio O/P - dataframe with true if string is presented as the input
    """
    return DataFrame([isinstance(input1, str)])

Category parameter

You can define the parameter with fixed set of constants using Python Enums. Create a new class that extends Enum having three values and a funciton check_enum to use it as the parameter.

In the function, we make use of the sklearn preprocessing function and apply them to the input dataframe.

Note: use conda/pip to install the sklearn in the conda environment.

from enum import Enum
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

class Scaler(Enum):
    """Supported normalization methods."""
    MAX_ABSOLUTE = MaxAbsScaler
    MIN_MAX = MinMaxScaler
    STANDARD = StandardScaler

def check_enum(input1: Scaler) -> DataFrame:
    """This Operator checks if the the given value is an enum

    Args:
        input1 (Scaler): any enum value from the user

    Returns:
        DataFrame: input to studio O/P - dataframe with true if enum is presented as the input
    """
    return DataFrame([isinstance(input1, Scaler)])

In the following section, we will add operators with different inputs and output ports.

  • Create a new file port_combination.py under basic_operators folder.

Multiple input ports

In this function, let's add three input dataframes and return the combined dataframe.

  • Add a function multiple_input.
import pandas as pd
from pandas import DataFrame

def multiple_input(input1: DataFrame, input2: DataFrame, input3: DataFrame) -> DataFrame:
    """This operator accepts multiple inputs and combines them into one dataframe

    Returns:
        DataFrame: combined dataframe
    """
    return pd.concat([input1, input2, input3], axis=1)

When built, the operator will be able to accept three dataframes as input and will return combined dataframe as the result.

multiple_inputs

Next, lets add an operator with different outputs.

Multiple output ports

To add multiple outputs for an operator, you can use tuple to combine them.

  • Add a new function multiple_output
  • For return, combine different dataframes using tuple.
def multiple_output(data: DataFrame) -> tuple[DataFrame, DataFrame]:
    """This Operator splits the given input into multiple outputs by splitting its rows randomly

    Args:
        data (DataFrame): Dataframe to be split

    Returns:
        tuple[DataFrame,DataFrame]: splits the given data rows randomly
    """
    len = data.shape[0]
    output1 = data.iloc[:, :len]
    output2 = data.iloc[:, len:]
    return (output1, output2)

Once built, the operator will look like shown in the image below.

multiple-output-op-process

Apart from having multiple outputs, the operator can also provide list of outputs as a collection. In the next step, we will create a function that creates a collection of outputs.

Output Collection

  • Add a new function collection_output that has two arguments - a dataframe and an integer.
  • The return value for the function will be a list of dataframe.
def collection_output(data: DataFrame, n_parts: int = 2) -> list[DataFrame]:
    """This Operator splits the given input into multiple outputs by splitting its rows

    Args:
        data (DataFrame): Dataframe to be split
        n_parts (int): number of parts to split the data into

    Returns:
        list[DataFrame]: splits the given data rows
    """
    len = data.shape[0]
    parts = []
    for i in range(n_parts):
        parts.append(data.iloc[i * len // n_parts : (i + 1) * len // n_parts, :])
    return parts

Again, add the operator configuration to the extension.toml file and build the extension.

  • Create a new process and add collection output operator to the canvas.

    collection-output

  • When run, the output will be the list of collection. In this case, we will see two example set in the Result view of the Altair AI Studio.

    collection-result

Adding Logging

Logging in Python Extensions is managed by Python's standard logging interface. The default configuration is sufficient for optimal operation. Additional configuration changes are neither necessary nor advised, as they may lead to unintended consequences.

For this tutorial, we will create a long running function, that sleeps for 1 second and logs the message.

  • Create a new file helpers.py and add new functon long_runner.
  • Add the following code
from pandas import DataFrame
import time
from datetime import datetime
import logging

def long_runner(input1: int) -> DataFrame:
    """ This Operator runs for a long time

    Args:
        input1 (int): any integer value from the user which represents the time in minutes

    Returns:
        DataFrame: input to studio O/P - dataframe with time column which logs the time every minute
    """
    times = []
    for _ in range(input1):
        current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        times.append(current_time)
        logging.info(f"Logged time: {current_time}")
        for _ in range(60):
            time.sleep(1)
    return DataFrame(times, columns=["Logged Time"])
  • Add the operator configuration in the extension.toml file and build the extension again.
  • Create a new process, add the operator to it and run the process.
  • Check the logs from the Log view of the Altair AI Studio.

logging-view

Final thoughts

This is the end of the tutorial for creating basic set of operators and parameters with the altair-aitools-devkit package.

In this tutorial, we learned how to add different parameters like boolean, integer, float, string as well as saw how to define different input and output combinations.

Want to move on to create more advanced set of operators? Go to the next tutorial on creating advanced operators.

If you have any questions or feedback, feel free to post query on our community page.