Tutorial - Creating advanced operators
In this tutorial, you will build a Python extension with some advanced operators using the altair-aitools-devkit Python package. By the end of the tutorial, you will be able to do the following:
- Define different parameter annotations, such as Conditional, Selected Columns, and Roles.
- Use various IO objects like files.
- Access resources from the extension.
Note: For this tutorial, we will continue adding functions to the same project created in the basic operator tutorial.
Parameter annotations
First, let's start by creating a new folder called advanced_operators. We will add new functions and files under this folder.
- Create a new Python file annotations.py
Conditional parameters
In Python extensions, all operator parameters are defined as function inputs (optionally with default values). However, a parameter may be conditional on the value of another, meaning that it should be hidden unless a specified condition is met. For instance, different dropdown items may require different parameters to be set.
To this end, use typing.Annotated as a wrapper around the parameter and then specify the conditions with ConditionalAnnotation from altair_aitools.ext.annotations. If a parameter has multiple conditional annotations, the final condition is obtained by joining all conditions with AND operators.
some_integer_parameter: Annotated[int, CONDITION_1, CONDITION_2, ..., CONDITION_N]
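The AND behaviour described above can be illustrated with a minimal sketch that uses only the standard library. The Condition class, the example function, and the is_visible helper are hypothetical stand-ins written for this illustration; they are not part of the DevKit and do not show how Altair AI Studio evaluates conditions internally.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_type_hints


@dataclass(frozen=True)
class Condition:
    """Hypothetical stand-in for a conditional annotation (not the DevKit class)."""

    key: str  # name of the parameter this condition depends on
    value: Any  # value to compare that parameter against
    negate: bool = False  # False behaves like "equals", True like "not equals"

    def holds(self, values: dict[str, Any]) -> bool:
        matches = values.get(self.key) == self.value
        return not matches if self.negate else matches


def example(
    mode: str = "basic",
    name: str = "",
    # Two conditions: mode == "advanced" AND name != ""
    detail: Annotated[
        int,
        Condition("mode", "advanced"),
        Condition("name", "", negate=True),
    ] = 0,
) -> None: ...


def is_visible(func, param: str, values: dict[str, Any]) -> bool:
    """Return True only if every condition attached to the parameter holds."""
    hint = get_type_hints(func, include_extras=True)[param]
    # Annotated hints carry their extra arguments in __metadata__.
    metadata = getattr(hint, "__metadata__", ())
    conditions = [m for m in metadata if isinstance(m, Condition)]
    return all(c.holds(values) for c in conditions)
```

In this sketch, detail would be shown only when mode is "advanced" and name is non-empty; a parameter without any condition is always shown.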
So let's create a Python function called greetings and add a conditional parameter to it.
- Import ConditionalAnnotation from altair_aitools.ext.annotations.
- Add the following function.
from typing import Annotated

from pandas import DataFrame
from altair_aitools.ext.annotations import ConditionalAnnotation


def greetings(
    name: str = "",
    # surname is only shown once name is non-empty
    surname: Annotated[str, ConditionalAnnotation.ne("name", "")] = "",
) -> DataFrame:
    """This operator is used to greet the user.

    Args:
        name (str): The name of the user as a string. Defaults to empty string.
        surname (Annotated[str, ConditionalAnnotation], optional): The surname of the user as a string. Defaults to empty string.

    Returns:
        DataFrame: DataFrame with a greeting message built from the provided values.
    """
    message = f"Hello, {name} {surname}!"
    return DataFrame({"Greeting": [message]})
Add the operator configuration to the extension configuration and build the extension.
Create a new process and add the operator to it.
If you enter a value for the name parameter, the surname parameter will appear automatically.
Supported logical operators for condition definition
Conditional annotations can be defined by calling logical operators. The annotation class altair_aitools.ext.annotations.ConditionalAnnotation supports the following logical operators:
Name | Description |
---|---|
eq | equals |
ne | not equals |
These operators should be called at parameter definition with two arguments:
- key (str): Name of the parameter to depend on.
- value (Any): Value to compare the parameter with.
from typing import Annotated
from altair_aitools.ext.annotations import ConditionalAnnotation


def do_something_with_names(
    name: str = "",
    surname: Annotated[str, ConditionalAnnotation.ne("name", "")] = "",  # Will only be shown if name is set (name != "")
) -> None: ...
Now let's move to the next annotated parameter type. Sometimes an operation needs to work on specific columns of the input data table. So let's add another Python function.
DataFrame Column Selector
String parameters can be annotated with SelectedColumnAnnotation from altair_aitools.ext.annotations, indicating that the parameter represents a column name of another (DataFrame) input. The annotation object is constructed by calling it with a str argument: the name of the DataFrame input whose columns the user can select from in the operator in Altair AI Studio.
Important: the name of the DataFrame in the annotation, and the name of the DataFrame parameter in the function's signature must match!
from pandas import DataFrame
from typing import Annotated
from altair_aitools.ext.annotations import SelectedColumnAnnotation


def process_df_with_selected_column(
    my_dataframe: DataFrame,
    selected_column: Annotated[str, SelectedColumnAnnotation("my_dataframe")]
) -> None: ...
For this tutorial, let's add a new function that takes the datatable as the input and an annotated column parameter. The resultant operator will set the role to label for the selected column from the parameter.
from pandas import DataFrame
from typing import Annotated
from altair_aitools.ext.annotations import SelectedColumnAnnotation
from altair_aitools.ext.metadata import ColumnRole, set_role


def set_label(
    df: DataFrame,
    col: Annotated[str, SelectedColumnAnnotation("df")],
) -> DataFrame:
    """Sets the role of the specified column in the DataFrame to 'label'.

    Args:
        df (DataFrame): The DataFrame containing the column.
        col (Annotated[str, SelectedColumnAnnotation]): The name of the column in the DataFrame.

    Returns:
        DataFrame: The DataFrame with the role of the specified column set to 'label'.
    """
    set_role(df, col, ColumnRole.LABEL)
    return df
- Add the operator configuration and build the extension.
- Create a new process and add the operator to it.
- Connect the sample data to the input port and select a column from it. The operator will set the role of the selected column to label.
Next, let's learn how to add Long text parameters to the operators.
Long Text Parameter
String parameters can be annotated with TextParameterAnnotation from altair_aitools.ext.annotations, indicating that the parameter holds a longer text. This leads to a button that opens a text editor in Altair AI Studio. The text type can also be specified by passing a value of the enum TextType to the annotation initializer. The text type is only used for syntax highlighting in the GUI. Supported types: PLAIN (default), JSON, XML, HTML, SQL and PYTHON.
For example:
from typing import Annotated
from altair_aitools.ext.annotations import TextParameterAnnotation, TextType


def use_longer_texts(
    plain_text: Annotated[str, TextParameterAnnotation()],  # Same as providing TextType.PLAIN
    json_text: Annotated[str, TextParameterAnnotation(TextType.JSON)],
) -> None: ...
Let's put that into action and create a new function that parses a given JSON.
- Create a new function parse_json with the parameter json_text.
- Add the following function code.
import json
from typing import Annotated

from pandas import DataFrame
from altair_aitools.ext.annotations import TextParameterAnnotation, TextType


def parse_json(
    json_text: Annotated[str, TextParameterAnnotation(TextType.JSON)],
) -> DataFrame:
    """Parses the provided JSON text into a DataFrame.

    Args:
        json_text (Annotated[str, TextParameterAnnotation]): The JSON text to parse.

    Returns:
        DataFrame: The DataFrame containing the parsed JSON data.
    """
    return DataFrame.from_dict(json.loads(json_text))
- Add the operator to the extension configuration and build the extension.
- Create a new process and add the operator to it.
- Click on the parameter button and add the following JSON.
{
"name": "France",
"capital": "Paris",
"population": 67364357,
"area": 551695,
"currency": "Euro",
"languages": ["French"],
"region": "Europe",
"subregion": "Western Europe",
"flag": "https://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg"
}
- Run the process and the result should display the datatable.
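Because parse_json passes the parameter text straight to json.loads, malformed input surfaces as a raw json.JSONDecodeError. One possible way to give the user a clearer message is to validate the text first. The following is a minimal sketch using only the standard library; load_json_object is a hypothetical helper introduced for this illustration and is not part of the DevKit.

```python
import json


def load_json_object(json_text: str) -> dict:
    """Parse JSON text and require a top-level object (hypothetical helper)."""
    try:
        parsed = json.loads(json_text)
    except json.JSONDecodeError as exc:
        # Re-raise with the error position so the user can locate the problem.
        raise ValueError(
            f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(parsed, dict):
        # A list or scalar at the top level has no keys to map to columns.
        raise ValueError("Expected a JSON object at the top level")
    return parsed
```

Under this assumption, parse_json could call load_json_object(json_text) and pass the returned dictionary to DataFrame.from_dict.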
Column roles
The concept of attribute/column roles is supported in Python extensions. It is handled with the built-in attrs dictionary attribute of pandas.DataFrame, so users can access and set these roles with the key "role" in that dictionary. For convenience, the DevKit provides the following helper functions in the altair_aitools.ext.metadata package to manage column roles:
- get_all_roles(df: pd.DataFrame, ref: bool = False) -> dict[str, ColumnRole]: Get column roles for a DataFrame.
  - ref: If True, returns a reference to the roles dictionary. If False, returns a copy of the roles dictionary.
- get_role(df: pd.DataFrame, col: str) -> ColumnRole: Get the role of a column in a DataFrame.
- set_role(df: pd.DataFrame, col: str, role: ColumnRole) -> None: Set a column role for a DataFrame.
The enum altair_aitools.ext.metadata.ColumnRole contains the supported values for column roles. If a column does not have a special role, ColumnRole.REGULAR is used by default.
- Add the following function.
import pandas as pd
from typing import Annotated
from altair_aitools.ext.metadata import ColumnRole, set_role
from altair_aitools.ext.annotations import SelectedColumnAnnotation


def set_role_operator(
    df: pd.DataFrame,
    col: Annotated[str, SelectedColumnAnnotation("df")],
    role: ColumnRole,
) -> pd.DataFrame:
    """Operator to set the role of the specified column in the DataFrame."""
    set_role(df, col, role)
    return df
- Add the operator to the extension configuration and build the extension.
- Create a new process and add the operator to it. Select the column and set the Role for the selected column.
Accessing resources
To access a resource file from the resources folder, use the Resource class from altair_aitools.ext.io. Its initializer takes the path of the file relative to the resources folder, and the class implements the context manager pattern, providing the resource file as a byte stream. If the resource needs to be available as a file on disk, call the tempfile() function on the Resource instance; the resource file will be extracted and its lifecycle managed automatically.
import pandas as pd
from altair_aitools.ext.io import Resource


def sample_data() -> pd.DataFrame:
    with Resource("data.csv") as resource:  # resource: IO[bytes]
        return pd.read_csv(resource)


def sample_data_as_file() -> pd.DataFrame:
    with Resource("data.csv").tempfile() as file_path:  # file_path: str
        return pd.read_csv(file_path)
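To make the two access patterns concrete, here is a minimal standard-library sketch of a class with the same usage shape: a context manager that yields a byte stream, plus a tempfile() method whose file exists only inside the with block. FakeResource is a hypothetical stand-in for illustration only. It takes raw bytes instead of a path and says nothing about how the DevKit's Resource class is actually implemented.

```python
import io
import os
from contextlib import contextmanager
from tempfile import mkstemp


class FakeResource:
    """Hypothetical stand-in mimicking the usage pattern (not the DevKit class)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._stream = None

    def __enter__(self):
        # Provide the content as an in-memory byte stream.
        self._stream = io.BytesIO(self._data)
        return self._stream

    def __exit__(self, exc_type, exc, tb) -> None:
        # Close the stream when the with block ends.
        self._stream.close()

    @contextmanager
    def tempfile(self):
        # Write the content to a temporary file and yield its path.
        fd, path = mkstemp(suffix=".csv")
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(self._data)
            yield path
        finally:
            # The file is removed as soon as the with block exits.
            os.remove(path)
```

The practical consequence of this pattern is that neither the stream nor the temporary file path should be used after the with block ends, so any data needed later has to be read inside the block.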