Custom Data Objects
The Altair AI Tools Python devkit framework lets you work with custom data objects when standard dataframes or datatables are not enough. Custom data objects let you model structured data in the way that best suits your extension logic, giving you full control over the input and output types of your operators.
Use custom data objects when:
- Your data does not fit well into a typical tabular format.
- You need to encapsulate multiple data types or nested structures.
- You want to leverage type safety by defining clear contracts between operators.
To see how custom data objects work, let's walk through an example.
Defining custom data objects
To create a custom data object, subclass BaseDataObject from the altair_aitools.ext.io module. For example, if you need to represent the shape of a tensor, you can define a custom class like this:
from altair_aitools.ext.io import BaseDataObject

class TensorShape2D(BaseDataObject):
    height: int
    width: int
Once defined, these objects can be used as inputs or outputs for your operators. In the following example, a function returns a TensorShape2D instance:
def create_tensor(height: int, width: int) -> TensorShape2D:
    return TensorShape2D(height=height, width=width)
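As a rough sketch of how such an object might pass from one operator to the next, the snippet below chains a producer and a consumer. So that it runs without the devkit installed, it substitutes a frozen dataclass for BaseDataObject. count_elements is a hypothetical operator name used only for illustration and is not part of the devkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorShape2D:
    """Stand-in for the BaseDataObject subclass (assumption: same two fields)."""
    height: int
    width: int


def create_tensor(height: int, width: int) -> TensorShape2D:
    """Producer: returns the custom object as its output."""
    return TensorShape2D(height=height, width=width)


def count_elements(shape: TensorShape2D) -> int:
    """Hypothetical consumer: accepts the custom object as its input."""
    return shape.height * shape.width


shape = create_tensor(height=2, width=3)
total = count_elements(shape)
print(total)  # 6
```

Because both functions name TensorShape2D in their signatures, the type annotation itself documents the contract between them.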
Serialization
The altair-aitools-devkit automatically serializes custom data objects as JSON. This means that you can easily store, transfer, and recreate your objects within AI Studio. When serialized, the object includes metadata that helps with versioning and proper deserialization. For example, a TensorShape2D instance is serialized to:
{
    "pyext_namespace": "serialization",
    "pyext_version": "0.1.0",
    "pyext_object_type": "serialization.TensorShape2D",
    "object": {
        "height": 2,
        "width": 3
    }
}
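To make the structure of this envelope concrete, the following sketch builds and reads the same JSON layout using only the standard library. It illustrates the format shown above and is not the devkit's internal code: to_envelope and from_envelope are hypothetical helpers, the dataclass is a stand-in for the real custom data object, and the namespace and version values are copied from the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TensorShape2D:
    """Stand-in for the custom data object (plain dataclass, not BaseDataObject)."""
    height: int
    width: int


def to_envelope(obj: TensorShape2D, namespace: str, version: str) -> str:
    """Hypothetical helper: wrap the object's fields in the metadata envelope."""
    return json.dumps({
        "pyext_namespace": namespace,
        "pyext_version": version,
        "pyext_object_type": f"{namespace}.{type(obj).__name__}",
        "object": asdict(obj),
    })


def from_envelope(payload: str) -> TensorShape2D:
    """Hypothetical helper: check the type tag, then rebuild the object."""
    envelope = json.loads(payload)
    type_name = envelope["pyext_object_type"].rsplit(".", 1)[-1]
    if type_name != TensorShape2D.__name__:
        raise ValueError(f"unexpected object type: {type_name}")
    return TensorShape2D(**envelope["object"])


payload = to_envelope(TensorShape2D(height=2, width=3), "serialization", "0.1.0")
restored = from_envelope(payload)
```

The type tag is what lets a reader of the payload decide which class to rebuild, which is presumably why the envelope carries it alongside the field values.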
Integrating with third-party data types
You can also integrate third-party objects by subclassing ObjectSerializer with your desired type as a type argument.
When subclassing ObjectSerializer, you need to implement two static methods:
- object_to_dict: Converts the object into a JSON-friendly dictionary.
- dict_to_object: Recreates the object from the dictionary representation.
For example, here's a serializer for a SciPy sparse CSR matrix:
from typing import Any, Dict

import numpy as np
from scipy.sparse import csr_matrix

from altair_aitools.ext.io import ObjectSerializer

class SparseMatrixSerializer(ObjectSerializer[csr_matrix]):
    """Serializer for scipy sparse CSR matrix objects."""

    @staticmethod
    def object_to_dict(object: csr_matrix) -> Dict[str, Any]:
        return {
            "data": object.data,        # Keep as numpy array
            "indices": object.indices,  # Keep as numpy array
            "indptr": object.indptr,    # Keep as numpy array
            "shape": list(object.shape),
        }

    @staticmethod
    def dict_to_object(object: Dict[str, Any]) -> csr_matrix:
        return csr_matrix(
            (object["data"], object["indices"], object["indptr"]),  # Unpack numpy arrays
            shape=tuple(object["shape"]),
        )

def np_to_csr(array: np.ndarray) -> csr_matrix:
    """Operator to convert a numpy array to a CSR sparse matrix."""
    return csr_matrix(array)
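The data, indices, and indptr entries stored by the serializer are the standard components of the CSR layout. As a pure-Python illustration of what those fields hold (not SciPy's implementation; dense_to_csr and csr_to_dense are hypothetical helpers introduced here), the sketch below converts a small dense matrix to that layout and back:

```python
from typing import Any, Dict, List


def dense_to_csr(rows: List[List[float]]) -> Dict[str, Any]:
    """Build data/indices/indptr/shape fields from a dense matrix."""
    data: List[float] = []   # non-zero values, collected row by row
    indices: List[int] = []  # column index of each non-zero value
    indptr: List[int] = [0]  # data[indptr[i]:indptr[i + 1]] belongs to row i
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    n_cols = len(rows[0]) if rows else 0
    return {"data": data, "indices": indices, "indptr": indptr,
            "shape": [len(rows), n_cols]}


def csr_to_dense(fields: Dict[str, Any]) -> List[List[float]]:
    """Rebuild the dense matrix from the four fields."""
    n_rows, n_cols = fields["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(fields["indptr"][r], fields["indptr"][r + 1]):
            dense[r][fields["indices"][k]] = fields["data"][k]
    return dense


matrix = [[1, 0, 2], [0, 0, 3], [4, 5, 6]]
fields = dense_to_csr(matrix)
round_trip = csr_to_dense(fields)
```

The shape entry is needed because trailing all-zero columns leave no trace in the other three fields, which is one reason the serializer above stores it explicitly.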
This approach allows you to seamlessly incorporate non-standard data types into your Altair AI Tools workflows, leveraging the full power and flexibility of the Python SDK.