RapidMiner Developers documentation for version 10.2
API changes in RapidMiner 9.8 and 9.9
From ExampleSet to Belt Table
Forget about the ExampleSet class and start using com.rapidminer.belt.table.Table, RapidMiner's new representation of example sets. The corresponding framework is called Belt.
It comes with several advantages compared to ExampleSet:
- Column-oriented design: a column-oriented data layout allows for using compact representations for the different column types.
- Immutability: all columns and tables are immutable. This not only guarantees data integrity but also allows for safely reusing components, e.g., multiple tables can safely reference the same column.
- Thread-safety: all public data structures are thread-safe and designed to perform well when used concurrently.
- Implicit parallelism: Much of Belt's built-in functionality, such as the transformations shown in the examples below, automatically scales out to multiple cores.
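To make the immutability point more concrete, here is a minimal plain-Java sketch. The classes SketchColumn and SketchTable are hypothetical stand-ins written for this illustration and are not part of Belt's API. They only show why two tables can safely reference the same column object once that object can no longer change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT a Belt class.
final class SketchColumn {
    private final double[] values;

    SketchColumn(double[] values) {
        // defensive copy so later changes to the caller's array cannot leak in
        this.values = values.clone();
    }

    double get(int row) {
        return values[row];
    }

    int size() {
        return values.length;
    }
}

// Hypothetical stand-in for illustration only; this is NOT a Belt class.
final class SketchTable {
    private final Map<String, SketchColumn> columns;

    SketchTable(Map<String, SketchColumn> columns) {
        // copy the map; the column objects themselves are shared, not copied
        this.columns = new LinkedHashMap<>(columns);
    }

    SketchColumn column(String label) {
        return columns.get(label);
    }

    // "adding" a column returns a new table and leaves this one untouched
    SketchTable with(String label, SketchColumn column) {
        Map<String, SketchColumn> copy = new LinkedHashMap<>(columns);
        copy.put(label, column);
        return new SketchTable(copy);
    }

    int width() {
        return columns.size();
    }
}
```

In this sketch, the table returned by with(...) references the very same SketchColumn instances as the original table, which is safe only because no column offers a way to modify its values.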
To learn everything about the Belt framework please refer to the official documentation of the Belt project.
This page focuses on the differences between the old example set and the new Belt framework and presents some examples of how to implement operators using the Belt framework and the Table class.
If you are new to extension development for RapidMiner Studio, then Create your own extension is a great starting point for you.
Sum operator example
Let's start with an example. We will create an operator that takes a table with only numeric columns, calculates the sum for each row and adds these row sums as a new column to the resulting table.
Data transformation
First of all, the doWork() method. You receive the input table by calling:
IOTable ioTable = tableInput.getData(IOTable.class);
Table table = ioTable.getTable();
You need not worry whether the actual data at the port is an IOTable or an ExampleSet since RapidMiner will automatically convert it to the requested format.
This makes the collaboration between new operators working on Tables and old operators working on ExampleSets easy.
Then, to make the code a little bit cleaner, we will outsource the actual work to the calculateSum method.
// read table, calculate sum and return new table
Table result = calculateSum(table);
Now deliver the resulting table to the output port.
IOTable newIOTable = new IOTable(result);
newIOTable.getAnnotations().addAll(ioTable.getAnnotations());
tableOutput.deliver(newIOTable);
Since the Table class itself is not an IOObject, we need to wrap it with the IOTable class. It is also important to copy the annotations of the input IOTable to the new IOTable because otherwise they will be lost.
Finally, it is good practice to also deliver the input table to an output port:
originalOutput.deliver(ioTable);
That's the doWork() method.
Let's move on to implement the calculateSum(Table table) method.
First of all, check that the given Table contains only numeric columns.
The BeltErrorTools class holds some convenience methods for these kinds of checks.
BeltErrorTools.onlyNumeric(table, getName(), this);
Next, we will determine whether the result will be of type real or integer.
If any column is of type real, the result will also be of type real.
The table provides a ColumnSelector that can be accessed via the select() method.
A column selector can be used to filter the columns of a table via predicates.
The default predicates filter by type, category, capability and meta data (e.g. roles).
You can even define your own predicates for custom filter operations.
The ofTypeId method does the trick:
boolean resultIsReal = !table.select().ofTypeId(Column.TypeId.REAL).labels().isEmpty();
Since the Column class is immutable, we need a column buffer to fill and instantiate a new column:
NumericBuffer buffer = resultIsReal ? Buffers.realBuffer(table.height())
: Buffers.integer53BitBuffer(table.height());
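Note that the example later stores the sum as a double value in both cases. The name of the 53-bit integer buffer presumably refers to the fact that a double can represent every integer exactly only up to 2^53; that reading is an assumption and not stated on this page. The following plain-Java sketch (the class Int53Sketch is hypothetical and uses no Belt classes) shows where exact integer representation in a double ends.

```java
// Hypothetical helper for illustration only; it does not use any Belt classes.
final class Int53Sketch {
    // 2^53 = 9007199254740992
    static final long LIMIT = 1L << 53;

    // Returns true if converting the value to double and back yields the same integer.
    static boolean roundTrips(long value) {
        double asDouble = (double) value;
        // the guard excludes doubles of 2^63 and above, which do not fit into a long
        return asDouble < 0x1p63 && (long) asDouble == value;
    }
}
```

With this sketch, roundTrips(Int53Sketch.LIMIT) is true, while roundTrips(Int53Sketch.LIMIT + 1) is false because 2^53 + 1 has no exact double representation.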
Tables can be read column-wise or row-wise. In this case we want to read it row-wise so that we can calculate the sum for each row:
NumericRowReader reader = Readers.numericRowReader(table);
for (int i = 0; i < buffer.size(); i++) {
// move must be called to advance the reader to the next row
reader.move();
double sum = 0;
for (int j = 0; j < reader.width(); j++) {
// reader.get(j) returns the value of the j-th column of the row
sum += reader.get(j);
}
buffer.set(i, sum);
}
The move method advances the reader to the next row. Please note that it must be called before the first row is read.
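The cursor behaviour described above can be sketched in plain Java. SketchRowReader is a hypothetical class written for this illustration and is not Belt's NumericRowReader. It starts positioned before the first row, so move() has to be called once before the first get(). Throwing an exception on an early read is a choice made for this sketch and not a statement about how Belt reacts.

```java
// Hypothetical cursor-style reader for illustration; NOT Belt's NumericRowReader.
final class SketchRowReader {
    private final double[][] rows;
    private int position = -1; // starts before the first row

    SketchRowReader(double[][] rows) {
        this.rows = rows;
    }

    boolean hasRemaining() {
        return position < rows.length - 1;
    }

    // advances the cursor to the next row
    void move() {
        position++;
    }

    // returns the value of the given column in the current row
    double get(int column) {
        if (position < 0) {
            throw new IllegalStateException("call move() before reading the first row");
        }
        return rows[position][column];
    }

    int width() {
        return rows.length == 0 ? 0 : rows[0].length;
    }

    // reads all rows with the cursor and returns one sum per row
    static double[] rowSums(double[][] rows) {
        SketchRowReader reader = new SketchRowReader(rows);
        double[] sums = new double[rows.length];
        int i = 0;
        while (reader.hasRemaining()) {
            reader.move();
            double sum = 0;
            for (int j = 0; j < reader.width(); j++) {
                sum += reader.get(j);
            }
            sums[i++] = sum;
        }
        return sums;
    }
}
```

For the rows {1, 2, 3} and {4, 5, 6}, rowSums returns 6.0 and 15.0.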
We have calculated the row sums and filled them into the buffer. Next, copy the original table and add a new column to it.
Since the Table class is immutable we will use a table builder:
TableBuilder builder = Builders.newTableBuilder(table);
builder.add("Sum", buffer.toColumn());
Please note that the data stored in the buffer cannot be modified anymore after calling the toColumn method. Attempting to do so will lead to an Exception.
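A write-once buffer of this kind can be sketched in a few lines of plain Java. SketchBuffer and its nested FrozenColumn are hypothetical classes for this illustration, not Belt's NumericBuffer or Column. One plausible motivation for the rule is that the column can take over the buffer's storage without copying it; that is an assumption about the design, not a description of Belt's internals.

```java
// Hypothetical write-once buffer for illustration; NOT Belt's NumericBuffer.
final class SketchBuffer {
    private final double[] data;
    private boolean frozen;

    SketchBuffer(int size) {
        this.data = new double[size];
    }

    void set(int index, double value) {
        if (frozen) {
            throw new IllegalStateException("buffer was already converted to a column");
        }
        data[index] = value;
    }

    // hands the storage over to a read-only column and blocks all further writes
    FrozenColumn toColumn() {
        frozen = true;
        return new FrozenColumn(data);
    }

    // read-only view; it offers no method that changes the values
    static final class FrozenColumn {
        private final double[] values;

        private FrozenColumn(double[] values) {
            this.values = values;
        }

        double get(int row) {
            return values[row];
        }

        int size() {
            return values.length;
        }
    }
}
```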
Nearly done! All that's left to do is to build and to return the table.
And this is where Belt's implicit parallelism comes into play.
The build method takes the operator's context, which can be accessed via the BeltTools class, and runs the build process in parallel.
Table result = builder.build(BeltTools.getContext(this));
return result;
This concludes the data transformation for the operator.
Meta data transformation
Next, let's implement the meta data transformation.
The meta data class for IOTables (called TableMetaData) comes with methods and functionality similar to what the Table class offers:
public SumOperator(OperatorDescription description) {
super(description);
// we want TableMetaData with only numeric columns as input
tableInput.addPrecondition(new TablePrecondition(tableInput, Column.Category.NUMERIC));
// pass through the original data
getTransformer().addRule(new TablePassThroughRule(tableInput, originalOutput, SetRelation.EQUAL));
// generate meta data for new table
getTransformer().addRule(new TablePassThroughRule(tableInput, tableOutput, SetRelation.EQUAL) {
@Override
public TableMetaData modifyTableMetaData(TableMetaData metaData) {
return SumOperator.this.calculateSumMD(metaData);
}
});
}
The first few lines should be familiar to you if you have implemented meta data transformation before.
Use a precondition to show warnings to the user if the provided meta data is not TableMetaData or if it holds any non-numeric columns.
Columns of category numeric are either integer or real columns.
The first TablePassThroughRule passes through the table meta data to the original output without any modifications.
Add a second rule and override the modifyTableMetaData method.
The meta data transformation can be done similarly to the data transformation:
/**
* Analogue to {@link #calculateSum(Table)} but for {@link TableMetaData}.
*
* @param metaData
* the original TableMetaData
* @return new TableMetaData with the original columns and a sum column
*/
private TableMetaData calculateSumMD(TableMetaData metaData) {
// If any column is of type real the result will be real. Otherwise, it will be integer.
boolean resultIsReal = metaData.containsType(ColumnType.REAL, true) != MetaDataInfo.NO;
// copy original TableMetaData using TableMetaDataBuilder
TableMetaDataBuilder builder = new TableMetaDataBuilder(metaData);
// add the new column to the builder
if (resultIsReal) {
builder.addReal("Sum", null, SetRelation.UNKNOWN, MDInteger.newPossible());
} else {
builder.addInteger("Sum", null, SetRelation.UNKNOWN, MDInteger.newPossible());
}
// build the new TableMetaData
return builder.build();
}
Firstly, check if the table meta data contains any columns of type real.
If any of the original columns is of type real, the resulting new column will also be of type real.
Otherwise it will be of type integer.
Since the table meta data is immutable, use a TableMetaDataBuilder to copy and to modify it.
You can use one of the convenience methods addReal or addInteger to add a new column to the meta data.
The first argument of these methods is the new column's name.
The second and third arguments are the numeric range and a set relation describing the uncertainty regarding the given range.
Since we do not know much about the actual data, it is hard to predict the range of the resulting column.
The null and SetRelation.UNKNOWN arguments inform the builder that we do not know the resulting range.
Lastly, the MDInteger.newPossible() argument sets the number of missing values to >= 0.
Build the new table meta data via the builder's build method. This concludes the example.
import com.rapidminer.adaption.belt.IOTable;
import com.rapidminer.belt.buffer.Buffers;
import com.rapidminer.belt.buffer.NumericBuffer;
import com.rapidminer.belt.column.Column;
import com.rapidminer.belt.column.ColumnType;
import com.rapidminer.belt.reader.NumericRowReader;
import com.rapidminer.belt.reader.Readers;
import com.rapidminer.belt.table.Builders;
import com.rapidminer.belt.table.Table;
import com.rapidminer.belt.table.TableBuilder;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.UserError;
import com.rapidminer.operator.ports.InputPort;
import com.rapidminer.operator.ports.OutputPort;
import com.rapidminer.operator.ports.metadata.MDInteger;
import com.rapidminer.operator.ports.metadata.MetaDataInfo;
import com.rapidminer.operator.ports.metadata.SetRelation;
import com.rapidminer.operator.ports.metadata.table.TableMetaData;
import com.rapidminer.operator.ports.metadata.table.TableMetaDataBuilder;
import com.rapidminer.operator.ports.metadata.table.TablePassThroughRule;
import com.rapidminer.operator.ports.metadata.table.TablePrecondition;
import com.rapidminer.tools.belt.BeltErrorTools;
import com.rapidminer.tools.belt.BeltTools;
/**
* This operator takes a {@link Table} with only numeric columns, calculates the sum for each row
* and adds it as a new column.
*/
public class SumOperator extends Operator {
private final InputPort tableInput = getInputPorts().createPort("example set input");
private final OutputPort tableOutput = getOutputPorts().createPort("example set output");
private final OutputPort originalOutput = getOutputPorts().createPort("original");
public SumOperator(OperatorDescription description) {
super(description);
// we want TableMetaData with only numeric columns as input
tableInput.addPrecondition(new TablePrecondition(tableInput, Column.Category.NUMERIC));
// pass through the original data
getTransformer().addRule(new TablePassThroughRule(tableInput, originalOutput,
SetRelation.EQUAL));
// generate meta data for new table
getTransformer().addRule(new TablePassThroughRule(tableInput, tableOutput,
SetRelation.EQUAL) {
@Override
public TableMetaData modifyTableMetaData(TableMetaData metaData) {
return SumOperator.this.calculateSumMD(metaData);
}
});
}
@Override
public void doWork() throws OperatorException {
// fetch table from input port
IOTable ioTable = tableInput.getData(IOTable.class);
Table table = ioTable.getTable();
// read table, calculate sum and return new table
Table result = calculateSum(table);
// wrap the result into an IOTable
IOTable newIOTable = new IOTable(result);
// copy the annotations from the original IOTable
newIOTable.getAnnotations().addAll(ioTable.getAnnotations());
// deliver the new IOTable to the port
tableOutput.deliver(newIOTable);
// deliver original table to corresponding port
originalOutput.deliver(ioTable);
}
/**
* Takes a {@link Table} with only numeric columns, calculates the sum for each row and adds it
* as a new column.
*
* @param table
* the original table
* @return a new table with the original columns and a sum column
* @throws UserError
* if the table contains non-numeric columns
*/
private Table calculateSum(Table table) throws UserError {
// check that all columns are numeric
BeltErrorTools.onlyNumeric(table, getName(), this);
// If any column is of type real the result will be real. Otherwise, it will be integer.
boolean resultIsReal = !table.select().ofTypeId(Column.TypeId.REAL).labels().isEmpty();
// initialize numeric buffer needed to create sum column
NumericBuffer buffer = resultIsReal ? Buffers.realBuffer(table.height())
: Buffers.integer53BitBuffer(table.height());
// read the table row-wise and store the sum of each row in the buffer
NumericRowReader reader = Readers.numericRowReader(table);
for (int i = 0; i < buffer.size(); i++) {
// move must be called to advance the reader to the next row
reader.move();
double sum = 0;
for (int j = 0; j < reader.width(); j++) {
// reader.get(j) returns the value of the j-th column of the row
sum += reader.get(j);
}
buffer.set(i, sum);
}
// copy original table using table builder
TableBuilder builder = Builders.newTableBuilder(table);
// add the new column to the builder
builder.add("Sum", buffer.toColumn());
// build the new table in parallel using the operator's context
Table result = builder.build(BeltTools.getContext(this));
return result;
}
/**
* Analogue to {@link #calculateSum(Table)} but for {@link TableMetaData}.
*
* @param metaData
* the original TableMetaData
* @return new TableMetaData with the original columns and a sum column
*/
private TableMetaData calculateSumMD(TableMetaData metaData) {
// If any column is of type real the result will be real. Otherwise, it will be integer.
boolean resultIsReal = metaData.containsType(ColumnType.REAL, true) != MetaDataInfo.NO;
// copy original TableMetaData using TableMetaDataBuilder
TableMetaDataBuilder builder = new TableMetaDataBuilder(metaData);
// add the new column to the builder
if (resultIsReal) {
builder.addReal("Sum", null, SetRelation.UNKNOWN, MDInteger.newPossible());
} else {
builder.addInteger("Sum", null, SetRelation.UNKNOWN, MDInteger.newPossible());
}
// build the new TableMetaData
return builder.build();
}
}
In this example you have seen how to fetch a table from a port and deliver it to one, how to read a table and process its data, how to create a new column using a buffer, and how to return a modified table using the TableBuilder class.
There are alternative ways to implement the operator, of course. Look, for example, at the following code:
private Table calculateSum(Table table) throws UserError {
// check that all columns are numeric
BeltErrorTools.onlyNumeric(table, getName(), this);
// If any column is of type real the result will be real. Otherwise, it will be integer.
boolean resultIsReal = !table.select().ofTypeId(Column.TypeId.REAL).labels().isEmpty();
// this function will be applied in parallel to the table rows
ToDoubleFunction<NumericRow> sumUpRow = row -> {
double sum = 0;
for (int j = 0; j < row.width(); j++) {
sum += row.get(j);
}
return sum;
};
// the results will be collected in a numeric buffer
NumericBuffer buffer;
if(resultIsReal){
buffer = table.transform().applyNumericToReal(sumUpRow, BeltTools.getContext(this));
} else {
buffer = table.transform().applyNumericToInteger53Bit(sumUpRow, BeltTools.getContext(this));
}
// copy original table using table builder
TableBuilder builder = Builders.newTableBuilder(table);
// add the new column to the builder
builder.add("Sum", buffer.toColumn());
// build the new table in parallel using the operator's context
Table result = builder.build(BeltTools.getContext(this));
return result;
}
This code uses the Table's transform method and a row transformer to achieve the same results as the calculateSum method presented earlier.
Details on the transform method can be found in the official documentation of the Belt project.
Using the transform method comes with the additional advantage that the summations potentially take place in parallel.
Belt once again makes use of the operator's context to automatically decide if and how to parallelize the computation.
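The idea of applying one function to every row in parallel can be tried out without Belt. The sketch below swaps in the JDK's parallel streams, which run on the common fork-join pool, whereas Belt uses the operator's context; ParallelRowSumSketch is a hypothetical class for this illustration only.

```java
import java.util.stream.IntStream;

// Plain-JDK illustration of a parallel row-wise transformation; NOT Belt's transform API.
final class ParallelRowSumSketch {

    // Computes one sum per row; rows may be processed on different threads.
    static double[] rowSums(double[][] rows) {
        return IntStream.range(0, rows.length)
                .parallel()
                .mapToDouble(i -> {
                    double sum = 0;
                    for (double value : rows[i]) {
                        sum += value;
                    }
                    return sum;
                })
                // toArray keeps the results in row order despite the parallel execution
                .toArray();
    }
}
```

As in the transform-based version above, the per-row function has no side effects, so it does not matter which thread handles which row.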
The next example shows how to use generators to fill columns and how to add column meta data like, for example, roles to a table.
ID generator example
Next, let's implement an operator that takes a table and adds an ID column to it. Here is the code of its doWork() method:
@Override
public void doWork() throws OperatorException {
// fetch table from input port and initialize builder
IOTable ioTable = tableInput.getData(IOTable.class);
Table table = ioTable.getTable();
TableBuilder builder = Builders.newTableBuilder(table);
// add id column via generator
builder.addInt53Bit("ID", i -> i);
// set column role
builder.addMetaData("ID", ColumnRole.ID);
// add annotations and deliver results
Table result = builder.build(BeltTools.getContext(this));
IOTable newIOTable = new IOTable(result);
newIOTable.getAnnotations().addAll(ioTable.getAnnotations());
tableOutput.deliver(newIOTable);
// deliver original table to corresponding port
originalOutput.deliver(ioTable);
}
We fetch the input table and initialize the builder with it just as we did before. Then add the id column via:
builder.addInt53Bit("ID", i -> i);
This line of code makes use of one of the table builder's convenience methods that takes a label and a generator and automatically fills the column.
Furthermore, it does not fill the column straight away but does so later when the build method is called.
Thereby, the builder can fill all columns in parallel.
Let's take a closer look at the generator.
For numeric column types it is represented via an IntToDoubleFunction.
The generator consumes a row index and returns the value for that row.
Our implementation returns the row index itself as the result and, thereby, generates ids from 0 to the number of rows - 1.
Similar generator methods for other column types are also available in the table builder.
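The deferred filling described above can be mimicked with the JDK alone. LazyBuilderSketch is a hypothetical class for this illustration and not Belt's TableBuilder; how Belt schedules the work internally is not shown on this page. The sketch only stores the generator in add and evaluates it in build, using the JDK helper Arrays.parallelSetAll.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical builder for illustration only; NOT Belt's TableBuilder.
final class LazyBuilderSketch {
    private final int height;
    private final Map<String, IntToDoubleFunction> generators = new LinkedHashMap<>();

    LazyBuilderSketch(int height) {
        this.height = height;
    }

    // Only remembers the generator; no values are computed yet.
    LazyBuilderSketch add(String label, IntToDoubleFunction generator) {
        generators.put(label, generator);
        return this;
    }

    // Calls each generator once per row index and collects the results.
    Map<String, double[]> build() {
        Map<String, double[]> columns = new LinkedHashMap<>();
        for (Map.Entry<String, IntToDoubleFunction> entry : generators.entrySet()) {
            double[] values = new double[height];
            // JDK helper that evaluates the generator for the row indices in parallel
            Arrays.parallelSetAll(values, entry.getValue());
            columns.put(entry.getKey(), values);
        }
        return columns;
    }
}
```

For example, new LazyBuilderSketch(4).add("ID", i -> i).build().get("ID") holds the values 0.0, 1.0, 2.0 and 3.0.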
The next step is to set the column's role to ColumnRole.ID.
The builder's addMetaData method takes a column label and meta data to attach to the corresponding column.
Since ColumnRole implements ColumnMetaData, it can be attached via this method.
Finally, the resulting table is wrapped into an IOTable, the annotations are copied, and the table is delivered to the output port.
Meta data transformation
Start by adding a constructor similar to what we have done in the last example:
public IDOperator(OperatorDescription description) {
super(description);
// we want TableMetaData as input
tableInput.addPrecondition(new TablePrecondition(tableInput));
// pass through the original data
getTransformer().addRule(new TablePassThroughRule(tableInput, originalOutput, SetRelation.EQUAL));
// generate meta data for new table
getTransformer().addRule(new TablePassThroughRule(tableInput, tableOutput, SetRelation.EQUAL) {
@Override
public TableMetaData modifyTableMetaData(TableMetaData metaData) {
return IDOperator.this.transformMetaData(metaData);
}
});
}
Once again, the actual meta data transformation is outsourced to a private method for better readability:
private TableMetaData transformMetaData(TableMetaData metaData) {
// determine range
MDInteger numRows = metaData.height();
Range range = null;
if (numRows.getNumber() > 0) {
range = new Range(0, numRows.getNumber() - 1);
}
// determine set relation for the range
SetRelation relation;
switch (numRows.getRelation()) {
case AT_LEAST:
relation = SetRelation.SUPERSET;
break;
case EQUAL:
relation = SetRelation.EQUAL;
break;
case AT_MOST:
relation = SetRelation.SUBSET;
break;
case UNKNOWN:
default:
relation = SetRelation.UNKNOWN;
}
// build id column
ColumnInfoBuilder columnBuilder = new ColumnInfoBuilder(ColumnType.INTEGER_53_BIT);
columnBuilder.setNumericRange(range, relation);
columnBuilder.setMissings(0);
ColumnInfo idColumn = columnBuilder.build();
// add id column to table
TableMetaDataBuilder builder = new TableMetaDataBuilder(metaData);
builder.add("ID", idColumn);
// set column role id
builder.addColumnMetaData("ID", ColumnRole.ID);
return builder.build();
}
Since the operator generates id values between 0 and table height - 1 we can infer the range of the resulting id column. If we are uncertain about the table height, this translates into uncertainty about the range. Therefore, the appropriate set relation for the range is determined with the switch statement.
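The reasoning behind the switch statement can be spelled out with a small example. If the table is known to have at least 100 rows, the ids 0 to 99 are certainly generated and possibly more, so the true range is a superset of [0, 99]. If it has at most 100 rows, the true range is a subset of [0, 99]. The sketch below restates this mapping in plain Java; the enums HeightRelation and RangeRelation are hypothetical and merely mirror the names used above, they are not the RapidMiner classes.

```java
// Plain-Java restatement of the range logic; the enums are hypothetical stand-ins.
final class IdRangeSketch {
    enum HeightRelation { AT_LEAST, EQUAL, AT_MOST, UNKNOWN }

    enum RangeRelation { SUPERSET, EQUAL, SUBSET, UNKNOWN }

    // Maps the uncertainty about the table height to the uncertainty about the id range.
    static RangeRelation rangeRelation(HeightRelation heightRelation) {
        switch (heightRelation) {
            case AT_LEAST:
                return RangeRelation.SUPERSET;
            case EQUAL:
                return RangeRelation.EQUAL;
            case AT_MOST:
                return RangeRelation.SUBSET;
            default:
                return RangeRelation.UNKNOWN;
        }
    }

    // Returns {0, height - 1}, or null if the height is not positive.
    static long[] idRange(long height) {
        return height > 0 ? new long[] {0, height - 1} : null;
    }
}
```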
The TableMetaData's columns are represented via the immutable ColumnInfo class.
Build a new column info of type integer with the calculated range and relation using a ColumnInfoBuilder.
Also set the number of missing values to exactly 0 since the operator will never generate missing values.
Finally, add the new column to the table meta data and set its role to ColumnRole.ID using the table meta data builder's addColumnMetaData method.
import com.rapidminer.adaption.belt.IOTable;
import com.rapidminer.belt.column.ColumnType;
import com.rapidminer.belt.table.Builders;
import com.rapidminer.belt.table.Table;
import com.rapidminer.belt.table.TableBuilder;
import com.rapidminer.belt.util.ColumnRole;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.ports.InputPort;
import com.rapidminer.operator.ports.OutputPort;
import com.rapidminer.operator.ports.metadata.MDInteger;
import com.rapidminer.operator.ports.metadata.SetRelation;
import com.rapidminer.operator.ports.metadata.table.ColumnInfo;
import com.rapidminer.operator.ports.metadata.table.ColumnInfoBuilder;
import com.rapidminer.operator.ports.metadata.table.TableMetaData;
import com.rapidminer.operator.ports.metadata.table.TableMetaDataBuilder;
import com.rapidminer.operator.ports.metadata.table.TablePassThroughRule;
import com.rapidminer.operator.ports.metadata.table.TablePrecondition;
import com.rapidminer.tools.belt.BeltTools;
import com.rapidminer.tools.math.container.Range;
/**
* This operator takes a {@link Table} and adds an ID column to it.
*/
public class IDOperator extends Operator {
private final InputPort tableInput = getInputPorts().createPort("example set input");
private final OutputPort tableOutput = getOutputPorts().createPort("example set output");
private final OutputPort originalOutput = getOutputPorts().createPort("original");
public IDOperator(OperatorDescription description) {
super(description);
// we want TableMetaData as input
tableInput.addPrecondition(new TablePrecondition(tableInput));
// pass through the original data
getTransformer().addRule(new TablePassThroughRule(tableInput, originalOutput,
SetRelation.EQUAL));
// generate meta data for new table
getTransformer().addRule(new TablePassThroughRule(tableInput, tableOutput,
SetRelation.EQUAL) {
@Override
public TableMetaData modifyTableMetaData(TableMetaData metaData) {
return IDOperator.this.transformMetaData(metaData);
}
});
}
@Override
public void doWork() throws OperatorException {
// fetch table from input port and initialize builder
IOTable ioTable = tableInput.getData(IOTable.class);
Table table = ioTable.getTable();
TableBuilder builder = Builders.newTableBuilder(table);
// add id column via generator
builder.addInt53Bit("ID", i -> i);
// set column role
builder.addMetaData("ID", ColumnRole.ID);
// add annotations and deliver results
Table result = builder.build(BeltTools.getContext(this));
IOTable newIOTable = new IOTable(result);
newIOTable.getAnnotations().addAll(ioTable.getAnnotations());
tableOutput.deliver(newIOTable);
// deliver original table to corresponding port
originalOutput.deliver(ioTable);
}
private TableMetaData transformMetaData(TableMetaData metaData) {
// determine range
MDInteger numRows = metaData.height();
Range range = null;
if (numRows.getNumber() > 0) {
range = new Range(0, numRows.getNumber() - 1);
}
// determine set relation for the range
SetRelation relation;
switch (numRows.getRelation()) {
case AT_LEAST:
relation = SetRelation.SUPERSET;
break;
case EQUAL:
relation = SetRelation.EQUAL;
break;
case AT_MOST:
relation = SetRelation.SUBSET;
break;
case UNKNOWN:
default:
relation = SetRelation.UNKNOWN;
}
// build id column
ColumnInfoBuilder columnBuilder = new ColumnInfoBuilder(ColumnType.INTEGER_53_BIT);
columnBuilder.setNumericRange(range, relation);
columnBuilder.setMissings(0);
ColumnInfo idColumn = columnBuilder.build();
// add id column to table
TableMetaDataBuilder builder = new TableMetaDataBuilder(metaData);
builder.add("ID", idColumn);
// set column role id
builder.addColumnMetaData("ID", ColumnRole.ID);
return builder.build();
}
}
ColumnMetaData
ColumnMetaData represents additional information that can be attached to columns. Classes implementing ColumnMetaData by default are:
- ColumnRole: Representing the roles used in Studio to mark special columns like, for example, labels.
- ColumnAnnotation: A textual description of the column.
- ColumnReference: A reference to another column that is somehow related to the column. An example would be a prediction column referencing the label column that it refers to.
Custom meta data can be added to the columns by implementing the ColumnMetaData interface.
Please note that column annotations and references are not visualized in RapidMiner Studio yet, but we plan on doing so in the near future.
Two important changes have been made to column roles. Firstly, roles need not be unique anymore. A table can have multiple label, prediction and even id columns. This comes in handy, e.g., when working with learners that expect multiple labels. Secondly, in Belt the set of column roles is fixed to BATCH, CLUSTER, ID, LABEL, OUTLIER, PREDICTION, SCORE, WEIGHT, INTERPRETATION, ENCODING, SOURCE and METADATA. While the first eleven of them are the default roles, METADATA stands for anything other than the known roles. Columns marked as METADATA will usually be ignored by operators (e.g. when creating models). Legacy roles that do not exist in Belt will be mapped to METADATA.
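The fallback to METADATA can be pictured with a short plain-Java sketch. The enum Role below lists the role names given above but is a hypothetical stand-in, not Belt's ColumnRole. How RapidMiner actually matches legacy role names is not described here, so the case-insensitive name lookup is an assumption made for this illustration.

```java
import java.util.Locale;

// Hypothetical illustration of the METADATA fallback; NOT Belt's ColumnRole.
final class RoleSketch {
    enum Role {
        BATCH, CLUSTER, ID, LABEL, OUTLIER, PREDICTION, SCORE, WEIGHT,
        INTERPRETATION, ENCODING, SOURCE, METADATA
    }

    // Assumption: legacy roles are matched by name, ignoring case and surrounding spaces.
    static Role fromLegacyName(String legacyRole) {
        try {
            return Role.valueOf(legacyRole.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // any role outside the fixed set ends up as METADATA
            return Role.METADATA;
        }
    }
}
```

In this sketch, fromLegacyName("label") yields LABEL, while an unknown name such as "my_custom_role" yields METADATA.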
Automatic conversion between Table and ExampleSet / TableMetaData and ExampleSetMetaData
Table will be converted to ExampleSet and vice versa depending on the format in which the operator requests the data from a port.
(The same holds true for TableMetaData and ExampleSetMetaData.)
This conversion is done very efficiently, so in most cases it will not impact the overall performance of a process.
Please note:
- Since ExampleSet expects roles to be unique, non-unique roles will have an index appended to their name when converting from Table to ExampleSet. When such a role is converted back at a later point in the process, the unnecessary index will automatically be removed.
- Attribute / column types will be mapped to the next best representation in the converted format. Some of the Belt column types do not have a representation in the old API. Therefore, attempting to deliver an IOTable holding column types not included in BeltConverter.STANDARD_TYPES will lead to an exception. This restriction may be removed in one of the future releases.
MetaData class for IOTables
Since RapidMiner version 9.9 there is an IOTable-specific meta data class called TableMetaData that should be used for the meta data transformation.
The TableMetaData class is conceptually very similar to the Table class and, therefore, easy to use once you have understood the Table class.
For RapidMiner version 9.8, ExampleSetMetaData is the legacy MetaData class used to describe IOTables at the operator ports.