This is the RapidMiner Studio documentation for version 10.2.

Sort (Blending)

Synopsis

This operator sorts the input data set in ascending or descending order according to one or more attributes.

Description

This operator sorts the data set provided at the input port. The complete data set is sorted according to one or more attributes, which are specified using the sort by parameter. Each attribute is sorted in ascending or descending order, depending on the sorting order set for it. The resulting data set is sorted by the first attribute; examples that share the same value of the first attribute are then sorted by the second attribute, and so on.
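The ordering rule described above can be sketched in Python. This is only an illustration of the rule, not RapidMiner's implementation: the function name `sort_examples`, the representation of rows as dictionaries, and the sample attributes `group` and `score` are all assumptions made for this example.

```python
from functools import cmp_to_key


def sort_examples(rows, sort_by):
    """Return rows sorted by a list of (attribute, order) pairs.

    The first pair is the most significant; later pairs only break ties.
    `order` is either "ascending" or "descending" (hypothetical encoding).
    """
    def compare(a, b):
        for attribute, order in sort_by:
            if a[attribute] == b[attribute]:
                continue  # tie on this attribute: fall through to the next one
            result = -1 if a[attribute] < b[attribute] else 1
            return result if order == "ascending" else -result
        return 0  # equal on every listed attribute

    return sorted(rows, key=cmp_to_key(compare))


# Hypothetical sample rows, not taken from any RapidMiner data set.
rows = [
    {"group": "b", "score": 2},
    {"group": "a", "score": 1},
    {"group": "b", "score": 5},
    {"group": "a", "score": 3},
]

result = sort_examples(rows, [("group", "ascending"), ("score", "descending")])
for row in result:
    print(row["group"], row["score"])
```

A comparator is used here so that each attribute can have its own sorting order, including for non-numeric values.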

Input

  • example set input (Data Table)

    This input port expects an ExampleSet.

Output

  • example set output (Data Table)

    The sorted ExampleSet is the output of this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed through without changes.

Parameters

  • sort_by

    This parameter specifies the attributes used for sorting the data set and the sorting order associated with each of them. If multiple attributes are specified, the data set is sorted by the first attribute, then subsets with the same value are sorted by the second attribute, etc. Range: list

Tutorial Processes

Sorting the Golf data set according to Temperature

The 'Golf' data set is loaded using the Retrieve operator and the Sort operator is applied to it. In the sort by parameter, the first attribute name is set to 'Temperature' and the associated sorting order to 'ascending'. Thus the 'Golf' data set is sorted in increasing order of the 'Temperature' attribute. The example with the smallest value of the 'Temperature' attribute becomes the first example and the example with the largest value of the 'Temperature' attribute becomes the last example of the ExampleSet.
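As a rough analogy outside RapidMiner, the same single-attribute ascending sort can be sketched in Python. The rows below are illustrative values chosen for this example and are not the actual contents of the 'Golf' data set; the variable names are likewise assumptions.

```python
# Illustrative rows only; not the actual 'Golf' data.
examples = [
    {"Outlook": "sunny", "Temperature": 85},
    {"Outlook": "rain", "Temperature": 65},
    {"Outlook": "overcast", "Temperature": 72},
]

# Ascending sort on 'Temperature'; sorted() returns a new list,
# so the input list is left unchanged.
sorted_examples = sorted(examples, key=lambda row: row["Temperature"])

first, last = sorted_examples[0], sorted_examples[-1]
print("first:", first["Temperature"], "last:", last["Temperature"])
```

Because `sorted()` leaves `examples` untouched, both the sorted and the original data remain available, loosely mirroring the operator's two output ports.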

Sorting on multiple attributes

This Example Process shows how to sort by multiple attributes. The 'Golf' data set is loaded using the Retrieve operator and duplicated by the Multiply operator.

On one branch of the Multiply, a single Sort operator is applied. Its sort by parameter contains two entries: first the attribute name 'Humidity' and then the attribute name 'Temperature', both with the sorting order 'ascending'.

On the other branch of the Multiply, two Sort operators are chained to sort the ExampleSet on the two attributes. In the first Sort operator, the attribute name inside the sort by parameter is set to 'Temperature' and the sorting order is set to 'ascending'. The second Sort operator is then applied to its output, this time with the attribute name set to 'Humidity' and the sorting order again set to 'ascending'. Note that the less significant attribute ('Temperature') is sorted first; the chained variant matches the single Sort only because the second Sort keeps examples with equal 'Humidity' in the order produced by the first.

As you can see in the Results view, both ways of sorting yield the same result: the 'Golf' data set is sorted in ascending order of the 'Humidity' attribute. The example with the smallest value of the 'Humidity' attribute becomes the first example and the example with the largest value of the 'Humidity' attribute becomes the last example of the ExampleSet. Examples that have the same value of the 'Humidity' attribute are ordered by the 'Temperature' attribute, so that examples with a smaller 'Temperature' value precede those with a larger one.
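The equivalence of the two branches can be checked with a small Python sketch. It relies on Python's `sorted()` being a stable sort; that RapidMiner's Sort operator behaves the same way is inferred from the tutorial's result, not from its implementation. The rows are hypothetical values, not the actual 'Golf' data.

```python
# Hypothetical rows with repeated 'Humidity' values, not the actual 'Golf' data.
rows = [
    {"Humidity": 80, "Temperature": 75},
    {"Humidity": 70, "Temperature": 69},
    {"Humidity": 80, "Temperature": 68},
    {"Humidity": 70, "Temperature": 65},
    {"Humidity": 90, "Temperature": 72},
]

# Branch 1: one sort with two keys, 'Humidity' first, then 'Temperature'.
single = sorted(rows, key=lambda r: (r["Humidity"], r["Temperature"]))

# Branch 2: two chained sorts, least significant attribute first.
by_temperature = sorted(rows, key=lambda r: r["Temperature"])
chained = sorted(by_temperature, key=lambda r: r["Humidity"])

# A stable second sort preserves the 'Temperature' order within equal 'Humidity'.
print(single == chained)
```

Reversing the order of the two chained sorts (first 'Humidity', then 'Temperature') would instead produce a data set ordered primarily by 'Temperature'.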