RapidMiner Studio documentation, version 10.2
Select Attributes (Blending)
Synopsis
This Operator selects a subset of Attributes of an ExampleSet and removes the other Attributes.
Description
The Operator provides different filter types to make Attribute selection easy. Examples include direct selection of Attributes, selection by a regular expression, and selection of only those Attributes without missing values. See the attribute filter type parameter for a detailed description of the different filter types.
The type parameter determines whether the selected Attributes are included or excluded. Special Attributes (Attributes with Roles, such as id, label, or weight) are ignored in the selection by default and always remain in the resulting output ExampleSet. Setting the also apply to special attributes parameter changes this.
Only the selected Attributes are delivered to the output port. The rest are removed from the ExampleSet.
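The interplay of the type parameter and the also apply to special attributes parameter can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not RapidMiner code: the function `select_attributes`, its arguments, and the attribute names are all hypothetical.

```python
def select_attributes(attributes, selected, roles=None,
                      exclude=False, also_apply_to_special=False):
    """Return the attribute names that survive the selection, in input order.

    Hypothetical helper: `roles` maps special attribute names to their role,
    `exclude=True` mimics the 'exclude attributes' setting of the type parameter.
    """
    roles = roles or {}
    selected = set(selected)
    kept = []
    for name in attributes:
        is_special = name in roles
        if is_special and not also_apply_to_special:
            # Special Attributes bypass the filter by default and are always kept.
            kept.append(name)
            continue
        matches = name in selected
        # Include mode keeps matching Attributes; exclude mode keeps the rest.
        if matches != exclude:
            kept.append(name)
    return kept


attributes = ["id", "label", "att1", "att2", "att3"]
roles = {"id": "id", "label": "label"}

default_result = select_attributes(attributes, ["att1"], roles)
inverted_result = select_attributes(attributes, ["att1"], roles, exclude=True)
strict_result = select_attributes(attributes, ["att1"], roles,
                                  also_apply_to_special=True)

print(default_result)   # special Attributes plus att1
print(inverted_result)  # special Attributes plus everything except att1
print(strict_result)    # only att1, because id and label are now filtered too
```

In this sketch the special Attributes are checked first, so the include/exclude decision only ever sees them when also apply to special attributes is enabled.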
Differentiation
Select by <...> Operators
There are several Operators that select Attributes according to their input. For example, Select by Weights selects Attributes whose weights match a specified criterion. The Select by Random Operator selects a random subset of Attributes. Remove Attribute Range removes a range of Attributes according to their index. The Remove Useless Attributes Operator removes Attributes that can be considered useless according to some specified criteria. The Remove Correlated Attributes Operator removes Attributes that are correlated with each other.
Work on Subset
This Operator is a combination of the Select Attributes Operator and the Subprocess Operator. It applies the Operators in its inner process to an ExampleSet with only the Attributes which are selected by the attribute filter type. The inner result is merged back to the whole input ExampleSet.
Forward Selection
This is an implementation of the forward selection feature selection method. It selects the most relevant Attributes according to a model which is trained inside the Operator. For details see the documentation of the Forward Selection Operator.
Backward Elimination
This is an implementation of the backward elimination feature selection method. It selects the most relevant Attributes according to a model which is trained inside the Operator. For details see the documentation of the Backward Elimination Operator.
Filter Examples
This Operator does not select Attributes, but filters (or selects) Examples. Thus, it does what Select Attributes does, but applied to Examples instead of Attributes.
Input
- example set (Data Table)
This input port expects the ExampleSet from which you want to select Attributes.
Output
- example set (Data Table)
The ExampleSet with only the selected Attributes is delivered to this output port.
- original (Data Table)
The ExampleSet that was given as input is passed through to the output unchanged via this port.
Parameters
- type
This parameter determines whether the selected Attributes are included or excluded. include attributes is the default option. It configures the Operator to keep the selected Attributes and remove the remainder. exclude attributes leads to the inverse behaviour. It configures the Operator to remove the selected Attributes and keep the remainder. This also applies to special attributes if the also apply to special attributes parameter is set to true.
- attribute_filter_type
This parameter selects the Attribute filter, that is, the method used for selecting Attributes. It has the following options:
- all attributes: This option selects all the Attributes of the ExampleSet; no Attributes are removed. This is the default option.
- one attribute: This option allows the selection of a single Attribute. The Attribute is selected by the select attribute parameter.
- a subset: This option allows the selection of multiple Attributes through a list (see parameter select subset). If the meta data of the ExampleSet is known, all Attributes are present in the list and the required ones can easily be selected.
- regular expression: This option allows you to specify a regular expression for the Attribute selection. The regular expression filter is configured via the parameters expression and exclude expression.
- type(s) of values: This option allows the selection of Attributes of particular type(s). The value type filter is configured via the parameter type of value.
- no missing values: This option selects all Attributes of the ExampleSet which do not contain a missing value in any Example. Attributes that have even a single missing value are removed.
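As an illustration of the no missing values filter type, the following Python sketch keeps only the columns that contain no missing entry. It is not RapidMiner code: the helper names, the column names, and the choice to treat both `None` and NaN as missing are assumptions made for this example.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def attributes_without_missing(columns):
    """Return the names of columns in which no value is missing."""
    return [name for name, values in columns.items()
            if not any(is_missing(v) for v in values)]


# Hypothetical data: one list of values per Attribute.
columns = {
    "age": [29.0, None, 41.0],
    "sex": ["female", "male", "male"],
    "fare": [211.3, 151.6, float("nan")],
}

complete = attributes_without_missing(columns)
print(complete)  # only "sex" has a value in every Example
```

A single missing value is enough to drop the column, which mirrors the rule stated for this filter type.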
- select_attribute
The required Attribute can be selected from this option. The Attribute name can be selected from the drop down box of the parameter if the meta data is known. Otherwise, the attribute name can be typed in manually.
- select_subset
The required Attributes can be selected from this option. This opens a new window with two lists. All Attributes are present in the left list, if the meta data is known. They can be shifted to the right list, which is the list of selected Attributes that will make it to the output port. If the meta data is unknown, you can manually type in attribute names and use the green plus-button to add them to the list of selected attributes.
- expression
Attributes whose names match this expression will be selected. The expression can be specified through the button on the right, which opens the Edit Regular Expression menu. This menu gives an overview of regular expression syntax and lets you try different expressions and preview the results.
- exclude_expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified via the expression parameter).
- type_of_value
This option allows you to select Attribute types. A subset of the following types can be chosen: real, integer, date-time, time, binominal, non-binominal.
- also_apply_to_special_attributes
Special Attributes are Attributes with roles (e.g. id, label). By default, all special Attributes are delivered to the output port regardless of the conditions in the Select Attributes Operator. If this parameter is set to true, special Attributes are also tested against the specified conditions, and only those that match the conditions are selected.
Tutorial Processes
Selecting Attributes from the Titanic Data Sample
This tutorial Process shows the basic usage of the Select Attributes Operator. First the 'Titanic' data is retrieved from the Samples folder. The first Select Attributes Operator selects a subset of the Attributes. The subset is specified by the select subset parameter.
The original output port is connected to the input port of the second Select Attributes Operator. There, only nominal Attributes are selected by choosing binominal and non-binominal.
Different usages of the Select Attributes Operator
This tutorial Process demonstrates different usages of the Select Attributes Operator. A demo ExampleSet is created inside a Subprocess Operator. It has 3 special Attributes (id, label, weight) and 5 regular Attributes (att1, att2, att3, att4, att5). Different attribute types are also used (integer: id; binominal: label; real: weight, att1, att2, att4, att5; nominal: att3). After the Subprocess Operator a Breakpoint is inserted to allow inspection of the demo ExampleSet.
Next, several Select Attributes Operators are used to show the different attribute filter types and their combinations with the parameters type and also apply to special attributes.
See the comments in the process for more details.
Selecting Attributes by using a regular expression
This tutorial Process illustrates the usage of a regular expression to select Attributes from the Labor-Negotiations data sample. The regular expression specified is: w.*|.*y.*
This means all Attributes whose names start with a 'w' (w.*) or (|) contain a 'y' (.*y.*) match the expression. The following Attributes of the Labor-Negotiations data set match this expression:
wage-inc-1st, wage-inc-2nd, wage-inc-3rd, working-hours, standby-pay, statutory-holidays, longterm-disability-assistance.
The Attributes that match the condition in the exclude expression parameter will be removed. The specified exclude expression is: .*[0-9].*
This means all Attributes whose name contains a digit are removed.
Finally, the following four Attributes are selected: working-hours, standby-pay, statutory-holidays, longterm-disability-assistance. Besides these, the special Attribute class is also kept.
For more details about regular expressions see the configuration of the expression parameter.
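The selection in this tutorial can be reproduced outside RapidMiner with a short Python sketch. It assumes that an expression has to match the whole Attribute name, which is consistent with the tutorial writing .*y.* for "contains a 'y'", and it writes the exclude expression as .*[0-9].*. The names duration and pension are added here only as illustrative non-matching entries; the variable names are hypothetical.

```python
import re

# The seven names listed in the tutorial, plus two illustrative names
# (duration, pension) that are assumed here to show non-matching cases.
names = [
    "duration",
    "wage-inc-1st",
    "wage-inc-2nd",
    "wage-inc-3rd",
    "working-hours",
    "pension",
    "standby-pay",
    "statutory-holidays",
    "longterm-disability-assistance",
]

expression = re.compile(r"w.*|.*y.*")          # starts with 'w' or contains 'y'
exclude_expression = re.compile(r".*[0-9].*")  # contains a digit

# fullmatch requires the expression to cover the entire name (assumption).
matched = [n for n in names if expression.fullmatch(n)]
selected = [n for n in matched if not exclude_expression.fullmatch(n)]

print(matched)
print(selected)
```

The exclude expression is applied after the first expression, so a name such as wage-inc-1st is matched first and then filtered out because it contains a digit.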