You are viewing the RapidMiner Studio documentation for version 10.2.

Equalize Numerical Indices (Time Series)

Synopsis

This operator computes an equalized time series from an input time series with numerical indices.

Description

The output time series has new, equidistant index values. The configuration of the new index values is defined by the parameter equalize method. Each method determines the number of examples, start value, stop value and step size of the new index values in a different way. For details, see the description of the parameter equalize method.
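The four equalize methods differ only in which of the four quantities (number of examples, start value, stop value, step size) are given and which one is derived. The Python sketch below encodes the formulas listed under the equalize method parameter. The function name `equalized_indices` and its string keys are hypothetical stand-ins for the operator's options, not part of any RapidMiner API.

```python
import math

def equalized_indices(method, start, stop=None, n=None, step=None):
    """Return the new equidistant index values for one equalize method.

    The method keys are hypothetical stand-ins for the documented options:
      "same_range"   -> start, stop and n taken from the original data
      "n_start_step" -> number of examples, start value and step size
      "n_range"      -> number of examples and range(start,stop)
      "range_step"   -> range(start,stop) and step size
    """
    if method in ("same_range", "n_range"):
        if n < 2:
            raise ValueError("need at least two examples to derive a step size")
        step = (stop - start) / (n - 1)          # step = (stop - start) / (n - 1)
    elif method == "n_start_step":
        pass                                      # stop = start + (n - 1) * step
    elif method == "range_step":
        n = math.ceil((stop - start) / step) + 1  # n = ceil((stop - start) / step) + 1
    else:
        raise ValueError(f"unknown equalize method: {method}")
    # Every method ends up with an evenly spaced grid starting at the start value.
    return [start + i * step for i in range(n)]
```

As the Ceil formula is written, the last index of the range(start,stop) and step size method can lie beyond the stop value when the range is not a multiple of the step size: start 0, stop 10 and step size 3 give the indices 0, 3, 6, 9 and 12.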

The corresponding values of the time series attributes are computed with the same functionality as the Replace Missing Values (Series) operator (note that this functionality is configured to ensure finite values). The three parameters replace type numerical, replace type nominal and replace type date time define how the new values are computed.
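To make the value computation concrete, here is a minimal Python sketch for a numerical series with the linear interpolation replace type. It assumes that the new values come from the two original points surrounding each new index, with the next or previous valid value used outside the original range. Instead of literally inserting the new indices as missing values and running Replace Missing Values (Series), it looks up the neighbours directly; under this reading both should give the same numbers. The strict-ordering check mirrors the requirement that the indices be sorted and unique (see sort_time_series). The function name `equalize_linear` is hypothetical.

```python
from bisect import bisect_left

def equalize_linear(old_idx, old_val, new_idx):
    """Compute values at new_idx by linear interpolation between the
    neighbouring original points; outside the original range the
    nearest valid value is used."""
    # The original indices must be sorted and unique (strictly increasing).
    if any(b <= a for a, b in zip(old_idx, old_idx[1:])):
        raise ValueError("indices must be sorted and unique")
    out = []
    for x in new_idx:
        j = bisect_left(old_idx, x)
        if j < len(old_idx) and old_idx[j] == x:
            out.append(old_val[j])      # index already present in the original data
        elif j == 0:
            out.append(old_val[0])      # before the first point: next valid value
        elif j == len(old_idx):
            out.append(old_val[-1])     # after the last point: previous valid value
        else:
            x0, x1 = old_idx[j - 1], old_idx[j]
            y0, y1 = old_val[j - 1], old_val[j]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out
```

For example, `equalize_linear([0.0, 1.0, 3.0], [0.0, 10.0, 30.0], [0.5, 2.0])` interpolates to `[5.0, 20.0]`.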

This operator works on all time series (numerical, nominal, date-time) which have numerical indices.

Input

  • example set (Data Table)

    The ExampleSet which contains the time series data as attributes.

Output

  • equalized example set (Data Table)

    The ExampleSet containing the equalized time series.

  • original (Data Table)

    The ExampleSet that was given as input is passed through without changes.

Parameters

  • indices_attribute

    The attribute holding the index values of the time series. It has to be numeric. The attribute name can be selected from the drop-down box of the parameter if the metadata is known.

  • sort_time_series

    If this parameter is selected, the input time series is sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.

    Keep in mind that the index values still need to be unique. If the values are not unique, a corresponding User Error is thrown.

    The data set provided at the original output port will be the sorted input time series.

  • equalize_method

    This parameter defines the equalize method to use:

    • same range and number of examples as original data: The same range ('start' and 'stop value') and the same 'number of examples' as in the original data are used. The step size is calculated as (<stop value> - <start value>) / (<number of examples> - 1).
    • number of examples, start value and step size: The 'number of examples', the 'start value' and the 'step size' are provided. The number of examples and the start value can be retrieved from the original data or provided as custom values (see the parameters 'number of examples', 'custom number of examples', 'start value', 'custom start value'). The step size has to be provided by the parameter 'step size'. The stop value is calculated as <start value> + (<number of examples> - 1) x <step size>.
    • number of examples and range(start,stop): The 'number of examples', the 'start value' and the 'stop value' are provided. The number of examples, the start value and the stop value can be retrieved from the original data or provided as custom values (see the parameters 'number of examples', 'custom number of examples', 'start value', 'custom start value', 'stop value', 'custom stop value'). The step size is calculated as (<stop value> - <start value>) / (<number of examples> - 1).
    • range(start,stop) and step size: The start value, the stop value and the step size are provided. The start value and the stop value can be retrieved from the original data or provided as custom values (see the parameters 'start value', 'custom start value', 'stop value', 'custom stop value'). The step size has to be provided by the parameter 'step size'. The number of examples is calculated as Ceil((<stop value> - <start value>) / <step size>) + 1.
  • number_of_examples

    Specify how the number of examples is retrieved.

    • same as original data: Same value as the original data.
    • custom: The value is specified by the parameter 'custom number of examples'.
  • custom_number_of_examples

    New number of examples for the equalized time series.

  • start_value

    Specify how the start value is retrieved.

    • same as original data: Same value as the original data.
    • custom: The value is specified by the parameter 'custom start value'.
  • custom_start_value

    New start value of the index values for the equalized time series.

  • stop_value

    Specify how the stop value is retrieved.

    • same as original data: Same value as the original data.
    • custom: The value is specified by the parameter 'custom stop value'.
  • custom_stop_value

    New stop value of the index values for the equalized time series.

  • step_size_(numerical)

    Step size between the new index values of the equalized time series.

  • replace_type_numerical

    The kind of replacement which is used to compute the new numerical values of the equalized time series.

    • previous value: The previous value in the series is used as a replacement. Neighboring missing values are all replaced by the first previous valid value. Missing values at the start of a series are replaced by the next valid value.
    • next value: The next value in the series is used as a replacement. Neighboring missing values are all replaced by the next valid value. Missing values at the end of a series are replaced by the first previous valid value.
    • average: The average of the neighboring values in the series is used as a replacement. Neighboring missing values are all replaced by the average of the neighboring valid values. Missing values at the start and end of a series are replaced by the next and the previous valid value, respectively.
    • linear interpolation: A linear interpolation (using the old and new index values) between the two neighboring values in the series is used to calculate the replacement value. The nearest valid neighboring values are used to perform the linear interpolation, and all missing values are replaced by the values it yields. Missing values at the start and end of a series are replaced by the next and the previous valid value, respectively.
    • value: All missing values are replaced by a constant value, specified by the replace value numerical parameter.
  • replace_type_nominal

    The kind of replacement which is used to compute the new nominal values of the equalized time series.

    • previous value: The previous value in the series is used as a replacement. Neighboring missing values are all replaced by the first previous valid value. Missing values at the start of a series are replaced by the next valid value.
    • next value: The next value in the series is used as a replacement. Neighboring missing values are all replaced by the next valid value. Missing values at the end of a series are replaced by the first previous valid value.
    • value: All missing values are replaced by a constant value, specified by the replace value nominal parameter.
  • replace_type_date_time

    The kind of replacement which is used to compute the new date time values of the equalized time series.

    • previous value: The previous value in the series is used as a replacement. Neighboring missing values are all replaced by the first previous valid value. Missing values at the start of a series are replaced by the next valid value.
    • next value: The next value in the series is used as a replacement. Neighboring missing values are all replaced by the next valid value. Missing values at the end of a series are replaced by the first previous valid value.
    • average: The average of the neighboring values in the series is used as a replacement. Neighboring missing values are all replaced by the average of the neighboring valid values. Missing values at the start and end of a series are replaced by the next and the previous valid value, respectively.
    • linear interpolation: A linear interpolation (using the old and new index values) between the two neighboring values in the series is used to calculate the replacement value. The nearest valid neighboring values are used to perform the linear interpolation, and all missing values are replaced by the values it yields. Missing values at the start and end of a series are replaced by the next and the previous valid value, respectively.
    • value: All missing values are replaced by a constant value, specified by the replace value date time parameter.
  • replace_value_numerical

    If replace type numerical is set to value, this parameter specifies the replacement value for all missing values of numerical time series.

  • replace_value_nominal

    If replace type nominal is set to value, this parameter specifies the replacement value for all missing values of nominal time series.

  • replace_value_date_time

    If replace type date time is set to value, this parameter specifies the replacement value for all missing values of time series with date time values.

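The replace types described for the replace type parameters can be sketched in Python as a fill over a series in which missing entries are `None`. This is an illustration of the documented rules, not RapidMiner code: the function name `fill_missing` and its string keys are hypothetical, and leaving an all-missing series unchanged is an assumption. previous value, next value and value work for any value type (so also for nominal strings); average and linear interpolation assume numbers, so date time values would first have to be converted, for example to epoch milliseconds (again an assumption).

```python
from bisect import bisect_left

TYPES = ("previous", "next", "average", "linear", "value")

def fill_missing(idx, vals, how, constant=None):
    """Replace None entries of a series following the documented replace types.

    idx holds the sorted index values; only "linear" uses them as positions.
    """
    if how not in TYPES:
        raise ValueError(f"unknown replace type: {how}")
    if how == "value":
        return [constant if v is None else v for v in vals]
    valid = [i for i, v in enumerate(vals) if v is not None]
    if not valid:
        return list(vals)            # nothing to take a replacement from (assumption)
    out = list(vals)
    for i, v in enumerate(vals):
        if v is not None:
            continue
        p = bisect_left(valid, i)    # valid[p - 1] < i < valid[p]
        prev = valid[p - 1] if p > 0 else None
        nxt = valid[p] if p < len(valid) else None
        if prev is None:
            out[i] = vals[nxt]       # start of series: next valid value
        elif nxt is None:
            out[i] = vals[prev]      # end of series: previous valid value
        elif how == "previous":
            out[i] = vals[prev]
        elif how == "next":
            out[i] = vals[nxt]
        elif how == "average":
            out[i] = (vals[prev] + vals[nxt]) / 2
        else:                        # "linear": weight by distance along the index
            t = (idx[i] - idx[prev]) / (idx[nxt] - idx[prev])
            out[i] = vals[prev] + t * (vals[nxt] - vals[prev])
    return out
```

With `vals = [None, 1.0, None, None, 4.0, None]`, "previous" gives `[1.0, 1.0, 1.0, 1.0, 4.0, 4.0]` and "average" gives `[1.0, 1.0, 2.5, 2.5, 4.0, 4.0]`: the leading gap always takes the next valid value and the trailing gap the previous one.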

Tutorial Processes

Equalize a sine function

In this tutorial we demonstrate the usage of the Equalize Numerical Indices operator by equalizing a sine function with non-equidistant numerical indices.

First, we generate some sample data with numerical indices that are randomly shifted. The values are a sine function evaluated at these non-equidistant indices.

We use the Equalize Numerical Indices operator with equalize method = range(start,stop) and step size and retrieve the start and stop values from the original data. Hence, the result has the same start and stop values as the input, but equidistant numerical indices with a constant step size of 1.0 (see parameter step size (numerical)).
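The same idea can be reproduced outside RapidMiner with the Python sketch below. It is not the tutorial process itself: the positions 0 to 20, the random shift in [-0.3, 0.3), the seed and the use of linear interpolation for the new values are hypothetical choices. Only the method (range(start,stop) and step size, start and stop from the data, step size 1.0) follows the text above.

```python
import math
import random

def interp(xs, ys, x):
    """Linear interpolation between neighbouring points; nearest value outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = next(k for k in range(1, len(xs)) if xs[k] >= x)
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

random.seed(1)
# Hypothetical sample data: positions 0..20, each shifted by a random offset.
# Offsets below 0.3 keep the indices strictly increasing.
old_idx = [i + random.uniform(-0.3, 0.3) for i in range(21)]
old_val = [math.sin(x) for x in old_idx]

# range(start,stop) and step size: start/stop from the original data, step size 1.0.
start, stop, step = old_idx[0], old_idx[-1], 1.0
n = math.ceil((stop - start) / step) + 1
new_idx = [start + i * step for i in range(n)]
new_val = [interp(old_idx, old_val, x) for x in new_idx]
```

The new indices start exactly at the original start value and are spaced by 1.0; the last one can overshoot the original stop value slightly, in which case the sketch repeats the last valid value.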

Fill gaps in Data Set with integer Ids

In this tutorial we demonstrate how to use the Equalize Numerical Indices operator to fill gaps in a data set that uses integer ids as the numerical index column. 20 percent of the ids are randomly removed from the data set; the Equalize Numerical Indices operator then fills these gaps with constant values.
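A Python sketch of the gap filling follows. The ids 1 to 100, their attribute values, the seed, the fill constant -1.0 and the choice of range(start,stop) and step size with a step size of 1 are all assumptions made for illustration; the tutorial text only states that 20 percent of the ids are removed and refilled with constant values (the value replace type).

```python
import math
import random

random.seed(42)
ids = list(range(1, 101))                        # hypothetical integer ids 1..100
values = {i: float(i) * 10 for i in ids}         # hypothetical attribute values

# Randomly remove 20 percent of the ids; the operator only sees the kept rows.
removed = set(random.sample(ids, k=len(ids) // 5))
kept = {i: values[i] for i in ids if i not in removed}

# Equalize with range(start,stop) and step size = 1, start/stop from the data.
start, stop, step = min(kept), max(kept), 1
n = math.ceil((stop - start) / step) + 1
new_ids = [start + k * step for k in range(n)]

FILL = -1.0                                      # hypothetical replace value numerical
filled = [kept.get(i, FILL) for i in new_ids]    # gaps get the constant value
```

Because start and stop are taken from the remaining data, ids removed at the very beginning or end of the range are not restored; only interior gaps are filled.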