IMPUTE

Imputes missing values in a data set with either the mean or the mode, based on observed values for a variable. This function supports both numeric and categorical data types.

Important: Before using a machine learning function, be aware that all the ongoing transactions might be committed.

Syntax

IMPUTE( 'output_view', 'input_relation', 'input_columns', 'method'
	     [ USING PARAMETERS [exclude_columns='col1, col2, ... coln',]
                               [partition_columns='col1, col2, ... coln'] ]) 

Arguments

output_view

The name of the View where the missing-value imputed rows and rows without missing values are stored.

The value must be VARCHAR.

input_relation

The table or view that contains the data for missing-value imputation. The value must be VARCHAR.

input_columns

A comma-separated list of the columns in input_relation containing the values used in missing value imputation. The value must be VARCHAR.

method

The missing-value imputation method to use.

Valid Values:

  • mean: for numeric missing-value imputation
  • mode: for categorical missing-value imputation

Numeric and categorical input values are supported.

Parameters

exclude_columns=col1, col2, ... coln

(Optional) The columns from input_relation that you want to exclude from the input_columns argument.

Default Value: Empty

partition_columns=col1, col2, ... coln

(Optional) A comma-separated list of column names from input_relation for the partition clause.

Default Value: Empty

Privileges

To use IMPUTE, you must either be a superuser or have CREATE privileges for the schema of the output view and SELECT privileges for the input table or view. There are no privileges needed on the function itself.

See GRANT (Schema) and GRANT (Table).

Examples

These examples show how you can use the IMPUTE function on the small_input_impute table.

Execute the IMPUTE function, specifying the mean method:

=> SELECT impute('output_view','small_input_impute', 'pid, x1,x2,x3,x4','mean' USING PARAMETERS exclude_columns='pid'); impute -------------------------- Finished in 1 iteration (1 row)

 

Execute the IMPUTE function, specifying the mode method:

=> SELECT impute('output_view3','small_input_impute', 'pid, x5,x6','mode' USING PARAMETERS exclude_columns='pid'); impute -------------------------- Finished in 1 iteration (1 row)

See Also

Imputing Missing Values