APPLY_ONE_HOT_ENCODER

A user-defined transform function (UDTF) that loads the one hot encoder model and writes out a table that contains the encoded columns.

Syntax

APPLY_ONE_HOT_ENCODER( input‑columns 
        USING PARAMETERS model_name = 'model‑name'
            [, drop_first = 'is‑first']
            [, ignore_null = 'ignore']
            [, separator = 'separator‑character']
            [, column_naming = 'name‑output']
            [, null_column_name = 'null‑column‑name'] )

Arguments

input‑columns Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns.

Parameters

model_name Name of the model (case-insensitive). , stores the categories and their corresponding levels.
drop_first

Boolean value, one of the following:

  • true (default): Treat the first level of the categorical variable as the reference level.
  • false: Every level of the categorical variable has a corresponding column in the output view
ignore_null

Boolean value, one of the following:

  • true (default): Null values set all corresponding one-hot binary columns to null.
  • false: Null values in input‑columns are treated as a categorical level
separator

The character that separates the input variable name and the indicator variable level in the output table.To avoid using any separator, set this parameter to null value.

Default: Underscore (_)

column_naming

Appends categorical levels to column names according to the specified method:

  • indices (default): Uses integer indices to represent categorical levels.
  • values/values_relaxed: Both methods use categorical level names. If duplicate column names occur, the function attempts to disambiguate them by appending _n, where n is a zero-based integer index (_0, _1,...).

    If the function cannot produce unique column names , it handles this according to the chosen method:

    • values returns an error.
    • values_relaxed reverts to using indices.

The following column naming rules apply if column_naming is set to values or values_relaxed:

  • Input column names with more than 128 characters are truncated.
  • Column names can contain special characters.
  • If parameter ignore_null is set to true, APPLY_ONE_HOT_ENCODER constructs the column name from the value set in parameter null_column_name. If this parameter is omitted, the string null is used.
null_column_name

The string used in naming the indicator column for null values, used only if ignore_null is set to false and column_naming is set to values or values_relaxed.

Default: null

Note: If an input row contains a level not stored in the model, the output row columns corresponding to that categorical level are returned as null values.

Examples

=> SELECT APPLY_ONE_HOT_ENCODER(cyl USING PARAMETERS model_name='one_hot_encoder_model', 
drop_first='true', ignore_null='false') FROM mtcars;
cyl | cyl_1 | cyl_2
----+-------+-------
8   |     0 |     1
4   |     0 |     0
4   |     0 |     0
8   |     0 |     1
8   |     0 |     1
8   |     0 |     1
4   |     0 |     0
8   |     0 |     1
8   |     0 |     1
4   |     0 |     0
8   |     0 |     1
6   |     1 |     0
4   |     0 |     0
4   |     0 |     0
6   |     1 |     0
6   |     1 |     0
8   |     0 |     1
8   |     0 |     1
4   |     0 |     0
4   |     0 |     0
6   |     1 |     0
8   |     0 |     1
8   |     0 |     1
6   |     1 |     0
4   |     0 |     0
8   |     0 |     1
8   |     0 |     1
8   |     0 |     1
6   |     1 |     0
6   |     1 |     0
4   |     0 |     0
4   |     0 |     0
(32 rows)

See Also