ONE_HOT_ENCODER_FIT

Generates a sorted list of the category levels for each feature to be encoded, and stores the result as a model.

Syntax

ONE_HOT_ENCODER_FIT ( 'model‑name', 'input‑relation', 'input‑columns'
                  [ USING PARAMETERS [exclude_columns='excluded‑columns']
                                     [, output_view='output‑view']
                                     [, extra_levels='category‑levels'] ] )

Arguments

model‑name

Identifies the model to create, where model‑name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.

input‑relation

The table or view that contains the data for one-hot encoding. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the HCatalog schema, and then run the machine learning function.

input‑columns

Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of type INTEGER, BOOLEAN, VARCHAR, or a date type.

Parameter Settings

Parameter name Set to…
exclude_columns

Comma-separated list of column names from input‑columns to exclude from processing.
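To make the interaction between input‑columns and exclude_columns concrete, here is a minimal Python sketch of how the set of columns to encode could be resolved. It is a conceptual model, not Vertica's implementation: the helper name resolve_columns and the sample column names are hypothetical, and it assumes that '*' expands to every column of the input relation in table order and that column names compare case-insensitively, as Vertica identifiers do.

```python
def resolve_columns(all_columns, input_columns, exclude_columns=""):
    """Hypothetical helper: return the columns to encode, in requested order."""
    # '*' selects every column of the input relation, in table order.
    if input_columns.strip() == "*":
        requested = list(all_columns)
    else:
        requested = [c.strip() for c in input_columns.split(",") if c.strip()]
    # Remove every column named in exclude_columns (case-insensitive match).
    excluded = {c.strip().lower() for c in exclude_columns.split(",") if c.strip()}
    return [c for c in requested if c.lower() not in excluded]


# Hypothetical table columns; prints ['model', 'cyl', 'gear'].
table_columns = ["model", "mpg", "cyl", "hp", "gear"]
print(resolve_columns(table_columns, "*", "mpg,hp"))
```

Splitting and stripping each name mirrors the comma-separated format of both arguments, so stray spaces after commas do not prevent a match.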

output_view

The name of the view that stores the input relation and the one-hot encodings. Columns are returned in the order they appear in the input relation, with the one-hot encoded columns appended after the original columns.
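The column ordering described for output_view can be sketched as follows. This Python sketch only illustrates the ordering rule; the helper name is hypothetical, and both the `<column>_<level index>` naming and the one-indicator-column-per-level layout are assumptions for illustration, not Vertica's documented behavior.

```python
def output_view_columns(input_columns, levels_by_column):
    """Hypothetical helper: column order of the output view.

    Assumes one indicator column per level, named '<column>_<level index>';
    Vertica's actual column naming may differ.
    """
    # One encoded column per level, grouped by source column.
    encoded = [
        f"{col}_{idx}"
        for col, levels in levels_by_column.items()
        for idx, _ in enumerate(levels)
    ]
    # Original columns keep their input-relation order; encodings follow.
    return list(input_columns) + encoded


print(output_view_columns(["model", "cyl", "gear"], {"cyl": [4, 6, 8], "gear": [3, 4, 5]}))
```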

extra_levels

Additional levels for each category that do not appear in the input relation. Pass this parameter as a JSON string whose keys are category (column) names and whose values are lists of the extra levels for that category.

Quote the JSON keys (category names) and string values according to the JSON standard.
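One way to avoid quoting mistakes in extra_levels is to generate the JSON string rather than write it by hand. The Python sketch below builds the string with json.dumps and embeds it in a ONE_HOT_ENCODER_FIT call. The model name and the extra levels are hypothetical. Whether Vertica expects integer levels as JSON numbers or as quoted strings is an assumption, so check this against your version.

```python
import json

# Hypothetical extra levels for two mtcars columns. Integer levels are written
# as JSON numbers here (an assumption about what Vertica accepts).
extra = {"cyl": [2, 10], "gear": [6]}

# json.dumps emits standard JSON: keys and string values use double quotes.
extra_levels = json.dumps(extra)

# Double any single quotes so the JSON survives inside a SQL string literal.
literal = extra_levels.replace("'", "''")

sql = (
    "SELECT ONE_HOT_ENCODER_FIT('one_hot_encoder_model_extra', 'mtcars', '*' "
    "USING PARAMETERS exclude_columns='mpg,disp,drat,wt,qsec,vs,am', "
    f"extra_levels='{literal}');"
)
print(extra_levels)  # {"cyl": [2, 10], "gear": [6]}
print(sql)
```

Escaping single quotes matters only when a VARCHAR level contains an apostrophe, but applying it unconditionally keeps the generated SQL literal valid.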

Model Attributes

Attribute Description
call_string

The values of all input arguments that were specified at the time the function was called.

varchar_categories, integer_categories, boolean_categories, date_categories

Each of these attributes, one per supported input data type, contains the following settings:

  • category_name: Column name
  • category_level: Levels of the category, sorted for each category
  • category_level_index: Index of this categorical level in the sorted list of levels for the category
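The fitting step and the resulting attributes can be modeled in a few lines of Python. This is a conceptual sketch, not Vertica's implementation. The function names are hypothetical, and the sample rows are a small subset modeled on the mtcars sample data, with types chosen for illustration. Three behaviors are assumptions here. NULLs are skipped. Extra levels are merged into the sorted list. Strings sort by code point, whereas Vertica may sort VARCHAR levels using the session's collation.

```python
from datetime import date


def attribute_for(value):
    """Pick the model attribute that would hold a level of this Python type."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean_categories"
    if isinstance(value, int):
        return "integer_categories"
    if isinstance(value, str):
        return "varchar_categories"
    if isinstance(value, date):
        return "date_categories"
    raise TypeError(f"unsupported level type: {type(value).__name__}")


def fit_one_hot(rows, columns, extra_levels=None):
    """Return {attribute: [(category_name, category_level, category_level_index), ...]}."""
    extra_levels = extra_levels or {}
    fitted = {}
    for col in columns:
        # Distinct non-NULL levels observed in the column, plus any extra levels
        # (skipping NULLs and merging extras are assumptions of this sketch).
        observed = {row[col] for row in rows if row[col] is not None}
        levels = sorted(observed | set(extra_levels.get(col, [])))
        if not levels:
            continue
        attr = attribute_for(levels[0])
        # Each level's index is its position in the sorted list.
        for index, level in enumerate(levels):
            fitted.setdefault(attr, []).append((col, level, index))
    return fitted


# A few rows modeled on the mtcars sample data (types chosen for illustration).
rows = [
    {"model": "Mazda RX4", "cyl": 6, "am": True},
    {"model": "Datsun 710", "cyl": 4, "am": True},
    {"model": "Hornet 4 Drive", "cyl": 6, "am": False},
    {"model": "Hornet Sportabout", "cyl": 8, "am": False},
]
fitted = fit_one_hot(rows, ["model", "cyl", "am"])
for attr, entries in fitted.items():
    print(attr, entries)
```

Here the integer column cyl yields the levels 4, 6, and 8 with indexes 0, 1, and 2. The boolean column am yields False at index 0 and True at index 1.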

Privileges

Non-superusers:

  • CREATE privileges on the schema where the model is created
  • SELECT privileges on the input relation
  • CREATE privileges on the output view schema

Examples

=> SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_model','mtcars','*' 
USING PARAMETERS exclude_columns='mpg,disp,drat,wt,qsec,vs,am');
ONE_HOT_ENCODER_FIT
--------------------
Success
(1 row)
