verticapy.machine_learning.memmodel.preprocessing.OneHotEncoder

class verticapy.machine_learning.memmodel.preprocessing.OneHotEncoder(categories: list | ndarray, column_naming: Literal['indices', 'values', 'values_relaxed'] = 'indices', drop_first: bool = True)

InMemoryModel implementation of a one-hot encoder.

Parameters

categories: ArrayLike

ArrayLike containing one list of categories per feature, in the same order as the input columns.

column_naming: str, optional

Appends categorical levels to column names according to the specified method:

indices:
Uses integer indices to represent categorical levels.
values | values_relaxed:
Both methods use categorical level names. If duplicate column names occur, the function attempts to disambiguate them by appending _n, where n is a zero-based integer index (_0, _1,…).
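The naming methods above can be illustrated with a small standalone sketch. It is not VerticaPy's implementation: the helper `dummy_names` is hypothetical, and the underscore separator and zero-based indices are assumptions made for illustration. `values_relaxed` is treated like `values` here, since the description above only says that both use the level names.

```python
def dummy_names(column, levels, column_naming="indices"):
    """Hypothetical helper: build one dummy column name per categorical level."""
    if column_naming == "indices":
        # Assumption: levels are numbered from 0 in the order they are given.
        return [f"{column}_{index}" for index, _ in enumerate(levels)]
    if column_naming in ("values", "values_relaxed"):
        # Assumption: the level name is appended after an underscore.
        return [f"{column}_{level}" for level in levels]
    raise ValueError(f"Unsupported column_naming: {column_naming!r}")


print(dummy_names("sex", ["male", "female"], "indices"))
print(dummy_names("sex", ["male", "female"], "values"))
```

With `indices` the names carry the position of each level, while with `values` they carry the level itself, which is easier to read but more likely to collide.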

drop_first: bool, optional

If set to True, the first dummy of each category is dropped.

Note

A OneHotEncoder model is defined entirely by its attributes: the categories to encode are enough to define one. You can optionally provide column_naming criteria and a drop_first flag that indicates whether to drop the first dummy of each category.

Attributes

Attributes have the same names as the input parameters, followed by an underscore (‘_’).

Examples

Initialization

Import the required module.

from verticapy.machine_learning.memmodel.preprocessing import OneHotEncoder

A OneHotEncoder model is defined by its categories, its column naming criteria, and its drop_first flag.

Let’s create a OneHotEncoder model.

model_ohe = OneHotEncoder(
    categories = [["male", "female"], [1, 2, 3]],
    drop_first = False,
    column_naming = None,
)

Create a dataset.

data = [["male", 1], ["female", 3]]

Making In-Memory Transformation

Use the transform() method to apply the transformation.

model_ohe.transform(data)
Out[4]: 
array([[1, 0, 1, 0, 0],
       [0, 1, 0, 0, 1]])
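To make the output above easier to read, here is a minimal pure-Python sketch of the same encoding. It does not use VerticaPy: `one_hot_transform` is a hypothetical helper, and it assumes that `drop_first=True` removes the indicator of the first category of each feature, as described in the Note.

```python
def one_hot_transform(rows, categories, drop_first=False):
    """Hypothetical helper: one-hot encode each row, feature by feature."""
    encoded = []
    for row in rows:
        encoded_row = []
        for value, levels in zip(row, categories):
            # Assumption: dropping the first dummy removes the first level's indicator.
            kept = levels[1:] if drop_first else levels
            encoded_row.extend(1 if value == level else 0 for level in kept)
        encoded.append(encoded_row)
    return encoded


categories = [["male", "female"], [1, 2, 3]]
data = [["male", 1], ["female", 3]]

full = one_hot_transform(data, categories, drop_first=False)
reduced = one_hot_transform(data, categories, drop_first=True)
print(full)     # [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
print(reduced)  # [[0, 0, 0], [1, 0, 1]]
```

The first two columns correspond to "male" and "female", and the last three to the values 1, 2 and 3. With the first dummy dropped, a row of zeros for a feature stands for its first category.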

Deploy SQL Code

Let’s use the following column names:

cnames = ['sex', 'pclass']

Use the transform_sql() method to get the SQL code needed to deploy the model using its attributes.

model_ohe.transform_sql(cnames)
Out[6]: 
[["(CASE WHEN sex = 'male' THEN 1 ELSE 0 END)",
  "(CASE WHEN sex = 'female' THEN 1 ELSE 0 END)"],
 ['(CASE WHEN pclass = 1 THEN 1 ELSE 0 END)',
  '(CASE WHEN pclass = 2 THEN 1 ELSE 0 END)',
  '(CASE WHEN pclass = 3 THEN 1 ELSE 0 END)']]
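The nested list above holds one list of expressions per input column. As a rough sketch of how it might be used, the snippet below flattens it into a single SELECT statement. The helper `build_select`, the table name `titanic`, and the `<column>_<index>` aliases are all assumptions made for illustration; they are not part of the VerticaPy API.

```python
# Nested list copied from the transform_sql() output shown above.
expressions = [
    ["(CASE WHEN sex = 'male' THEN 1 ELSE 0 END)",
     "(CASE WHEN sex = 'female' THEN 1 ELSE 0 END)"],
    ["(CASE WHEN pclass = 1 THEN 1 ELSE 0 END)",
     "(CASE WHEN pclass = 2 THEN 1 ELSE 0 END)",
     "(CASE WHEN pclass = 3 THEN 1 ELSE 0 END)"],
]
cnames = ["sex", "pclass"]


def build_select(expressions, cnames, table):
    """Hypothetical helper: turn the nested expressions into one SELECT query."""
    select_items = []
    for column, column_exprs in zip(cnames, expressions):
        for index, expr in enumerate(column_exprs):
            # Assumption: alias each dummy as <column>_<index>.
            select_items.append(f'{expr} AS "{column}_{index}"')
    return "SELECT\n    " + ",\n    ".join(select_items) + f"\nFROM {table};"


# "titanic" is a placeholder table name.
query = build_select(expressions, cnames, table="titanic")
print(query)
```

Keeping the expressions grouped by input column makes it straightforward to alias or select the dummies of one feature at a time.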

Hint

This object can be pickled and used in any in-memory environment, just like scikit-learn models.

__init__(categories: list | ndarray, column_naming: Literal['indices', 'values', 'values_relaxed'] = 'indices', drop_first: bool = True) → None

Methods

`__init__`(categories[, column_naming, drop_first])
`get_attributes`()	Returns the model attributes.
`set_attributes`(**kwargs)	Sets the model attributes.
`transform`(X)	Applies the `OneHotEncoder` model to the input matrix.
`transform_sql`(X)	Returns the SQL code needed to deploy the `OneHotEncoder`.

Attributes

object_type

Must be overridden in child class