verticapy.machine_learning.memmodel.preprocessing.OneHotEncoder
- class verticapy.machine_learning.memmodel.preprocessing.OneHotEncoder(categories: list | ndarray, column_naming: Literal['indices', 'values', 'values_relaxed'] = 'indices', drop_first: bool = True)
InMemoryModel implementation of a one-hot encoder.
Parameters
- categories: ArrayLike
ArrayLike of category lists, one list per feature, given in the same order as the input columns.
- column_naming: str, optional
Appends categorical levels to column names according to the specified method:
- indices:
Uses integer indices to represent categorical levels.
- values | values_relaxed:
Both methods use categorical level names. If duplicate column names occur, the function attempts to disambiguate them by appending _n, where n is a zero-based integer index (_0, _1,…).
- drop_first: bool, optional
If set to True, the first dummy column of each feature is dropped.
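To make the interaction between categories and drop_first concrete, here is a minimal NumPy sketch of one-hot encoding. It assumes each input row lists one value per feature, in the same order as categories. The function one_hot_sketch is hypothetical and is not VerticaPy's implementation; it compares values as strings so that mixed str/int rows behave consistently, and a value outside the known levels simply yields all-zero dummies here, which may differ from the library's behavior.

```python
import numpy as np


def one_hot_sketch(rows, categories, drop_first=True):
    """Hypothetical one-hot encoder: one 0/1 column per category level.

    rows       -- list of rows, one value per feature (same order as categories)
    categories -- list of level lists, one per feature
    drop_first -- if True, the first level of each feature gets no column
    """
    start = 1 if drop_first else 0
    encoded = []
    for row in rows:
        out = []
        for value, levels in zip(row, categories):
            # Compare as strings so mixed str/int inputs match their levels.
            out.extend(int(str(value) == str(level)) for level in levels[start:])
        encoded.append(out)
    return np.array(encoded, dtype=int)


categories = [["male", "female"], [1, 2, 3]]
data = [["male", 1], ["female", 3]]

print(one_hot_sketch(data, categories, drop_first=False))
# [[1 0 1 0 0]
#  [0 1 0 0 1]]
print(one_hot_sketch(data, categories, drop_first=True))
# [[0 0 0]
#  [1 0 1]]
```

With drop_first=False the sketch reproduces the transform() output shown in the Examples. With drop_first=True each feature loses the column of its first level, which removes the perfectly collinear dummy that would otherwise appear in linear models.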
Note
OneHotEncoder models are defined entirely by their attributes. For example, the categories to encode define a OneHotEncoder model. You can optionally provide column_naming criteria and a drop_first flag to denote whether to drop the first dummy of each feature.
Attributes
Attributes are identical to the input parameters, followed by an underscore (‘_’).
Examples
Initialization
Import the required module.
from verticapy.machine_learning.memmodel.preprocessing import OneHotEncoder
A OneHotEncoder model is defined by its categories, column naming criteria, and drop_first flag.
Let’s create a OneHotEncoder model.
model_ohe = OneHotEncoder(
    categories = [["male", "female"], [1, 2, 3]],
    drop_first = False,
    column_naming = None,
)
Create a dataset.
data = [["male", 1], ["female", 3]]
Making In-Memory Transformations
Use the transform() method to apply the transformation.
model_ohe.transform(data)
Out[4]:
array([[1, 0, 1, 0, 0],
       [0, 1, 0, 0, 1]])
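The columns of the transform() output follow the order of categories: every level of the first feature, then every level of the second. The sketch below attaches a readable label to each output column. The helper dummy_labels is hypothetical, and the label formats are assumptions: "<feature>_<index>" for the indices scheme and "<feature>_<value>" for values. It also does not implement the _n disambiguation of duplicate names.

```python
import numpy as np


def dummy_labels(cnames, categories, column_naming="indices", drop_first=True):
    """Hypothetical helper: one label per dummy column, in output order."""
    start = 1 if drop_first else 0
    labels = []
    for name, levels in zip(cnames, categories):
        for j in range(start, len(levels)):
            # Assumed format: feature name, underscore, then level index or value.
            suffix = j if column_naming == "indices" else levels[j]
            labels.append(f"{name}_{suffix}")
    return labels


cnames = ["sex", "pclass"]
categories = [["male", "female"], [1, 2, 3]]
# Output of model_ohe.transform(data) from the example (drop_first = False).
encoded = np.array([[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]])

labels = dummy_labels(cnames, categories, drop_first=False)
print(labels)
# ['sex_0', 'sex_1', 'pclass_0', 'pclass_1', 'pclass_2']

# Pair each encoded row with the labels to read it feature by feature.
for row in encoded:
    print({label: int(v) for label, v in zip(labels, row)})
```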
Deploy SQL Code
Let’s use the following column names:
cnames = ['sex', 'pclass']
Use the transform_sql() method to get the SQL code needed to deploy the model using its attributes.
model_ohe.transform_sql(cnames)
Out[6]:
[["(CASE WHEN sex = 'male' THEN 1 ELSE 0 END)",
  "(CASE WHEN sex = 'female' THEN 1 ELSE 0 END)"],
 ['(CASE WHEN pclass = 1 THEN 1 ELSE 0 END)',
  '(CASE WHEN pclass = 2 THEN 1 ELSE 0 END)',
  '(CASE WHEN pclass = 3 THEN 1 ELSE 0 END)']]
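transform_sql() returns one list of CASE expressions per input column, one expression per dummy. The pure-Python sketch below rebuilds that structure from categories and flattens it into a single SELECT statement. It is illustrative only and makes several assumptions. String levels are single-quoted and numeric levels are emitted as-is, which matches the output shown above. Quotes inside levels are not escaped. No column aliases are added. The function case_when_sql and the table name titanic are hypothetical.

```python
def case_when_sql(cnames, categories, drop_first=True):
    """Hypothetical sketch: one CASE WHEN expression per dummy column."""
    start = 1 if drop_first else 0
    result = []
    for name, levels in zip(cnames, categories):
        exprs = []
        for level in levels[start:]:
            # Quote string levels; leave numeric levels unquoted.
            val = f"'{level}'" if isinstance(level, str) else str(level)
            exprs.append(f"(CASE WHEN {name} = {val} THEN 1 ELSE 0 END)")
        result.append(exprs)
    return result


cnames = ["sex", "pclass"]
categories = [["male", "female"], [1, 2, 3]]
sql = case_when_sql(cnames, categories, drop_first=False)

# Flatten the nested lists into one SELECT; "titanic" is a hypothetical table.
select = "SELECT " + ", ".join(e for exprs in sql for e in exprs) + " FROM titanic;"
print(select)
```

Because each dummy is an independent CASE expression, the scoring runs entirely inside the database, and the model only needs its categories to generate the query.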
Hint
This object can be pickled and used in any in-memory environment, just like scikit-learn models.
- __init__(categories: list | ndarray, column_naming: Literal['indices', 'values', 'values_relaxed'] = 'indices', drop_first: bool = True) -> None
Methods
- __init__(categories[, column_naming, drop_first])
- get_attributes(): Returns the model attributes.
- set_attributes(**kwargs): Sets the model attributes.
- transform(X): Transforms and applies the OneHotEncoder model to the input matrix.
- transform_sql(X): Transforms and returns the SQL needed to deploy the OneHotEncoder.
Attributes
Must be overridden in child class