vDataFrame[].fillna
vDataFrame[].fillna(val = None,
method: str = "auto",
expr: str = "",
by: list = [],
order_by: list = [])
Fills the missing elements of the vcolumn using the specified imputation rule: a constant value, an SQL expression or an imputation method.
Parameters
| Name | Type | Optional | Description |
|---|---|---|---|
| val | int / float / str | ✓ | Constant value used to impute the missing values of the vcolumn. |
| method | str | ✓ | Method used to impute the missing values, for example "auto" (default), "avg", "median", "mode" or "0ifnull". |
| expr | str | ✓ | SQL expression used to impute the missing values. |
| by | list | ✓ | vcolumns used in the partition. |
| order_by | list | ✓ | List of the vcolumns used to sort the data when using time series (TS) methods. |
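The `order_by` parameter only matters for time series methods, where the result depends on row order. The plain-Python sketch below assumes a forward-fill style method (each missing value takes the last non-null value seen before it), run separately inside every `by` partition. The helper `ffill_by_partition` and the toy rows are hypothetical; VerticaPy itself performs the imputation in SQL inside Vertica.

```python
from itertools import groupby


def ffill_by_partition(rows, column, by, order_by):
    """Forward-fill None values of `column` within each `by` partition.

    Rows are visited in `order_by` order inside each partition; a missing
    value takes the last non-null value seen before it. Hypothetical helper
    for illustration only. Assumes the `by` and `order_by` values are
    non-null and mutually comparable.
    """
    def partition_key(row):
        return tuple(row[c] for c in by)

    def order_key(row):
        return tuple(row[c] for c in order_by)

    # Sort by partition first so groupby sees each partition as one run.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    result = []
    for _, partition in groupby(ordered, key=partition_key):
        last_seen = None  # reset at every partition boundary
        for row in partition:
            row = dict(row)  # copy so the caller's rows are not mutated
            if row[column] is None:
                row[column] = last_seen
            else:
                last_seen = row[column]
            result.append(row)
    return result


prices = [  # toy time series with two tickers, deliberately unsorted
    {"ticker": "A", "day": 2, "price": None},
    {"ticker": "B", "day": 1, "price": None},
    {"ticker": "A", "day": 1, "price": 10.0},
    {"ticker": "A", "day": 3, "price": 12.0},
    {"ticker": "B", "day": 2, "price": 5.0},
]
filled = ffill_by_partition(prices, "price", by=["ticker"], order_by=["day"])
print([(r["ticker"], r["day"], r["price"]) for r in filled])
# [('A', 1, 10.0), ('A', 2, 10.0), ('A', 3, 12.0), ('B', 1, None), ('B', 2, 5.0)]
```

The first row of partition B stays missing because nothing precedes it in that partition; without the partition reset, it would wrongly inherit 12.0 from ticker A.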
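# display() is built into Jupyter; import it explicitly when running as a script
from IPython.display import display
from verticapy.datasets import load_titanic
titanic = load_titanic()
display(titanic)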
display(titanic["age"])
# Impute the missing ages with the average age of each (pclass, sex) partition
titanic["age"].fillna(method = "avg", by = ["pclass", "sex"])
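To make the partitioned average concrete, here is a plain-Python sketch of the same rule on a few toy rows (not the actual Titanic data). Conceptually it matches the SQL window expression `COALESCE(age, AVG(age) OVER (PARTITION BY pclass, sex))`; whether VerticaPy generates exactly this SQL is an assumption, and the helper `fill_with_partition_mean` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fill_with_partition_mean(rows, column, by):
    """Replace None in `column` with the mean of the non-null values of the
    same partition (rows sharing the same `by` values). Hypothetical helper."""
    def key(row):
        return tuple(row[c] for c in by)

    # Collect the non-null values of each partition.
    groups = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            groups[key(row)].append(row[column])
    means = {k: mean(v) for k, v in groups.items()}
    # A partition with no non-null value has no mean, so its nulls stay None.
    return [
        {**row, column: means.get(key(row)) if row[column] is None else row[column]}
        for row in rows
    ]


passengers = [  # toy rows
    {"pclass": 1, "sex": "female", "age": 30.0},
    {"pclass": 1, "sex": "female", "age": None},
    {"pclass": 1, "sex": "female", "age": 40.0},
    {"pclass": 3, "sex": "male", "age": 20.0},
    {"pclass": 3, "sex": "male", "age": None},
]
filled = fill_with_partition_mean(passengers, "age", by=["pclass", "sex"])
print([p["age"] for p in filled])  # [30.0, 35.0, 40.0, 20.0, 20.0]
```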
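display(titanic["age"])
# Impute the missing ages with the median age of each (pclass, sex) partition.
# fillna modifies the vcolumn in place, so reload the dataset first if "age"
# was already imputed by the previous example.
titanic["age"].fillna(method = "median", by = ["pclass", "sex"])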
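The median variant follows the same partitioning logic and only swaps the statistic. The plain-Python sketch below uses toy rows and a hypothetical helper `fill_with_partition_median`; VerticaPy computes the median in the database, and its exact SQL is not shown here.

```python
from collections import defaultdict
from statistics import median


def fill_with_partition_median(rows, column, by):
    """Replace None in `column` with the median of the non-null values of the
    same partition (rows sharing the same `by` values). Hypothetical helper."""
    def key(row):
        return tuple(row[c] for c in by)

    groups = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            groups[key(row)].append(row[column])
    medians = {k: median(v) for k, v in groups.items()}
    # Partitions without any non-null value keep their missing values.
    return [
        {**row, column: medians.get(key(row)) if row[column] is None else row[column]}
        for row in rows
    ]


passengers = [  # toy rows
    {"pclass": 2, "sex": "male", "age": 22.0},
    {"pclass": 2, "sex": "male", "age": 30.0},
    {"pclass": 2, "sex": "male", "age": 60.0},
    {"pclass": 2, "sex": "male", "age": None},
    {"pclass": 3, "sex": "female", "age": None},
]
filled = fill_with_partition_median(passengers, "age", by=["pclass", "sex"])
print([p["age"] for p in filled])  # [22.0, 30.0, 60.0, 30.0, None]
```

For the (2, "male") partition the mean would be about 37.3, pulled up by the 60; the median (30.0) is less sensitive to such extreme values, which is the usual reason to prefer it for skewed columns like age.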
display(titanic["embarked"])
# Impute the missing embarkation ports with the mode (most frequent value)
titanic["embarked"].fillna(method = "mode")
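Mode imputation suits categorical columns such as "embarked", where an average is meaningless. A minimal plain-Python sketch of the rule, using a hypothetical helper `fill_with_mode` and toy values:

```python
from collections import Counter


def fill_with_mode(values):
    """Replace None with the most frequent non-null value. Hypothetical helper.

    On ties, Counter.most_common returns the value encountered first;
    VerticaPy's tie-breaking may differ.
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return list(values)  # nothing to impute from
    mode_value = counts.most_common(1)[0][0]
    return [mode_value if v is None else v for v in values]


embarked = ["S", "C", None, "S", "Q", None, "S"]  # toy values
print(fill_with_mode(embarked))  # ['S', 'C', 'S', 'S', 'Q', 'S', 'S']
```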
display(titanic["boat"])
# Missing values are the most frequent category of "boat", so one option is
# to merge all non-null categories into a single category (1) and to impute
# the missing values with 0. The '0ifnull' method follows this process.
titanic["boat"].fillna(method = "0ifnull")
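The '0ifnull' rule turns the column into a binary indicator, which in SQL terms is `CASE WHEN boat IS NULL THEN 0 ELSE 1 END`. A minimal plain-Python sketch with a hypothetical helper `zero_if_null` and toy values:

```python
def zero_if_null(values):
    """Map missing values to 0 and every non-null value to 1.

    Hypothetical helper mirroring the '0ifnull' behaviour described above.
    """
    return [0 if v is None else 1 for v in values]


boat = [None, "13", None, "C", None]  # toy values
print(zero_if_null(boat))  # [0, 1, 0, 1, 0]
```

The trade-off is that the original categories (the individual boat identifiers) are lost; only the "had a boat or not" information survives.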
See Also
| vDataFrame[].dropna | Drops the vcolumn missing values. |
