verticapy.vDataFrame.fillna

vDataFrame.fillna(val: dict | None = None, method: dict | None = None, numeric_only: bool = False) → vDataFrame

Fills missing elements in the vDataColumns using specific rules.

Parameters

val: dict, optional

Dictionary of values, in the form {"column1": val1, ..., "columnk": valk}. Each key must be the name of a vDataColumn. The missing values of that vDataColumn are replaced by the corresponding value.

method: dict, optional

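Dictionary of the methods used to impute the missing values, keyed by vDataColumn name, for example {"age": "mean"}. Each value must be one of the following: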

auto:
Mean for numerical vDataColumns and mode for categorical vDataColumns.
mean:
Average.
median:
Median.
mode:
Mode (most frequent element).
0ifnull:
0 when the value in the vDataColumn is null, 1 otherwise.

More methods are available in the vDataColumn.fillna() method.
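To make the rules above concrete, here is a minimal pure-Python sketch of what each method computes for a single column. It is an illustration only: the helper name impute and the sample list are hypothetical, and VerticaPy itself performs the imputation inside the database rather than on Python lists.

```python
from statistics import mean, median, mode


def impute(values, method):
    """Return a copy of `values` with None entries handled per `method`.

    Hypothetical helper for illustration; not part of VerticaPy.
    """
    if method == "0ifnull":
        # Indicator encoding: 0 where the value is missing, 1 otherwise.
        return [0 if v is None else 1 for v in values]
    # The statistic is computed on the non-missing values only.
    present = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[method](present)
    return [fill if v is None else v for v in values]


ages = [22, None, 30, 22, None, 46]
for name in ("mean", "median", "mode", "0ifnull"):
    print(name, impute(ages, name))
```

Note that "0ifnull" differs from the other methods: it replaces the whole column with a missing-value indicator instead of filling the gaps.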

numeric_only: bool, optional

If parameters ‘val’ and ‘method’ are both empty and ‘numeric_only’ is set to True, all numerical vDataColumns are imputed by their average. If ‘numeric_only’ is set to False, all categorical vDataColumns are additionally imputed by their mode.
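The default behaviour described for ‘numeric_only’ can be sketched as a small decision rule. The helper default_methods and the "numeric" / "categorical" labels below are assumptions made for this illustration; they are not VerticaPy names, and the library determines column categories itself.

```python
def default_methods(column_kinds, numeric_only=False):
    """Map each column to the default method implied by `numeric_only`.

    `column_kinds` maps a column name to "numeric" or "categorical"
    (hypothetical labels used only in this sketch).
    """
    methods = {}
    for name, kind in column_kinds.items():
        if kind == "numeric":
            # Numerical columns are always imputed by their average.
            methods[name] = "mean"
        elif not numeric_only:
            # Categorical columns are imputed by their mode only when
            # numeric_only is False.
            methods[name] = "mode"
    return methods


kinds = {"age": "numeric", "fare": "numeric", "embarked": "categorical"}
print(default_methods(kinds, numeric_only=True))
print(default_methods(kinds, numeric_only=False))
```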

Returns

vDataFrame: self

Examples

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, we will use the Titanic dataset.

from verticapy.datasets import load_titanic

data = load_titanic()

Note

VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it effectively. These datasets are useful resources for practicing data analysis and machine learning within the VerticaPy environment.

We can see the count of each column to check if any column has missing values.

data.count()

	count
"pclass"	1234.0
"survived"	1234.0
"name"	1234.0
"sex"	1234.0
"age"	997.0
"sibsp"	1234.0
"parch"	1234.0
"ticket"	1234.0
"fare"	1233.0
"cabin"	286.0
"embarked"	1232.0
"boat"	439.0
"body"	118.0
"home.dest"	706.0

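From the above table, we can see that the count for "boat" (439) is lower than the 1234 reported for fully populated columns such as "pclass". This means the column has missing values. The same is true for "age", "fare", "cabin", "embarked", "body", and "home.dest".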
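The number of missing values per column follows directly from the counts in the table. The short sketch below copies those counts by hand and assumes the dataset has 1234 rows, which is the count reported for the columns without gaps.

```python
# Row total assumed from the fully populated columns (e.g. "pclass").
total_rows = 1234

# Non-null counts copied from the data.count() output above.
counts = {
    "age": 997,
    "fare": 1233,
    "cabin": 286,
    "embarked": 1232,
    "boat": 439,
    "body": 118,
    "home.dest": 706,
}

# Missing values = total rows minus non-null count.
missing = {col: total_rows - n for col, n in counts.items()}

# List the columns from most to least affected.
for col, n in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col}: {n} missing ({n / total_rows:.1%})")
```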

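Now we can use the fillna method to fill those values. Let’s replace the missing "boat" entries with a custom value, and impute "age", "embarked", and "fare" with their mean, mode, and median respectively.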

data.fillna(
    val = {"boat": "No boat"},
    method = {
        "age": "mean",
        "embarked": "mode",
        "fare": "median",
    }
)

	pclass Int 100%	...	survived Int 100%	home.dest Varchar(100) 57%
1	1	...	0	Montreal, PQ / Chesterville, ON
2	1	...	0	Montreal, PQ / Chesterville, ON
3	1	...	0	Montreal, PQ / Chesterville, ON
4	1	...	0	Belfast, NI
5	1	...	0	Montevideo, Uruguay
6	1	...	0	New York, NY
7	1	...	0	New York, NY
8	1	...	0	Montreal, PQ
9	1	...	0	Winnipeg, MN
10	1	...	0	San Francisco, CA
11	1	...	0	Trenton, NJ
12	1	...	0	London / Winnipeg, MB
13	1	...	0	Pomeroy, WA
14	1	...	0	Omaha, NE
15	1	...	0	Philadelphia, PA
16	1	...	0	Washington, DC
17	1	...	0	[null]
18	1	...	0	New York, NY
19	1	...	0	Montevideo, Uruguay
20	1	...	0	Montevideo, Uruguay