verticapy.vDataFrame.dropna

vDataFrame.dropna(columns: str | list[str] | None = None) → vDataFrame

Filters the vDataFrame, keeping only the rows that have no missing values in the specified vDataColumns.

Parameters

columns: SQLColumns, optional: Name, or list of names, of the vDataColumns to check for missing values. If empty, all vDataColumns are selected.
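
As a rough illustration of these semantics, the following plain-Python sketch filters a small list of records. It is not VerticaPy's implementation: the real method operates on the vDataFrame and returns self, whereas this sketch returns a new list. The helper name dropna_rows and the sample records are invented for illustration, and None stands in for a missing value.

```python
def dropna_rows(rows, columns=None):
    """Keep only the rows with no missing value in the selected columns.

    Hypothetical helper, not part of VerticaPy.
    """
    # Accept a single column name as well as a list of names.
    if isinstance(columns, str):
        columns = [columns]
    # An empty selection means "check every column".
    if not columns:
        columns = sorted({name for row in rows for name in row})
    return [
        row for row in rows
        if all(row.get(name) is not None for name in columns)
    ]


# Invented toy records; None marks a missing value.
rows = [
    {"age": 29.0, "fare": 211.3, "cabin": "B5"},
    {"age": None, "fare": 26.0, "cabin": None},
    {"age": 30.0, "fare": None, "cabin": None},
    {"age": 25.0, "fare": 7.9, "cabin": None},
]

by_age_fare = dropna_rows(rows, ["age", "fare"])  # list of column names
by_age = dropna_rows(rows, "age")                 # single column name
all_columns = dropna_rows(rows)                   # no selection: all columns

print(len(by_age_fare), len(by_age), len(all_columns))
```

Checking more columns can only remove more rows, which is why leaving columns empty is the most restrictive choice.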

Returns

vDataFrame: self

Examples

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, we will use the Titanic dataset:

from verticapy.datasets import load_titanic

vdf = load_titanic()

Several columns in this dataset contain missing values, which makes it well suited to demonstrating dropna.

Note

VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.

We can look at the count of each column to check whether any column has missing values.

vdf.count()

	count
"pclass"	1234.0
"survived"	1234.0
"name"	1234.0
"sex"	1234.0
"age"	997.0
"sibsp"	1234.0
"parch"	1234.0
"ticket"	1234.0
"fare"	1233.0
"cabin"	286.0
"embarked"	1232.0
"boat"	439.0
"body"	118.0
"home.dest"	706.0

From the above table, we can see that there are multiple columns with missing/NA values.
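
To make the gaps explicit, the counts above can be turned into a number of missing values per column. The sketch below is plain Python with the figures copied from the table; it assumes that the fully populated columns (count 1234) give the total number of rows.

```python
# Non-null counts copied from the vdf.count() output above.
counts = {
    "pclass": 1234, "survived": 1234, "name": 1234, "sex": 1234,
    "age": 997, "sibsp": 1234, "parch": 1234, "ticket": 1234,
    "fare": 1233, "cabin": 286, "embarked": 1232, "boat": 439,
    "body": 118, "home.dest": 706,
}

# Assumption: the largest count equals the number of rows in the dataset.
total_rows = max(counts.values())

# Missing values per column, keeping only columns that have at least one.
missing = {
    column: total_rows - count
    for column, count in counts.items()
    if count < total_rows
}

for column, n_missing in missing.items():
    print(f"{column}: {n_missing} missing")
```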

Using dropna, we can choose which columns the dataset is filtered by:

vdf.dropna(columns = ["fare", "embarked", "age"])
None  pclass    ...    survived                           home.dest
        1    ...           0     Montreal, PQ / Chesterville, ON
        1    ...           0     Montreal, PQ / Chesterville, ON
        1    ...           0     Montreal, PQ / Chesterville, ON
        1    ...           0                         Belfast, NI
        1    ...           0                 Montevideo, Uruguay
        1    ...           0                        New York, NY
        1    ...           0                        Montreal, PQ
        1    ...           0                        Winnipeg, MN
        1    ...           0                   San Francisco, CA
        1    ...           0                         Trenton, NJ
        1    ...           0               London / Winnipeg, MB
        1    ...           0                         Pomeroy, WA
        1    ...           0                           Omaha, NE
        1    ...           0                      Washington, DC
        1    ...           0                        New York, NY
        1    ...           0                 Montevideo, Uruguay
        1    ...           0                 Montevideo, Uruguay
        1    ...           0    Ascot, Berkshire / Rochester, NY
        1    ...           0             Little Onn Hall, Staffs
        1    ...           0                          Amenia, ND
...      ...    ...         ...                                 ...
Rows: 1-20 of 994 | Columns: 3

If we look at the count again, we see that the number of rows has dropped from 1234 to 994, and that the age, fare, and embarked columns no longer contain missing values.

vdf.count()

	count
"pclass"	994.0
"survived"	994.0
"name"	994.0
"sex"	994.0
"age"	994.0
"sibsp"	994.0
"parch"	994.0
"ticket"	994.0
"fare"	994.0
"cabin"	261.0
"embarked"	994.0
"boat"	380.0
"body"	116.0
"home.dest"	648.0
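
The before and after counts can be cross-checked with a little arithmetic. The sketch below is plain Python using the figures from the two count tables; it assumes that 1234 and 994 are the row counts before and after the call, as indicated by the fully populated columns.

```python
rows_before, rows_after = 1234, 994

# Non-null counts before the call, for the columns passed to dropna.
counts_before = {"fare": 1233, "embarked": 1232, "age": 997}

# Non-null counts after the call, copied from the second table.
counts_after = {
    "pclass": 994, "survived": 994, "name": 994, "sex": 994,
    "age": 994, "sibsp": 994, "parch": 994, "ticket": 994,
    "fare": 994, "cabin": 261, "embarked": 994, "boat": 380,
    "body": 116, "home.dest": 648,
}

rows_removed = rows_before - rows_after

# Each removed row is missing at least one of the three filtered columns,
# so the summed missing counts are an upper bound on the rows removed.
missing_in_filtered = sum(rows_before - n for n in counts_before.values())

# Columns that were not passed to dropna may still contain missing values.
still_incomplete = [c for c, n in counts_after.items() if n < rows_after]

print(rows_removed, missing_in_filtered, still_incomplete)
```

Here the number of removed rows equals the summed missing counts, which indicates that no row was missing more than one of the three filtered columns. The columns left out of the call (cabin, boat, body, and home.dest) still contain missing values.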