verticapy.vDataFrame.drop_duplicates

vDataFrame.drop_duplicates(columns: str | list[str] | None = None) → vDataFrame

Filters out duplicate rows by partitioning on the input vDataColumns.

Warning

Dropping duplicates makes the vDataFrame structure heavier. It is recommended that you check the current structure with the current_relation method and save it with the to_db method, using the parameters inplace = True and relation_type = "table".
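
The recommendation above can be sketched as a small helper. This is a minimal sketch, not part of VerticaPy: the helper name materialize and the table name passed to it are hypothetical, and the vdf argument is assumed to be a vDataFrame backed by a live database connection.

```python
def materialize(vdf, table_name):
    """Print the current relation, then save the vDataFrame as a table.

    `materialize` and `table_name` are hypothetical names used for
    illustration; `vdf` is assumed to be a connected vDataFrame.
    """
    # Inspect the SQL relation that the vDataFrame currently represents.
    print(vdf.current_relation())
    # Save to the database as a table and, with inplace=True, make the
    # vDataFrame use the saved relation. Returns whatever to_db returns.
    return vdf.to_db(name=table_name, relation_type="table", inplace=True)
```

A call such as materialize(vdf, "public.dedup_demo") would then be made once after drop_duplicates; the schema and table name here are placeholders.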

Parameters

columns: SQLColumns, optional: List of vDataColumn names. If empty, all vDataColumns are selected.
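
To make the partition semantics and the columns default concrete, here is a conceptual model in plain Python. It is a sketch under stated assumptions, not VerticaPy's implementation: the function name drop_duplicates_model is hypothetical, no database is involved, and the choice to keep the first row seen in each partition is an assumption of the model (the text above does not say which row of a partition is kept).

```python
# Conceptual model only: plain Python, no VerticaPy or database involved.
def drop_duplicates_model(rows, columns=None):
    """Keep one row per partition defined by `columns`.

    rows    : list of dicts, one dict per row
    columns : str, list of str, or None (None means every column)
    """
    if not rows:
        return []
    if columns is None:
        columns = list(rows[0].keys())
    elif isinstance(columns, str):
        columns = [columns]

    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in columns)  # partition key
        if key not in seen:  # first row seen for this partition
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"col1": 1, "col2": 3, "col": "a"},
    {"col1": 2, "col2": 3, "col": "b"},
    {"col1": 3, "col2": 1, "col": "v"},
    {"col1": 1, "col2": 3, "col": "a"},
]

all_cols = drop_duplicates_model(rows)  # partition on every column
by_col2 = drop_duplicates_model(rows, columns="col2")  # partition on col2 only
print(len(all_cols))  # 3
print(len(by_col2))  # 2
```

Partitioning on fewer columns makes more rows count as duplicates of each other, which is why the second call keeps fewer rows.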

Returns

vDataFrame: self

Examples

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, we will use a dummy dataset with three columns:

vdf = vp.vDataFrame(
    {
        "col1": [1, 2, 3, 1],
        "col2": [3, 3, 1, 3],
        "col": ["a", "b", "v", "a"],
    }
)

	col1 Integer 100%	...	col2 Integer 100%	col Varchar(1) 100%
1	1	...	3	a
2	2	...	3	b
3	3	...	1	v
4	1	...	3	a

In the above dataset, note that the first and last rows are identical, i.e., they are duplicates.
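
As a quick sanity check of that observation, the same literal data can be inspected in plain Python, without VerticaPy or a database. This is only an illustrative sketch; the variable names are arbitrary.

```python
from collections import Counter

data = {
    "col1": [1, 2, 3, 1],
    "col2": [3, 3, 1, 3],
    "col": ["a", "b", "v", "a"],
}

# Turn the column-oriented dict into row tuples: (col1, col2, col).
rows = list(zip(*data.values()))

# Count how many times each distinct row occurs and keep the repeated ones.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}
print(duplicated)  # {(1, 3, 'a'): 2}
```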

Note

VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it effectively. These datasets are useful resources for practicing data analysis and machine learning within the VerticaPy environment.

Using drop_duplicates, we can remove the duplicate rows:

vdf.drop_duplicates()

	col1 Integer 100%	...	col2 Integer 100%	col Varchar(1) 100%
1	1	...	3	a
2	2	...	3	b
3	3	...	1	v