verticapy.vDataFrame.add_duplicates

vDataFrame.add_duplicates(weight: int | str, use_gcd: bool = True) → vDataFrame

Duplicates the rows of the vDataFrame according to the input weight.

Parameters

weight: int | str: Name of the vDataColumn, or an integer, representing the weight.
use_gcd: bool: If set to True, divides all weights by their GCD (Greatest Common Divisor) to avoid unnecessary duplicates.

Returns

vDataFrame: the output vDataFrame.

Examples

Let’s begin by importing VerticaPy.

import verticapy as vp

Hint

Assigning an alias to verticapy reduces the risk of name collisions with other libraries. This precaution is useful because verticapy uses common function names such as “average” and “median”, which could otherwise conflict with identically named functions elsewhere. With the alias, functions from verticapy are always called explicitly and do not interfere with functions from other libraries.

Let us create a vDataFrame with multiple columns:

vdf = vp.vDataFrame(
    {
        "cats": ["A", "B", "C"],
        "reps": [2, 4, 8],
    },
)

	cats (Varchar(1))	reps (Integer)
1	A	2
2	B	4
3	C	8

We can duplicate the rows according to the weight column:

vdf.add_duplicates("reps")

	cats (Varchar(1))
1	A
2	B
3	C
4	B
5	C
6	C
7	C
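To make the result above easier to check by hand, here is a minimal pure-Python sketch that mimics the duplication on plain lists. It does not call VerticaPy and is not how the library implements the method. `expand_rows` is a hypothetical helper, and the sketch assumes every weight is a positive integer.

```python
from collections import Counter
from functools import reduce
from math import gcd


def expand_rows(values, weights):
    """Repeat each value (weight // GCD of all weights) times.

    Hypothetical helper for illustration only; not part of VerticaPy.
    """
    common = reduce(gcd, weights)  # GCD of the whole weight column
    expanded = []
    for value, weight in zip(values, weights):
        expanded.extend([value] * (weight // common))
    return expanded


rows = expand_rows(["A", "B", "C"], [2, 4, 8])
counts = Counter(rows)
print(len(rows), dict(counts))
```

The sketch yields A once, B twice and C four times, which matches the seven rows shown above. The row order differs from the VerticaPy output, since only the number of occurrences is modelled here.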

Note

VerticaPy computes the greatest common divisor (GCD) of the weight column and divides every weight by it, so that each row is repeated only as many times as needed to preserve the relative weights. In the example above, the weights 2, 4 and 8 have a GCD of 2, so they are reduced to 1, 2 and 4, and the result contains 7 rows. This function can be valuable in machine learning preprocessing, for example to increase the weight of specific rows.
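The effect of the GCD normalization on the size of the result can be estimated with a few lines of plain Python. This is only a sketch of the arithmetic, not VerticaPy code. `output_row_count` is a hypothetical helper, and it assumes that with `use_gcd=False` the raw weights are used unchanged as repetition counts.

```python
from functools import reduce
from math import gcd


def output_row_count(weights, use_gcd=True):
    """Estimate the number of rows after duplication (illustrative only)."""
    # Assumption: without GCD normalization, each weight is used as-is.
    divisor = reduce(gcd, weights) if use_gcd else 1
    return sum(weight // divisor for weight in weights)


with_gcd = output_row_count([2, 4, 8])
without_gcd = output_row_count([2, 4, 8], use_gcd=False)
print(with_gcd, without_gcd)
```

Under these assumptions, the relative weights 1:2:4 are the same in both cases, but the normalized version needs half as many rows.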