verticapy.vDataFrame.corr_pvalue

vDataFrame.corr_pvalue(column1: str, column2: str, method: Literal['pearson', 'kendall', 'kendalla', 'kendallb', 'kendallc', 'spearman', 'spearmand', 'biserial', 'cramer'] = 'pearson') → tuple[float, float]#

Computes the correlation coefficient between two input vDataColumns, together with its p-value. The coefficient measures the strength and direction of the relationship between the two columns, and the p-value indicates whether that relationship is statistically significant.

Parameters

column1: str

Input vDataColumn.

column2: str

Input vDataColumn.

method: str, optional

Method used to compute the correlation. Defaults to 'pearson'.

pearson:
Pearson’s correlation coefficient (linear).

spearman:
Spearman’s correlation coefficient (monotonic - rank based).

spearmand:
Spearman’s correlation coefficient computed with the DENSE_RANK function instead of the RANK function.

kendall:
Kendall’s correlation coefficient (similar trends). The method computes the Tau-B coefficient.

Warning

This method uses a CROSS JOIN during computation and is therefore computationally expensive: O(n * n), where n is the number of rows in the vDataFrame.

cramer:
Cramer’s V (correlation between categories).

biserial:
Point-biserial correlation (between a binary column and a numerical column).
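
The quadratic cost noted for kendall comes from comparing every pair of rows. The following is a minimal pure-Python sketch of a Tau-B computation, intended only to illustrate that pairwise comparison. It is not VerticaPy's SQL implementation, and the function name kendall_tau_b is hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's Tau-B by explicit enumeration of all n * (n - 1) / 2 row pairs."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both columns: counted in neither factor
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # Pairs not tied in x: concordant + discordant + ties_y_only (and vice versa).
    untied_x = concordant + discordant + ties_y_only
    untied_y = concordant + discordant + ties_x_only
    denom = sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else float("nan")


# Same "x" and "y" values as in the Examples section of this page.
x = [1, 2, 4, 9, 10, 15, 20, 22]
y = [1, 2, 1, 2, 1, 1, 2, 1]
print(kendall_tau_b(x, y))
```

Because every pair of rows is visited once, doubling the number of rows roughly quadruples the work, which is the behavior the warning describes.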

Returns

tuple: (correlation coefficient, p-value)

Examples

For this example, let’s generate a dataset and compute the Pearson correlation coefficient and its p-value between the two features: ‘x’ and ‘y’.

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)


data.corr_pvalue(
    column1 = "x",
    column2 = "y",
    method = "pearson",
)

Out[3]: (-0.00430391263628055, 0.9919302634617095)
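
As a sanity check, the Pearson result can be reproduced locally without a database connection. The sketch below uses only the Python standard library and assumes the p-value comes from the usual two-sided t-test with n - 2 degrees of freedom. That is an assumption about the computation, not a statement of how VerticaPy implements it. The helper name pearson_with_pvalue and its steps parameter are hypothetical.

```python
import math


def pearson_with_pvalue(xs, ys, steps=2000):
    """Pearson's r and a two-sided p-value (assumed t-test, n - 2 degrees of freedom).

    `steps` must be even (Simpson's rule). Undefined when |r| == 1.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    r = sxy / math.sqrt(sxx * syy)

    # t statistic under the null hypothesis of zero correlation.
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))

    # Density of Student's t distribution with df degrees of freedom.
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return norm * (1.0 + u * u / df) ** (-(df + 1) / 2)

    # Simpson's rule for the area under the density on [0, t].
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3

    # Two-sided p-value: probability mass outside [-t, t].
    return r, 1.0 - 2.0 * area


x = [1, 2, 4, 9, 10, 15, 20, 22]
y = [1, 2, 1, 2, 1, 1, 2, 1]
r, p = pearson_with_pvalue(x, y)
print(r, p)
```

Under that assumption, the printed pair should agree closely with the tuple returned by corr_pvalue above. The numerical integration is kept deliberately simple; a statistics library would normally be used for the t-distribution tail instead.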