verticapy.vDataFrame.corr_pvalue

vDataFrame.corr_pvalue(column1: str, column2: str, method: Literal['pearson', 'kendall', 'kendalla', 'kendallb', 'kendallc', 'spearman', 'spearmand', 'biserial', 'cramer'] = 'pearson') -> tuple[float, float]

Computes the correlation coefficient between two input vDataColumns, along with its associated p-value. The coefficient measures the strength and direction of the relationship between the two columns, and the p-value indicates whether that relationship is statistically significant.

Parameters

column1: str

Input vDataColumn.

column2: str

Input vDataColumn.

method: str, optional

Method to use to compute the correlation.

  • pearson:

    Pearson’s correlation coefficient (linear).

  • spearman:

    Spearman’s correlation coefficient (monotonic - rank based).

  • spearmanD:

    Spearman’s correlation coefficient using the DENSE RANK function instead of the RANK function.

  • kendall:

    Kendall’s correlation coefficient (similar trends). The method computes the Tau-B coefficient.

    Warning

    This method uses a CROSS JOIN during computation and is therefore computationally expensive at O(n * n), where n is the number of rows in the vDataFrame.

  • cramer:

    Cramer’s V (correlation between categories).

  • biserial:

    Point-biserial correlation (between a binary column and a numerical column).
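The difference between the spearman and spearmanD options comes down to how tied values are ranked. The sketch below mimics the semantics of the SQL RANK and DENSE_RANK functions in plain Python. The helper names `sql_rank` and `sql_dense_rank` and the sample values are hypothetical and used only for illustration; this is not VerticaPy's implementation, which runs inside the database.

```python
def sql_rank(values):
    """Mimic SQL RANK() ordered ascending: ties share a rank, later ranks skip."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        # Keep only the first (lowest) position seen for each distinct value.
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


def sql_dense_rank(values):
    """Mimic SQL DENSE_RANK() ordered ascending: ties share a rank, no gaps."""
    dense = {value: i for i, value in enumerate(sorted(set(values)), start=1)}
    return [dense[v] for v in values]


# Hypothetical sample with one tie (the two 20s).
values = [10, 20, 20, 30]

print(sql_rank(values))        # [1, 2, 2, 4]
print(sql_dense_rank(values))  # [1, 2, 2, 3]
```

With RANK, the value after a tie jumps to rank 4; with DENSE_RANK it takes rank 3. Since a rank-based coefficient is computed on these ranks, the two options can give different results when the data contain ties.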

Returns

tuple

(Correlation Coefficient, pvalue)

Examples

For this example, let’s generate a dataset and compute the Pearson correlation coefficient and its p-value between the two features: ‘x’ and ‘y’.

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)


data.corr_pvalue(
    column1 = "x",
    column2 = "y",
    method = "pearson",
)

Out[3]: (-0.00430391263628055, 0.9919302634617095)
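As a local cross-check of the output above, the following standard-library sketch recomputes the Pearson coefficient and a two-sided p-value for the same ‘x’ and ‘y’ values. It assumes the p-value comes from a Student's t-test with n - 2 degrees of freedom, which is the textbook approach and agrees with the printed result, but it is not confirmed here to be VerticaPy's internal method. The function names `student_t_pdf`, `simpson` and `pearson_with_pvalue` are hypothetical helpers.

```python
import math


def student_t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pearson_with_pvalue(x, y):
    """Pearson r and a two-sided p-value (assumed t-test, n - 2 df)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0  # perfect correlation: the t statistic is unbounded
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # P(|T| > t) = 1 - 2 * integral of the density over [0, t]
    p = max(0.0, 1.0 - 2.0 * simpson(lambda u: student_t_pdf(u, df), 0.0, t))
    return r, p


x = [1, 2, 4, 9, 10, 15, 20, 22]
y = [1, 2, 1, 2, 1, 1, 2, 1]

r, p = pearson_with_pvalue(x, y)
print(round(r, 4), round(p, 4))  # -0.0043 0.9919
```

A coefficient this close to zero with a p-value near 1 means the sample gives no evidence of a linear relationship between ‘x’ and ‘y’.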

See also

vDataFrame.corr() : Computes the correlation matrix.