verticapy.vDataFrame.corr_pvalue
- vDataFrame.corr_pvalue(column1: str, column2: str, method: Literal['pearson', 'kendall', 'kendalla', 'kendallb', 'kendallc', 'spearman', 'spearmand', 'biserial', 'cramer'] = 'pearson') → tuple[float, float]
Computes the correlation coefficient between two input vDataColumns, along with its associated p-value. The coefficient measures the strength and direction of the relationship between the two columns, and the p-value indicates whether that relationship is statistically significant.
Parameters
- column1: str
Input vDataColumn.
- column2: str
Input vDataColumn.
- method: str, optional
Method used to compute the correlation.
- pearson:
Pearson’s correlation coefficient (linear).
- spearman:
Spearman’s correlation coefficient (monotonic - rank based).
- spearmand:
Spearman’s correlation coefficient using the DENSE_RANK function instead of the RANK function.
- kendall:
Kendall’s correlation coefficient (similar trends). The method computes the Tau-B coefficient.
Warning
This method uses a CROSS JOIN during computation and is therefore computationally expensive, at O(n * n), where n is the total number of rows in the vDataFrame.
- cramer:
Cramer’s V (correlation between categories).
- biserial:
Point-biserial correlation (correlation between a binary column and a numerical column).
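As a rough illustration of how `spearman` and `spearmand` differ, the sketch below computes Spearman's coefficient in plain Python as Pearson's r applied to ranks. It assumes values are ranked in ascending order with SQL-style RANK and DENSE_RANK semantics; the helpers `sql_rank`, `sql_dense_rank`, `pearson`, and `spearman` are hypothetical and are not part of the VerticaPy API.

```python
import math


def sql_rank(values):
    """Mimic SQL RANK(): ties share the lowest rank, leaving gaps afterwards."""
    first_pos = {}
    # In sorted order, the first position of v is 1 + (number of values < v).
    for i, v in enumerate(sorted(values)):
        first_pos.setdefault(v, i + 1)
    return [first_pos[v] for v in values]


def sql_dense_rank(values):
    """Mimic SQL DENSE_RANK(): ties share a rank and no gaps are left."""
    dense = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [dense[v] for v in values]


def pearson(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b, dense=False):
    """Spearman's coefficient as Pearson's r on RANK or DENSE_RANK values."""
    rank = sql_dense_rank if dense else sql_rank
    return pearson(rank(a), rank(b))


x = [1, 2, 4, 9, 10, 15, 20, 22]
z = [10, 12, 2, 1, 9, 8, 1, 3]  # the value 1 appears twice
print(sql_rank(z))        # [7, 8, 3, 1, 6, 5, 1, 4]
print(sql_dense_rank(z))  # [6, 7, 2, 1, 5, 4, 1, 3]
print(spearman(x, z), spearman(x, z, dense=True))
```

With RANK, the two tied 1s in ‘z’ share rank 1 and the value 2 jumps to rank 3; with DENSE_RANK it gets rank 2, which is enough to shift the resulting coefficient.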
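The CROSS JOIN warning for `kendall` follows from how Tau-B is defined: every pair of rows has to be compared. The plain-Python sketch below uses a hypothetical `kendall_tau_b` helper (not VerticaPy's SQL) that performs the same all-pairs scan, which is why the cost grows as O(n * n).

```python
import math
from itertools import combinations


def kendall_tau_b(a, b):
    """Kendall's Tau-B via an O(n^2) scan of all row pairs (like a CROSS JOIN)."""
    n = len(a)
    concordant = discordant = ties_a = ties_b = 0
    for i, j in combinations(range(n), 2):
        da = a[i] - a[j]
        db = b[i] - b[j]
        if da == 0 and db == 0:
            # Tied in both columns: the pair counts toward both tie totals.
            ties_a += 1
            ties_b += 1
        elif da == 0:
            ties_a += 1
        elif db == 0:
            ties_b += 1
        elif (da > 0) == (db > 0):
            concordant += 1  # both columns move in the same direction
        else:
            discordant += 1  # the columns move in opposite directions
    n0 = n * (n - 1) // 2  # total number of pairs
    return (concordant - discordant) / math.sqrt((n0 - ties_a) * (n0 - ties_b))


print(kendall_tau_b([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant pair
```

The tie terms in the denominator are what distinguish Tau-B from the simpler Tau-A, which divides by the total number of pairs only.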
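For `cramer`, a minimal sketch of the classic Cramer's V is shown below: build the contingency table of the two categorical columns, compute Pearson's chi-square statistic, and normalize it to the range [0, 1]. The helper `cramers_v` is hypothetical and uses the uncorrected formula; VerticaPy's exact computation (for example, any bias correction) may differ.

```python
import math
from collections import Counter


def cramers_v(a, b):
    """Uncorrected Cramer's V from the chi-square statistic of a contingency table."""
    n = len(a)
    joint = Counter(zip(a, b))  # observed counts per (category_a, category_b)
    count_a = Counter(a)
    count_b = Counter(b)
    k = min(len(count_a), len(count_b))
    if k < 2:
        raise ValueError("each column needs at least two distinct categories")
    chi2 = 0.0
    for ca, na in count_a.items():
        for cb, nb in count_b.items():
            expected = na * nb / n  # count expected under independence
            observed = joint.get((ca, cb), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0: fully associated
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0: independent
```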
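For `biserial`, the point-biserial coefficient is Pearson's r between a two-valued column and a numerical column, and it can be written in terms of the two group means. The hypothetical `point_biserial` helper below sketches that formula in plain Python; it is not VerticaPy code. Applied to the ‘y’ (two-valued) and ‘x’ columns from the example below, it reproduces the Pearson coefficient shown there.

```python
import math


def point_biserial(binary, values):
    """Point-biserial correlation between a two-valued column and a numeric one."""
    n = len(values)
    low, high = sorted(set(binary))  # raises ValueError unless exactly two codes
    group_high = [v for flag, v in zip(binary, values) if flag == high]
    group_low = [v for flag, v in zip(binary, values) if flag == low]
    mean = sum(values) / n
    # Population standard deviation of the numeric column.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    p = len(group_high) / n  # share of rows in the "high" group
    q = 1.0 - p
    diff = sum(group_high) / len(group_high) - sum(group_low) / len(group_low)
    return diff / std * math.sqrt(p * q)


x = [1, 2, 4, 9, 10, 15, 20, 22]
y = [1, 2, 1, 2, 1, 1, 2, 1]
print(point_biserial(y, x))  # about -0.0043039
```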
Returns
- tuple
(correlation coefficient, p-value)
Examples
For this example, let’s generate a small dataset and compute the Pearson correlation coefficient between the features ‘x’ and ‘y’, along with its p-value.
```python
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)
data.corr_pvalue(
    column1 = "x",
    column2 = "y",
    method = "pearson",
)
# Out: (-0.00430391263628055, 0.9919302634617095)
```
See also
vDataFrame.corr(): Computes the correlation matrix.