verticapy.vDataFrame.corr
- vDataFrame.corr(columns: str | list[str] | None = None, method: Literal['pearson', 'kendall', 'spearman', 'spearmand', 'biserial', 'cramer'] = 'pearson', mround: int = 3, focus: str | None = None, show: bool = True, chart: PlottingBase | TableSample | Axes | mFigure | Highchart | Highstock | Figure | None = None, **style_kwargs) PlottingBase | TableSample | Axes | mFigure | Highchart | Highstock | Figure
Computes the correlation matrix of the vDataFrame. The matrix shows how strongly each pair of the selected columns (all numerical columns by default) is correlated, which helps identify relationships and dependencies between variables.
Parameters
- columns: SQLColumns, optional
List of vDataColumn names. If empty, all numerical vDataColumns are used.
- method: str, optional
Method to use to compute the correlation.
- pearson:
Pearson’s correlation coefficient (linear).
- spearman:
Spearman’s correlation coefficient (monotonic - rank based).
- spearmanD:
Spearman’s correlation coefficient using the DENSE RANK function instead of the RANK function.
- kendall:
Kendall’s correlation coefficient (similar trends). The method computes the Tau-B coefficient.
Warning
This method uses a CROSS JOIN during computation and is therefore computationally expensive at O(n * n), where n is the total count of the vDataFrame.
- cramer:
Cramer’s V (correlation between categories).
- biserial:
Point-biserial correlation (correlation between a binary column and a numerical column).
- mround: int, optional
Rounds the coefficient using the input number of digits. This is only used to display the correlation matrix.
- focus: str, optional
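Name of a single vDataColumn on which to focus the computation. When set, only the correlation vector for that column is computed instead of the full matrix.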
- show: bool, optional
If set to True, the Plotting object is returned.
- chart: PlottingObject, optional
The chart object used to plot.
- **style_kwargs
Any optional parameter to pass to the plotting functions.
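The differences between the rank-based and pairwise options of method can be illustrated outside the database. The following is a minimal pure-Python sketch, not VerticaPy's implementation: the helper names (pearson, sql_rank, sql_dense_rank, kendall_tau_b) and the sample values are hypothetical, and the Spearman variants are assumed to be Pearson's coefficient applied to SQL-style RANK or DENSE RANK values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sql_rank(values):
    """Mimic SQL RANK(): ties share the lowest rank and leave gaps (1, 1, 3)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def sql_dense_rank(values):
    """Mimic SQL DENSE_RANK(): ties share a rank and leave no gaps (1, 1, 2)."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


def kendall_tau_b(x, y):
    """Tau-B computed by comparing every pair of rows: O(n * n) comparisons."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both columns: contributes to neither term
        elif dx == 0:
            ties_x += 1  # tied in x only
        elif dy == 0:
            ties_y += 1  # tied in y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


# Hypothetical sample with one tie in x, made up for illustration only.
x = [1, 2, 2, 3, 4]
y = [1, 3, 2, 5, 4]

spearman_rank = pearson(sql_rank(x), sql_rank(y))
spearman_dense = pearson(sql_dense_rank(x), sql_dense_rank(y))
tau_b = kendall_tau_b(x, y)

print("pearson         :", round(pearson(x, y), 3))
print("spearman (RANK) :", round(spearman_rank, 3))
print("spearman (DENSE):", round(spearman_dense, 3))
print("kendall tau-b   :", round(tau_b, 3))
```

With tied values the two ranking schemes assign different ranks, so the two Spearman variants generally differ. The nested pair comparison in kendall_tau_b mirrors why a CROSS JOIN makes the kendall method quadratic in the number of rows.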
Returns
- obj
Plotting Object.
Examples
Import VerticaPy.
import verticapy as vp
Import numpy to create a random dataset.
import numpy as np
Generate a dataset using the following data.
N = 30 # Number of records
data = vp.vDataFrame(
    {
        "score1": np.random.normal(5, 1, N),
        "score2": np.random.normal(8, 1.5, N),
        "score3": np.random.normal(10, 2, N),
        "score4": np.random.normal(14, 3, N),
    }
)
Draw the Pearson correlation matrix.
data.corr(method = "pearson")
You can also use the parameter focus to only compute a correlation vector.
data.corr(method = "pearson", focus = "score1")
This is less expensive than computing the full matrix and lets you focus your search on one specific column.
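To see why a correlation vector is cheaper than a full matrix, the sketch below counts the column pairs each one needs. It is plain Python and only illustrates the idea; it is not how VerticaPy computes the result. The helper names (pearson, correlation_vector) and the sample values are hypothetical, and the vector here deliberately leaves out the focus column itself.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_vector(table, focus):
    """Correlate one column with every other column (k - 1 coefficients)."""
    return {
        name: pearson(table[focus], values)
        for name, values in table.items()
        if name != focus
    }


# Hypothetical values, made up for illustration only.
table = {
    "score1": [4.1, 5.0, 5.9, 4.6, 5.4],
    "score2": [7.2, 8.9, 8.1, 6.5, 9.3],
    "score3": [9.0, 12.1, 10.4, 8.3, 11.0],
    "score4": [15.2, 11.8, 14.9, 13.1, 16.4],
}

# A full matrix needs every distinct pair of columns: k * (k - 1) / 2.
matrix_pairs = list(combinations(table, 2))

# A focus vector only needs the pairs that include the focus column.
vector = correlation_vector(table, "score1")

print(len(matrix_pairs), "coefficients for the full matrix")
print(len(vector), "coefficients for the focus vector")
for name, value in vector.items():
    print(name, round(value, 3))
```

The gap widens as columns are added: the number of pairs in the matrix grows quadratically with the column count, while the vector grows linearly.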
For more examples, please look at the Correlation Matrix page of the Chart Gallery.
See also
vDataFrame.corr_pvalue(): Computes the correlation and its p-value.