verticapy.vDataFrame.corr

vDataFrame.corr(columns: str | list[str] | None = None, method: Literal['pearson', 'kendall', 'spearman', 'spearmand', 'biserial', 'cramer'] = 'pearson', mround: int = 3, focus: str | None = None, show: bool = True, chart: PlottingBase | TableSample | Axes | mFigure | Highchart | Highstock | Figure | None = None, **style_kwargs) -> PlottingBase | TableSample | Axes | mFigure | Highchart | Highstock | Figure

Computes the correlation matrix of the vDataFrame. The matrix shows how strongly each pair of numerical columns is correlated, which helps identify relationships and dependencies between variables.

Parameters

columns: SQLColumns, optional

List of vDataColumn names. If empty, all numerical vDataColumns are used.

method: str, optional

Method to use to compute the correlation.

  • pearson:

    Pearson’s correlation coefficient (linear).

  • spearman:

    Spearman’s correlation coefficient (monotonic - rank based).

  • spearmanD:

    Spearman’s correlation coefficient using the DENSE RANK function instead of the RANK function.

  • kendall:

    Kendall’s correlation coefficient (similar trends). The method computes the Tau-B coefficient.

    Warning

    This method uses a CROSS JOIN during computation and is therefore computationally expensive at O(n * n), where n is the number of rows in the vDataFrame.

  • cramer:

    Cramer’s V (correlation between categories).

  • biserial:

    Point-biserial correlation (between a binary column and a numerical column).
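The difference between the pearson, spearman, spearmanD and kendall methods can be illustrated without a database. The following is a minimal pure-Python sketch, not VerticaPy's implementation: the helper names (pearson, sql_rank, sql_dense_rank, spearman, kendall_tau_b) are hypothetical, and it assumes ranks are assigned the way the SQL RANK and DENSE_RANK functions assign them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sql_rank(values):
    """Ranks in the style of SQL RANK: ties share a rank and leave a gap after it."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def sql_dense_rank(values):
    """Ranks in the style of SQL DENSE_RANK: ties share a rank, with no gaps."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


def spearman(x, y, rank=sql_rank):
    """Spearman's coefficient: Pearson's coefficient applied to the ranks."""
    return pearson(rank(x), rank(y))


def kendall_tau_b(x, y):
    """Kendall's Tau-B, comparing every pair of rows (hence O(n * n))."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both columns: contributes to neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# A monotonic but non-linear relationship: y = x ** 2.
x = [1, 2, 3, 4, 5]
y = [v * v for v in x]

print(pearson(x, y))                         # below 1: the relation is not linear
print(spearman(x, y))                        # rank-based, so close to 1
print(spearman(x, y, rank=sql_dense_rank))   # dense-rank variant
print(kendall_tau_b(x, y))                   # every pair is concordant

print(sql_rank([10, 20, 20, 30]))        # [1, 2, 2, 4]
print(sql_dense_rank([10, 20, 20, 30]))  # [1, 2, 2, 3]
```

The two rank helpers differ only when the data contains ties, so spearman and spearmanD should be expected to differ only on tied data. The nested loop in kendall_tau_b is the in-memory analogue of the pairwise comparison that makes the kendall method quadratic in the number of rows.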

mround: int, optional

Rounds the coefficient using the input number of digits. This is only used to display the correlation matrix.

focus: str, optional

Restricts the computation to one vDataColumn: only the correlation vector between this column and the other columns is computed.

show: bool, optional

If set to True, the Plotting object is returned.

chart: PlottingObject, optional

The chart object used to plot.

**style_kwargs

Any optional parameter to pass to the plotting functions.

Returns

obj

Plotting Object.

Examples

Import VerticaPy.

import verticapy as vp

Import numpy to create a random dataset.

import numpy as np

Generate a dataset using the following data.

N = 30 # Number of records

data = vp.vDataFrame(
    {
        "score1": np.random.normal(5, 1, N),
        "score2": np.random.normal(8, 1.5, N),
        "score3": np.random.normal(10, 2, N),
        "score4": np.random.normal(14, 3, N),
    }
)

Draw the Pearson correlation matrix.

data.corr(method = "pearson")

You can also use the focus parameter to compute only a correlation vector.

data.corr(method = "pearson", focus = "score1")

This is less expensive than computing the full matrix, and it lets you concentrate on one specific column.
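As a rough illustration of why a correlation vector is cheaper, the sketch below counts the distinct column pairs involved in each case. The helpers matrix_pairs and focus_pairs are hypothetical and say nothing about the SQL that VerticaPy actually generates, which may batch the work differently.

```python
from itertools import combinations


def matrix_pairs(columns):
    """Distinct column pairs needed for a full (symmetric) correlation matrix."""
    return list(combinations(columns, 2))


def focus_pairs(columns, focus):
    """Column pairs needed when only one column is correlated with the others."""
    return [(focus, c) for c in columns if c != focus]


columns = ["score1", "score2", "score3", "score4"]

print(len(matrix_pairs(columns)))           # 6 pairs for the full matrix
print(len(focus_pairs(columns, "score1")))  # 3 pairs for the vector
```

With k columns, the full matrix involves k * (k - 1) / 2 distinct pairs while the vector involves k - 1, so the gap widens as the number of columns grows.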

For more examples, please look at the Correlation Matrix page of the Chart Gallery.

See also

vDataFrame.corr_pvalue() : Computes correlation and its p-value.