
verticapy.machine_learning.metrics.confusion_matrix

verticapy.machine_learning.metrics.confusion_matrix(y_true: str, y_score: str, input_relation: str | vDataFrame, labels: list | ndarray | None = None, pos_label: bool | float | str | timedelta | datetime | None = None) → ndarray

Computes the confusion matrix.

Parameters

y_true: str

Response column.

y_score: str

Prediction column.

input_relation: SQLRelation

Relation used for scoring. This can be a view, a table, or a custom relation such as a subquery, provided an alias is used at the end of it. For example: (SELECT … FROM …) x

labels: ArrayLike, optional

List of the response column categories.

pos_label: PythonScalar, optional

Label used to identify the positive class. If pos_label is None, the global accuracy is computed.

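Returns

ndarray

Confusion matrix. Rows correspond to the true labels and columns to the predicted labels. For binary classification with classes 0 and 1, this gives the layout [[TN, FP], [FN, TP]], as the examples below show.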
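To make the row/column layout concrete, here is a minimal pure-Python sketch that counts (true, predicted) pairs into a matrix whose rows follow the true labels and whose columns follow the predicted labels, in the order given by a labels list. This is only an illustration, not VerticaPy's implementation (which computes the counts in SQL inside Vertica); the function name `confusion_matrix_sketch` is hypothetical.

```python
from collections import Counter


def confusion_matrix_sketch(y_true, y_pred, labels):
    """Hypothetical helper: rows = true label, columns = predicted label."""
    # Map each label to its row/column position, following the given order.
    index = {label: i for i, label in enumerate(labels)}
    # Count how often each (true, predicted) pair occurs.
    pair_counts = Counter(zip(y_true, y_pred))
    matrix = [[0] * len(labels) for _ in labels]
    for (t, p), n in pair_counts.items():
        # In this sketch, pairs involving an unlisted label are ignored.
        if t in index and p in index:
            matrix[index[t]][index[p]] = n
    return matrix


# Same data as the multi-class example.
print(confusion_matrix_sketch([1, 2, 0, 0, 1], [1, 2, 0, 1, 1], labels=[0, 1, 2]))
# [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
```

Called with the binary example data and labels=[0, 1], the same sketch returns [[1, 1], [0, 3]].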

Examples

We should first import verticapy.

import verticapy as vp

Binary Classification

Let’s create a small dataset that has:

  • true value

  • predicted value

data = vp.vDataFrame(
    {
        "y_true": [1, 1, 0, 0, 1],
        "y_pred": [1, 1, 1, 0, 1],
    },
)

Next, we import the metric:

from verticapy.machine_learning.metrics import confusion_matrix

Now we can compute the confusion matrix:

confusion_matrix(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
)

Out[4]: 
array([[1, 1],
       [0, 3]])
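With 0 as the negative class and 1 as the positive class, the four cells of this 2x2 output are the usual counts [[TN, FP], [FN, TP]]. The short check below recounts them from the example lists in plain Python. It only illustrates how the output arises and does not call VerticaPy.

```python
# Same values as the binary example's y_true and y_pred columns.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tn = pairs.count((0, 0))  # true 0, predicted 0
fp = pairs.count((0, 1))  # true 0, predicted 1
fn = pairs.count((1, 0))  # true 1, predicted 0
tp = pairs.count((1, 1))  # true 1, predicted 1

print([[tn, fp], [fn, tp]])  # [[1, 1], [0, 3]]
```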

It is also possible to compute the confusion matrix directly from the vDataFrame:

data.score(
    y_true  = "y_true",
    y_score = "y_pred",
    metric  = "confusion_matrix",
)

Out[5]: 
array([[1, 1],
       [0, 3]])

Note

VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.
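As an illustration of the kind of SQL involved, a confusion matrix reduces to counting rows per (true, predicted) pair with a GROUP BY. The sketch below is not the query VerticaPy actually generates (enable sql_on to see that); it uses Python's built-in sqlite3 module as a stand-in for Vertica and a hypothetical table named `scores` holding the binary example data, then pivots the counts into a matrix.

```python
import sqlite3

# In-memory SQLite database standing in for Vertica (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (y_true INTEGER, y_pred INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 1), (1, 1), (0, 1), (0, 0), (1, 1)],  # binary example rows
)

# One result row per (true, predicted) pair, with its count.
rows = conn.execute(
    "SELECT y_true, y_pred, COUNT(*) FROM scores GROUP BY y_true, y_pred"
).fetchall()
conn.close()

# Pivot: row index = true label, column index = predicted label.
labels = [0, 1]
matrix = [[0] * len(labels) for _ in labels]
for t, p, n in rows:
    matrix[labels.index(t)][labels.index(p)] = n

print(matrix)  # [[1, 1], [0, 3]]
```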

Multi-class Classification

Let’s create a small dataset that has:

  • true value with more than two classes

  • predicted value

data = vp.vDataFrame(
    {
        "y_true": [1, 2, 0, 0, 1],
        "y_pred": [1, 2, 0, 1, 1],
    },
)

Next, we import the metric:

from verticapy.machine_learning.metrics import confusion_matrix

Now we can compute the confusion matrix:

confusion_matrix(
    y_true  = "y_true",
    y_score = "y_pred",
    labels = [0, 1, 2],
    input_relation = data,
)

Out[8]: 
array([[1, 1, 0],
       [0, 2, 0],
       [0, 0, 1]])
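Because the result is a regular NumPy array, summary figures can be read off it directly: the diagonal holds the correct predictions, and each row sum is the number of rows with that true label. The sketch below copies the multi-class output above into a literal array (so it runs without a Vertica connection) and derives the overall accuracy and the per-class recall from it.

```python
import numpy as np

# Output of the multi-class example, copied as a literal.
cm = np.array([[1, 1, 0],
               [0, 2, 0],
               [0, 0, 1]])

accuracy = np.trace(cm) / cm.sum()     # correct predictions / all rows
recall = np.diag(cm) / cm.sum(axis=1)  # per true class (row-wise)

print(accuracy)  # 0.8
print(recall)    # [0.5 1.  1. ]
```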

See also

vDataFrame.score() : Computes the input ML metric.