verticapy.machine_learning.metrics.confusion_matrix#

Computes the confusion matrix.

Parameters#

y_true: str

Response column.

y_score: str

Prediction column.

input_relation: SQLRelation

Relation used for scoring. This relation can be a view, table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x

average: str, optional

The method used to compute the final score for multiclass classification.

binary:
considers one of the classes as positive and uses the binary confusion matrix to compute the score.
micro:
counts positive and negative values globally.
macro:
average of the score of each class.
score:
scores for all the classes.
weighted:
weighted average of the score of each class.
None:
accuracy.

labels: ArrayLike, optional

List of the response column categories.

pos_label: PythonScalar, optional

Label used to identify the positive class. If pos_label is NULL, the global accuracy is computed.

Returns#

Array: confusion matrix.

Examples#

We should first import verticapy.

import verticapy as vp

Binary Classification#

Let’s create a small dataset that contains:

true values
predicted values

data = vp.vDataFrame(
    {
        "y_true": [1, 1, 0, 0, 1],
        "y_pred": [1, 1, 1, 0, 1],
    },
)

Next, we import the metric:

from verticapy.machine_learning.metrics import confusion_matrix

Now we can conveniently compute the confusion matrix:

confusion_matrix(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
)

Out[4]: 
array([[1, 1],
       [0, 3]])
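To make the layout of this output easier to read, here is a minimal pure-Python sketch that rebuilds the same counts from the two lists. It is an illustration only and not how VerticaPy computes the result (VerticaPy works in the database). It assumes the convention implied by the output above: rows are indexed by the true label and columns by the predicted label.

```python
# Same values as the vDataFrame above, held as plain Python lists.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1]

# matrix[i][j] counts the rows whose true label is i and predicted label is j.
matrix = [[0, 0], [0, 0]]
for t, p in zip(y_true, y_pred):
    matrix[t][p] += 1

# With labels ordered [0, 1], the four cells read as follows.
(tn, fp), (fn, tp) = matrix

print(matrix)          # [[1, 1], [0, 3]]
print(tn, fp, fn, tp)  # 1 1 0 3
```

Read this way, the single off-diagonal count is the one row whose true value is 0 but whose prediction is 1.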

It is also possible to compute the confusion matrix directly from the vDataFrame:

data.score(
    y_true  = "y_true",
    y_score = "y_pred",
    metric  = "confusion_matrix",
)

Out[5]: 
array([[1, 1],
       [0, 3]])

Note

VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.
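As a rough illustration of how a confusion matrix can come out of a simple SQL aggregation, the sketch below counts (true, predicted) pairs with a GROUP BY. This is an assumption made for illustration: the query is hypothetical and is not the SQL that VerticaPy generates, and Python's built-in sqlite3 module stands in for a Vertica database. The table name data is also invented here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Vertica (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (y_true INTEGER, y_pred INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, 1), (1, 1), (0, 1), (0, 0), (1, 1)],
)

# Hypothetical query: one output row per (true, predicted) pair with its count.
rows = conn.execute(
    "SELECT y_true, y_pred, COUNT(*) FROM data "
    "GROUP BY y_true, y_pred ORDER BY y_true, y_pred"
).fetchall()
conn.close()

print(rows)  # [(0, 0, 1), (0, 1, 1), (1, 1, 3)]
```

Each tuple maps to one cell of the matrix shown earlier; pairs that never occur, such as (1, 0), simply do not appear and correspond to a zero cell.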

Multi-class Classification#

Let’s create a small dataset that contains:

true values with more than two classes
predicted values

data = vp.vDataFrame(
    {
        "y_true": [1, 2, 0, 0, 1],
        "y_pred": [1, 2, 0, 1, 1],
    },
)

Next, we import the metric:

from verticapy.machine_learning.metrics import confusion_matrix

Now we can conveniently compute the confusion matrix:

confusion_matrix(
    y_true  = "y_true",
    y_score = "y_pred",
    labels = [0,1,2],
    input_relation = data,
)

Out[8]: 
array([[1, 1, 0],
       [0, 2, 0],
       [0, 0, 1]])
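The sketch below shows, in plain Python, how the order of labels determines the row and column positions, and how per-class counts can be read off the matrix. The helper confusion_from_lists is a hypothetical name introduced for this illustration; it is not part of VerticaPy, and it assumes rows correspond to true labels and columns to predicted labels, as the output above suggests.

```python
def confusion_from_lists(y_true, y_pred, labels):
    """Hypothetical helper: build a confusion matrix as a list of lists."""
    # Position of each label decides which row/column it occupies.
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [1, 2, 0, 0, 1]
y_pred = [1, 2, 0, 1, 1]
labels = [0, 1, 2]

matrix = confusion_from_lists(y_true, y_pred, labels)
print(matrix)  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]

# Per-class (true positives, false positives, false negatives).
per_class = {}
for k, label in enumerate(labels):
    tp = matrix[k][k]                        # diagonal cell
    fn = sum(matrix[k]) - tp                 # rest of the row
    fp = sum(row[k] for row in matrix) - tp  # rest of the column
    per_class[label] = (tp, fp, fn)

print(per_class)  # {0: (1, 0, 1), 1: (2, 1, 0), 2: (1, 0, 0)}
```

The only misclassified row (true 0, predicted 1) shows up twice in these counts: as a false negative for class 0 and as a false positive for class 1.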