verticapy.machine_learning.metrics.regression_report

verticapy.machine_learning.metrics.regression_report(y_true: str, y_score: str, input_relation: str | vDataFrame, metrics: None | str | list[str] = None, k: int = 1, genSQL: bool = False) → float | TableSample

Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error…).

Parameters

y_true: str

Response column.

y_score: str

Prediction column.

input_relation: SQLRelation

Relation to use for scoring. This relation can be a view, table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x

metrics: list, optional

List of the metrics used to compute the final report.

aic:
Akaike’s Information Criterion.

\[AIC = 2k - 2\ln(\hat{L})\]
bic:
Bayesian Information Criterion.

\[BIC = -2\ln(\hat{L}) + k \ln(n)\]
max:
Max Error.

\[ME = \max_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
mae:
Mean Absolute Error.

\[MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
median:
Median Absolute Error.

\[MedAE = \text{median}_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
mse:
Mean Squared Error.

\[MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2\]
msle:
Mean Squared Log Error.

\[MSLE = \frac{1}{n} \sum_{i=1}^{n} (\log(1 + y_i) - \log(1 + \hat{y}_i))^2\]
r2:
R squared coefficient.

\[R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}\]
r2a:
Adjusted R squared, where k is the number of predictors.

\[\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}\]
qe:
Quantile error. The quantile must be included in the metric name. For example, qe50.1% returns the quantile error using q=0.501.
rmse:
Root Mean Squared Error.

\[RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\]
var:
Explained Variance.

\[\text{Explained Variance} = 1 - \frac{Var(y - \hat{y})}{Var(y)}\]
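
The error-based formulas above can be checked by hand. The following is a minimal sketch in plain Python (standard library only), not VerticaPy's implementation; it applies the formulas to the five value pairs from the Examples section. The helper quantile_from_metric_name is hypothetical and only illustrates how a name such as qe50.1% maps to q=0.501; it assumes the name always has the form shown in the example. msle, aic, and bic are left out because the formulas above do not fix the logarithm base or the likelihood used.

```python
import math
import re
from statistics import mean, median, pvariance

# Illustrative values: the same five pairs used in the Examples section.
y_true = [1, 1.5, 3, 2, 5]
y_pred = [1.1, 1.55, 2.9, 2.01, 4.5]

residuals = [t - p for t, p in zip(y_true, y_pred)]
abs_errors = [abs(r) for r in residuals]
squared_errors = [r * r for r in residuals]

max_error = max(abs_errors)   # ME
mae = mean(abs_errors)        # MAE
medae = median(abs_errors)    # MedAE
mse = mean(squared_errors)    # MSE
rmse = math.sqrt(mse)         # RMSE

# R squared: 1 - (residual sum of squares / total sum of squares)
y_bar = mean(y_true)
ss_res = sum(squared_errors)
ss_tot = sum((t - y_bar) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# Explained variance: the ratio is the same whether both variances
# divide by n or by n - 1, so the population variance is used here.
explained_variance = 1 - pvariance(residuals) / pvariance(y_true)


def quantile_from_metric_name(name: str) -> float:
    """Hypothetical helper: map a name like 'qe50.1%' to q = 0.501."""
    match = re.fullmatch(r"qe(\d+(?:\.\d+)?)%", name)
    if match is None:
        raise ValueError(f"not a quantile error metric name: {name!r}")
    return float(match.group(1)) / 100


for label, value in [
    ("max", max_error),
    ("mae", mae),
    ("median", medae),
    ("mse", mse),
    ("rmse", rmse),
    ("r2", r2),
    ("var", explained_variance),
]:
    print(f"{label:>6}: {value:.6f}")
print("qe50.1% ->", quantile_from_metric_name("qe50.1%"))
```

Up to floating-point rounding, these values agree with the corresponding rows of the report shown in the Examples section.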

k: int, optional

Number of predictors. Used to compute the adjusted R2.
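
To illustrate how k enters the adjusted R2, here is a small sketch that evaluates the r2a formula from the metrics list for several values of k. The function adjusted_r2 is a hypothetical stand-alone helper, not part of VerticaPy; the inputs r2 = 0.97274 and n = 5 are taken from the report in the Examples section.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        # The formula is undefined when there are too few observations.
        raise ValueError("n must be greater than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.97274  # r2 row of the report in the Examples section
n = 5         # number of rows in the example dataset

# The penalty grows with the number of predictors k.
results = {k: adjusted_r2(r2, n, k) for k in (1, 2, 3)}
for k, value in results.items():
    print(f"k={k}: adjusted R2 = {value:.6f}")
```

With the default k = 1, this gives the r2_adj value shown in the report; larger values of k lower the adjusted R2 for the same fit.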

genSQL: bool, optional

If set to True, returns the SQL query that is used to generate the metrics.

Returns

TableSample: report.

Examples

We should first import verticapy.

import verticapy as vp

Let’s create a small dataset with two columns: the true values and the predicted values.

data = vp.vDataFrame(
    {
        "y_true": [1, 1.5, 3, 2, 5],
        "y_pred": [1.1, 1.55, 2.9, 2.01, 4.5],
    }
)

Next, we import the metric:

from verticapy.machine_learning.metrics import regression_report

Now we can conveniently compute the report:

regression_report(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
)

Out[4]: 
None                                      value  
explained_variance                     0.976612  
max_error                                   0.5  
median_absolute_error                       0.1  
mean_absolute_error                       0.152  
mean_squared_error                      0.05452  
root_mean_squared_error        0.23349518196314  
r2                                      0.97274  
r2_adj                        0.963653333333333  
aic                          -0.545938360769027  
bic                           -11.3270625359008  
Rows: 1-10 | Columns: 2

Note

VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.