verticapy.machine_learning.metrics.regression_report

verticapy.machine_learning.metrics.regression_report(y_true: str, y_score: str, input_relation: str | vDataFrame, metrics: None | str | list[str] = None, k: int = 1, genSQL: bool = False) -> float | TableSample

Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error…).

Parameters

y_true: str

Response column.

y_score: str

Prediction column.

input_relation: SQLRelation

Relation to use for scoring. This relation can be a view, table, or a customized relation (if an alias is used at the end of the relation). For example: (SELECT … FROM …) x

metrics: list, optional

List of the metrics used to compute the final report.

  • aic:

    Akaike’s Information Criterion

    \[AIC = 2k - 2\ln(\hat{L})\]
  • bic:

    Bayesian Information Criterion

    \[BIC = -2\ln(\hat{L}) + k \ln(n)\]
  • max:

    Max Error.

    \[ME = \max_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • mae:

    Mean Absolute Error.

    \[MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • median:

    Median Absolute Error.

    \[MedAE = \text{median}_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • mse:

    Mean Squared Error.

    \[MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2\]
  • msle:

    Mean Squared Log Error.

    \[MSLE = \frac{1}{n} \sum_{i=1}^{n} (\log(1 + y_i) - \log(1 + \hat{y}_i))^2\]
  • r2:

    R squared coefficient.

    \[R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}\]
  • r2a:

    Adjusted R squared.

    \[\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}\]
  • qe:

    Quantile error. The quantile must be included in the metric name. Example: qe50.1% returns the quantile error using q=0.501.

  • rmse:

    Root-mean-squared error

    \[RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\]
  • var:

    Explained Variance

    \[\text{Explained Variance} = 1 - \frac{Var(y - \hat{y})}{Var(y)}\]
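
The naming convention of the qe metric above can be sketched in a few lines of Python. This is only an illustration of how a name such as qe50.1% maps to q=0.501. The helper quantile_from_metric_name is hypothetical and is not part of VerticaPy, and it assumes the documented form: the prefix qe followed by a percentage.

```python
from decimal import Decimal


def quantile_from_metric_name(name: str) -> Decimal:
    """Hypothetical helper: map a name such as 'qe50.1%' to the quantile 0.501."""
    if not name.startswith("qe"):
        raise ValueError(f"not a quantile-error metric name: {name!r}")
    # Drop the 'qe' prefix and the trailing percent sign, then rescale
    # the percentage to a fraction. Decimal avoids binary rounding noise.
    percent = Decimal(name[2:].rstrip("%"))
    return percent / Decimal(100)


print(quantile_from_metric_name("qe50.1%"))  # 0.501
```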
k: int, optional

Number of predictors. Used to compute the adjusted R2 (r2a).
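
To see how k affects the adjusted R2, the formula given for r2a can be evaluated directly. The snippet below is a minimal plain-Python sketch, not VerticaPy code: the function adjusted_r2 and the values R2 = 0.9 and n = 21 are assumptions chosen for illustration.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("n must be greater than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Illustrative values (assumed): R2 = 0.9 measured on n = 21 rows.
for k in (1, 2, 5):
    print(k, round(adjusted_r2(0.9, 21, k), 4))
```

For a fixed R2 and n, a larger k gives a smaller adjusted R2, so passing the actual number of predictors matters when comparing models of different sizes.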

genSQL: bool, optional

If set to True, returns the SQL that is used to generate the metrics.

Returns

TableSample

The regression report.

Examples

We should first import verticapy.

import verticapy as vp

Let’s create a small dataset that has:

  • true value

  • predicted value

data = vp.vDataFrame(
    {
        "y_true": [1, 1.5, 3, 2, 5],
        "y_pred": [1.1, 1.55, 2.9, 2.01, 4.5],
    }
)

Next, we import the metric:

from verticapy.machine_learning.metrics import regression_report

Now we can conveniently compute the report:

regression_report(
    y_true  = "y_true",
    y_score = "y_pred",
    input_relation = data,
)

Out[4]: 
None                                      value  
explained_variance                     0.976612  
max_error                                   0.5  
median_absolute_error                       0.1  
mean_absolute_error                       0.152  
mean_squared_error                      0.05452  
root_mean_squared_error        0.23349518196314  
r2                                      0.97274  
r2_adj                        0.963653333333333  
aic                          -0.545938360769027  
bic                           -11.3270625359008  
Rows: 1-10 | Columns: 2
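
As a sanity check, most of the values above can be reproduced by applying the formulas from the Parameters section to the same five rows. The following is a plain-Python sketch using only the standard library. It is not how VerticaPy computes the report (VerticaPy runs SQL in the database), and the variable names are illustrative. AIC and BIC are left out because they depend on a likelihood term this sketch does not model.

```python
import math
import statistics

# Same data as the vDataFrame in the example.
y_true = [1, 1.5, 3, 2, 5]
y_pred = [1.1, 1.55, 2.9, 2.01, 4.5]

n = len(y_true)
residuals = [t - p for t, p in zip(y_true, y_pred)]
abs_errors = [abs(r) for r in residuals]

mse = sum(r * r for r in residuals) / n
y_mean = sum(y_true) / n
ss_tot = sum((t - y_mean) ** 2 for t in y_true)

report = {
    # The variance ratio is the same for population and sample variance.
    "explained_variance": 1
    - statistics.pvariance(residuals) / statistics.pvariance(y_true),
    "max_error": max(abs_errors),
    "median_absolute_error": statistics.median(abs_errors),
    "mean_absolute_error": sum(abs_errors) / n,
    "mean_squared_error": mse,
    "root_mean_squared_error": math.sqrt(mse),
    "r2": 1 - n * mse / ss_tot,
}

for name, value in report.items():
    print(f"{name:<26}{value:.6f}")
```

Up to floating-point rounding, the printed values match the corresponding rows of the report.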

Note

VerticaPy uses simple SQL queries to compute various metrics. You can use the set_option() function with the sql_on parameter to enable SQL generation and examine the generated queries.

See also

vDataFrame.score() : Computes the input ML metric.