ML algorithm tolerance compared with other open-source libraries#

At VerticaPy, we unit test our machine learning algorithms against the equivalent implementations in other popular open-source libraries, such as scikit-learn. These tests check that our implementations stay in parity with those libraries and deliver consistent results across various datasets. For transparency and easy reference, we define specific tolerances for each type of algorithm. These tolerances bound how far a VerticaPy result may deviate from the result of the reference library.

We also define tolerances for the metrics we use to measure the accuracy of our models. The tolerances for all metrics are:

Model Type	Relative Error
Regression	1%
Classification	10%

The tables below list the tolerances for several models:

Important

All our models are tested against popular libraries, but only a few are included in the tables below.

Note

The formula for relative error:

\[relative\ error = \frac{{|Score_{verticapy} - Score_{python}|}}{{\min(|Score_{verticapy}|, |Score_{python}|)}}\]

The formula for absolute error:

\[absolute\ error = \frac{{|Score_{verticapy} - Score_{python}|}}{{1 + \min(|Score_{verticapy}|, |Score_{python}|)}}\]

Absolute error is only used in cases where the result is very close to 0.
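
The two formulas above can be written as a short Python sketch. The function and argument names are illustrative assumptions, not part of the VerticaPy API. Note that the relative form is undefined when either score is exactly 0, which is consistent with the absolute form being used near 0.

```python
def relative_error(score_verticapy: float, score_python: float) -> float:
    """|a - b| / min(|a|, |b|), following the relative-error formula.

    Raises ZeroDivisionError if either score is exactly 0.
    """
    denom = min(abs(score_verticapy), abs(score_python))
    return abs(score_verticapy - score_python) / denom


def absolute_error(score_verticapy: float, score_python: float) -> float:
    """|a - b| / (1 + min(|a|, |b|)); the +1 keeps the ratio finite near 0."""
    denom = 1 + min(abs(score_verticapy), abs(score_python))
    return abs(score_verticapy - score_python) / denom


# Hypothetical scores, for illustration only.
print(relative_error(0.92, 0.90))  # roughly 0.0222, i.e. about 2.2%
print(absolute_error(1e-9, 0.0))   # usable even though one score is 0
```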

For more information on pytest approximations, see the pytest documentation.
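
As a rough illustration of how a pytest-style approximation combines a relative and an absolute tolerance, the sketch below reimplements the comparison rule in plain Python. It assumes that the Relative Tolerance and Absolute Tolerance columns in the tables correspond to the `rel` and `abs` arguments of `pytest.approx`, which this page does not state explicitly. The helper name `approx_equal` is hypothetical.

```python
def approx_equal(actual: float, expected: float,
                 rel: float = 1e-6, abs_tol: float = 1e-12) -> bool:
    """Mimic the pytest.approx rule for scalars.

    The allowed deviation is the larger of the relative tolerance
    (scaled by the expected value) and the absolute tolerance.
    """
    allowed = max(rel * abs(expected), abs_tol)
    return abs(actual - expected) <= allowed


# Hypothetical values, for illustration only.
print(approx_equal(1.0000005, 1.0))  # within the relative tolerance
print(approx_equal(1.00001, 1.0))    # outside the relative tolerance
print(approx_equal(5e-13, 0.0))      # only the absolute tolerance applies at 0
```

With an expected value of 0 the relative term vanishes, so the absolute tolerance is the only thing that can make the comparison pass.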

Regression#

DecisionTreeRegressor#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	0.7	1e-12
F-statistic	0.03	1e-12
Jarque-Bera (JB)	0.006	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	0.7	1e-12
Skewness	0.0003	1e-12
aic	0.3	1e-12
bic	0.4	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	0.7	1e-12
f	0.03	1e-12
intercept_	1e-06	1e-12
load_model	3e-16	1e-12
max_error	0.09	1e-12
mean_absolute_error	0.2	1e-12
mean_squared_error	0.2	1e-12
mean_squared_log_error	0.2	1e-12
median_absolute_error	0.09	1e-12
ms	0.03	1e-12
p_value	1e-06	1e-12
predict	3e-16	1e-12
r2	0.7	1e-12
r2_adj	0.7	1e-12
rmse	0.2	1e-12
score	0.04	1e-12
ss	0.03	1e-12
to_python	1e-06	1e-12
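
A table like the one above can be read as a per-metric lookup. The following sketch copies three rows from the DecisionTreeRegressor table and reports which metrics fall outside their tolerance. The scores are made up for illustration, the helper name is hypothetical, and the comparison rule (the larger of the relative and absolute tolerance) is an assumption based on how pytest approximations behave.

```python
# metric -> (relative tolerance, absolute tolerance), copied from the table above
TOLERANCES = {
    "r2": (0.7, 1e-12),
    "rmse": (0.2, 1e-12),
    "max_error": (0.09, 1e-12),
}


def failing_metrics(verticapy_scores: dict, reference_scores: dict) -> list:
    """Return the metrics whose deviation exceeds the allowed tolerance."""
    failures = []
    for metric, (rel, abs_tol) in TOLERANCES.items():
        expected = reference_scores[metric]
        allowed = max(rel * abs(expected), abs_tol)  # assumed comparison rule
        if abs(verticapy_scores[metric] - expected) > allowed:
            failures.append(metric)
    return failures


# Hypothetical scores, not real test results.
vpy = {"r2": 0.80, "rmse": 1.10, "max_error": 3.0}
ref = {"r2": 0.85, "rmse": 1.00, "max_error": 2.5}
print(failing_metrics(vpy, ref))
```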

DummyTreeRegressor#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
F-statistic	1e-06	1e-12
Jarque-Bera (JB)	0.006	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	1e-06	1e-12
Skewness	0.0003	1e-12
aic	1e-05	1e-12
bic	1e-06	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	1e-06	1e-12
f	1e-06	1e-12
intercept_	1e-06	1e-12
load_model	0.1	1e-12
max_error	0.001	1e-12
mean_absolute_error	1e-05	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	0.0001	1e-12
median_absolute_error	0.001	1e-12
ms	0.001	1e-12
p_value	1e-06	1e-12
predict	0.1	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12
score	1e-06	1e-12
ss	1e-06	1e-12
to_python	1e-06	1e-12

ElasticNet#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
F-statistic	1e-06	1e-09
Jarque-Bera (JB)	0.008	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	1e-06	1e-12
Skewness	0.0005	1e-12
aic	7e-06	1e-12
bic	1e-06	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	1e-06	1e-12
f	1e-06	1e-12
intercept_	1e-06	1e-12
load_model	1e-06	1e-12
max_error	1e-06	1e-12
mean_absolute_error	1e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	1e-06	1e-12
median_absolute_error	1e-06	1e-12
ms	0.0007	1e-12
p_value	1e-06	1e-12
predict	1e-06	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12
score	1e-06	1e-12
ss	1e-06	1e-12
to_python	1e-06	1e-12

Lasso#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
F-statistic	1e-06	1e-09
Jarque-Bera (JB)	0.008	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	1e-06	1e-12
Skewness	0.0005	1e-12
aic	7e-06	1e-12
bic	1e-06	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	1e-06	1e-12
f	1e-06	1e-12
intercept_	1e-06	1e-12
load_model	1e-06	1e-12
max_error	1e-06	1e-12
mean_absolute_error	1e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	1e-06	1e-12
median_absolute_error	1e-06	1e-12
ms	0.0007	1e-12
p_value	1e-06	1e-12
predict	1e-06	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12
score	1e-06	1e-12
ss	1e-06	1e-12
to_python	1e-06	1e-12

LinearRegression#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
F-statistic	1e-06	1e-12
Jarque-Bera (JB)	0.008	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	1e-06	1e-12
Skewness	0.0005	1e-12
aic	4e-06	1e-12
bic	1e-06	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	1e-06	1e-12
f	1e-06	1e-12
intercept_	1e-06	1e-12
load_model	1e-06	1e-12
max_error	1e-06	1e-12
mean_absolute_error	1e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	1e-06	1e-12
median_absolute_error	0.0002	1e-12
ms	0.0007	1e-12
p_value	1e-06	1e-12
predict	1e-06	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12
score	1e-06	1e-12
ss	1e-06	1e-12
to_python	1e-06	1e-12

LinearSVR#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
F-statistic	1e-06	1e-12
Jarque-Bera (JB)	0.01	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	1e-06	1e-12
Skewness	0.0002	1e-12
aic	1e-06	1e-12
bic	1e-06	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	1e-06	1e-12
f	1e-06	1e-12
intercept_	1e-06	1e-12
load_model	0.1	1e-12
max_error	1e-06	1e-12
mean_absolute_error	1e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	1e-06	1e-12
median_absolute_error	1e-06	1e-12
ms	0.0007	1e-12
p_value	1e-06	1e-12
predict	1e-06	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12
score	1e-06	1e-12
ss	1e-06	1e-12
to_python	1e-06	1e-12

PoissonRegressor#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	2e-06	1e-12
F-statistic	7e-05	1e-12
Jarque-Bera (JB)	0.008	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	2e-06	1e-12
Skewness	0.0005	1e-12
aic	3e-06	1e-12
bic	1e-06	1e-12
coef_	0.0009	1e-12
df	1e-06	1e-12
explained_variance	2e-06	1e-12
f	7e-05	1e-12
intercept_	3e-05	1e-12
load_model	1e-06	1e-12
max_error	2e-05	1e-12
mean_absolute_error	2e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	1e-06	1e-12
median_absolute_error	0.001	1e-12
ms	0.0007	1e-12
p_value	1e-06	1e-12
predict	1e-06	1e-12
r2	2e-06	1e-12
r2_adj	2e-06	1e-12
rmse	1e-06	1e-12
score	0.02	1e-12
ss	7e-05	1e-12
to_python	1e-06	1e-12

RandomForestRegressor#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	0.09	1e-12
F-statistic	0.2	1e-12
Jarque-Bera (JB)	0.006	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	0.09	1e-12
Skewness	0.0003	1e-12
aic	0.07	1e-12
bic	0.05	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	0.09	1e-12
f	0.2	1e-12
intercept_	1e-06	1e-12
load_model	0.0008	1e-12
max_error	0.4	1e-12
mean_absolute_error	2.0	1e-12
mean_squared_error	0.03	1e-12
mean_squared_log_error	0.03	1e-12
median_absolute_error	0.06	1e-12
ms	0.2	1e-12
p_value	1e-06	1e-12
predict	0.0007	1e-12
r2	0.09	1e-12
r2_adj	0.09	1e-12
rmse	0.02	1e-12
score	0.05	1e-12
ss	0.2	1e-12
to_python	1e-06	1e-12

Ridge#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
F-statistic	1e-06	1e-12
Jarque-Bera (JB)	0.008	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	1e-06	1e-12
Skewness	0.0005	1e-12
aic	4e-06	1e-12
bic	1e-06	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	1e-06	1e-12
f	1e-06	1e-12
intercept_	1e-06	1e-12
load_model	1e-06	1e-12
max_error	0.0002	1e-12
mean_absolute_error	7e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	5e-05	1e-12
median_absolute_error	0.0008	1e-12
ms	0.0007	1e-12
p_value	1e-06	1e-12
predict	1e-06	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12
score	1e-06	1e-12
ss	1e-06	1e-12
to_python	1e-06	1e-12

XGBRegressor#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	0.4	1e-12
F-statistic	2.0	1e-12
Jarque-Bera (JB)	0.006	1e-12
Kurtosis	0.005	1e-12
Prob (F-statistic)	1e-06	1e-12
R-squared	0.4	1e-12
Skewness	0.0003	1e-12
aic	0.9	1e-12
bic	1.4	1e-12
coef_	1e-06	1e-12
df	1e-06	1e-12
explained_variance	0.9	1e-12
f	2.0	1e-12
intercept_	1e-06	1e-12
load_model	0.03	1e-12
max_error	0.3	1e-12
mean_absolute_error	0.4	1e-12
mean_squared_error	0.9	1e-12
mean_squared_log_error	0.2	1e-12
median_absolute_error	0.7	1e-12
ms	1.0	1e-12
p_value	1e-06	1e-12
predict	0.03	1e-12
r2	0.4	1e-12
r2_adj	0.4	1e-12
rmse	0.7	1e-12
score	0.6	1e-12
ss	1.0	1e-12
to_json	1e-06	1e-12
to_python	1e-06	1e-12

Classification#

DecisionTreeClassifier#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
accuracy	0.2	1e-12
auc	0.3	1e-12
confusion_matrix	0.4	1e-12
csi	0.9	1e-12
cutoff_curve	0.03	1e-12
f1_score	0.5	1e-12
lift_chart	0.5	1e-12
load_model	1.0	1e-12
log_loss	100.0	1e-12
markedness	0.6	1e-12
mcc	0.6	1e-12
prc_auc	0.3	1e-12
prc_curve	0.04	1e-12
precision	0.4	1e-12
predict	0.06	1e-12
predict_proba	1.0	1e-12
recall	0.7	1e-12
roc_curve	0.1	1e-12
score	0.004	1e-12
to_python	1.0	1e-12

DummyTreeClassifier#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
accuracy	0.01	1e-12
auc	0.1	1e-12
confusion_matrix	0.4	1e-12
csi	0.1	1e-12
cutoff_curve	0.1	1e-12
f1_score	0.1	1e-12
lift_chart	1.0	1e-12
load_model	0.1	1e-12
log_loss	1.0	1e-12
markedness	0.1	1e-12
mcc	0.01	1e-12
prc_auc	0.01	1e-12
prc_curve	0.01	1e-12
precision	0.1	1e-12
predict	0.03	1e-12
predict_proba	1.0	1e-12
recall	1.0	1e-12
roc_curve	0.1	1e-12
score	0.1	1e-12
to_python	1e-06	1e-12

RandomForestClassifier#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
accuracy	0.04	1e-12
auc	0.03	1e-12
confusion_matrix	0.4	1e-12
csi	0.8	1e-12
cutoff_curve	0.03	1e-12
f1_score	0.6	1e-12
lift_chart	0.3	1e-12
load_model	1.0	1e-12
log_loss	90000000000.0	1e-12
markedness	0.4	1e-12
mcc	0.04	1e-12
prc_auc	0.004	1e-12
prc_curve	0.004	1e-12
precision	0.4	1e-12
predict	0.07	1e-12
predict_proba	0.9	1e-12
recall	0.8	1e-12
roc_curve	0.03	1e-12
score	0.03	1e-12
to_python	1e-06	1e-12

XGBClassifier#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
accuracy	0.09	1e-12
auc	0.05	1e-12
confusion_matrix	0.7	1e-12
csi	0.8	1e-12
cutoff_curve	0.03	1e-12
f1_score	0.7	1e-12
lift_chart	0.8	1e-12
load_model	1.0	1e-12
log_loss	80.0	1e-12
markedness	0.8	1e-12
mcc	0.6	1e-12
prc_auc	0.08	1e-12
prc_curve	0.04	1e-12
precision	0.5	1e-12
predict	0.03	1e-12
predict_proba	3.0	1e-12
recall	0.9	1e-12
roc_curve	0.1	1e-12
score	0.03	1e-12
to_json	0.1	1e-12
to_python	1e-06	1e-12

Others#

AR#

Comparison Library: statsmodels

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	0.0004	1e-12
R-squared	0.0004	1e-12
aic	0.007	1e-12
bic	0.02	1e-12
explained_variance	3e-05	1e-12
intercept_	1e-06	1e-12
max_error	0.009	1e-12
mean_absolute_error	0.008	1e-12
mean_squared_error	0.005	1e-12
mean_squared_log_error	0.02	1e-12
median_absolute_error	0.08	1e-12
mse_	0.005	1e-12
phi_	0.1	1e-12
predict	0.02	1e-12
r2	0.0004	1e-12
r2_adj	0.0004	1e-12
rmse	0.003	1e-12

ARIMA#

Comparison Library: statsmodels

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	0.0002	1e-12
R-squared	0.0002	1e-12
aic	0.005	1e-12
bic	0.02	1e-12
explained_variance	0.003	1e-12
intercept_	1e-06	1e-12
max_error	0.03	1e-12
mean_absolute_error	0.02	1e-12
mean_squared_error	0.003	1e-12
mean_squared_log_error	0.03	1e-12
median_absolute_error	0.08	1e-12
mse_	0.003	1e-12
phi_	0.8	1e-12
predict	0.04	1e-12
r2	0.0002	1e-12
r2_adj	0.0002	1e-12
rmse	0.002	1e-12

ARMA#

Comparison Library: statsmodels

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	1e-06	1e-12
R-squared	1e-06	1e-12
aic	0.006	1e-12
bic	0.02	1e-12
explained_variance	1e-06	1e-12
intercept_	1e-06	1e-12
max_error	4e-06	1e-12
mean_absolute_error	1e-06	1e-12
mean_squared_error	1e-06	1e-12
mean_squared_log_error	1e-05	1e-12
median_absolute_error	0.0001	1e-12
mse_	1e-06	1e-12
phi_	5e-05	1e-12
predict	0.009	1e-12
r2	1e-06	1e-12
r2_adj	1e-06	1e-12
rmse	1e-06	1e-12

KMeans#

Comparison Library: scikit-learn

Metric	Relative Tolerance	Absolute Tolerance
predict	0.6	1e-12
to_python	1e-06	1e-12

MA#

Comparison Library: statsmodels

Metric	Relative Tolerance	Absolute Tolerance
Adj. R-squared	0.04	1e-12
R-squared	0.04	1e-12
aic	0.007	1e-12
bic	0.005	1e-12
explained_variance	0.03	1e-12
intercept_	1e-06	1e-12
max_error	0.3	1e-12
mean_absolute_error	0.04	1e-12
mean_squared_error	0.08	1e-12
mean_squared_log_error	0.08	1e-12
median_absolute_error	0.02	1e-12
mse_	0.08	1e-12
phi_	1e-06	1e-12
predict	0.003	1e-12
r2	0.04	1e-12
r2_adj	0.04	1e-12
rmse	0.04	1e-12