
ML algorithm tolerance compared with other open-source libraries#

VerticaPy's machine learning algorithms are unit tested against their counterparts in other popular open-source libraries, such as scikit-learn. These tests check that our implementations stay in line with those libraries and return consistent results across various datasets. For transparency and easy reference, we define a specific tolerance for each type of algorithm. A tolerance is the largest difference between the VerticaPy result and the reference result that a test accepts.

We also define tolerances for the metrics used to measure the accuracy of our models. The tolerances for all metrics are:

| Model Type | Relative Error |
| --- | --- |
| Regression | 1% |
| Classification | 10% |

The tables below list the tolerances for several models:

Important

All our models are tested against popular libraries, but only a few are included in the tables below.

Note

The formula for relative error:

\[relative\ error = \frac{{|Score_{verticapy} - Score_{python}|}}{{\min(|Score_{verticapy}|, |Score_{python}|)}}\]

The formula for absolute error:

\[absolute\ error = \frac{{|Score_{verticapy} - Score_{python}|}}{{1 + \min(|Score_{verticapy}|, |Score_{python}|)}}\]

Absolute error is only used in cases where the result is very close to 0.
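As a minimal sketch, the two formulas above can be written directly in Python. The function names and the sample scores are hypothetical and chosen only for illustration; this is not VerticaPy's test code.

```python
def relative_error(score_verticapy: float, score_python: float) -> float:
    """|a - b| / min(|a|, |b|), following the relative error formula."""
    denominator = min(abs(score_verticapy), abs(score_python))
    if denominator == 0:
        # The relative error is undefined when either score is exactly 0.
        raise ZeroDivisionError("relative error is undefined when a score is 0")
    return abs(score_verticapy - score_python) / denominator


def absolute_error(score_verticapy: float, score_python: float) -> float:
    """|a - b| / (1 + min(|a|, |b|)), following the absolute error formula."""
    smallest = min(abs(score_verticapy), abs(score_python))
    return abs(score_verticapy - score_python) / (1 + smallest)


# Hypothetical scores from the two libraries.
print(relative_error(0.800, 0.804))
# For scores near 0 the relative error blows up, while the absolute error stays small.
print(relative_error(1e-9, 3e-9))
print(absolute_error(1e-9, 3e-9))
```

The last two calls illustrate why a different formula is useful near 0: the same pair of tiny scores gives a large relative error but a negligible absolute error.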

For more information on pytest approximations, see the pytest documentation.

Regression#

DecisionTreeRegressor#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 0.7 | 1e-12 |
| F-statistic | 0.03 | 1e-12 |
| Jarque-Bera (JB) | 0.006 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 0.7 | 1e-12 |
| Skewness | 0.0003 | 1e-12 |
| aic | 0.3 | 1e-12 |
| bic | 0.4 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 0.7 | 1e-12 |
| f | 0.03 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 3e-16 | 1e-12 |
| max_error | 0.09 | 1e-12 |
| mean_absolute_error | 0.2 | 1e-12 |
| mean_squared_error | 0.2 | 1e-12 |
| mean_squared_log_error | 0.2 | 1e-12 |
| median_absolute_error | 0.09 | 1e-12 |
| ms | 0.03 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 3e-16 | 1e-12 |
| r2 | 0.7 | 1e-12 |
| r2_adj | 0.7 | 1e-12 |
| rmse | 0.2 | 1e-12 |
| score | 0.04 | 1e-12 |
| ss | 0.03 | 1e-12 |
| to_python | 1e-06 | 1e-12 |
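To illustrate how per-metric tolerances like the ones listed for DecisionTreeRegressor might be applied, here is a minimal sketch. The tolerance values for `r2`, `max_error`, and `rmse` are copied from this section; the scores, the function name `failing_metrics`, and the pass rule (difference within the larger of `rel * |reference|` and the absolute tolerance) are assumptions made for illustration, not VerticaPy's actual test harness.

```python
# Relative tolerances taken from three rows of the DecisionTreeRegressor list.
REL_TOLERANCE = {"r2": 0.7, "max_error": 0.09, "rmse": 0.2}
ABS_TOLERANCE = 1e-12


def failing_metrics(verticapy_scores: dict, reference_scores: dict) -> list:
    """Return the metrics whose difference exceeds the allowed tolerance."""
    failures = []
    for metric, rel in REL_TOLERANCE.items():
        actual = verticapy_scores[metric]
        expected = reference_scores[metric]
        # Assumed rule: accept if within max(rel * |expected|, absolute tolerance).
        allowed = max(rel * abs(expected), ABS_TOLERANCE)
        if abs(actual - expected) > allowed:
            failures.append(metric)
    return failures


# Hypothetical scores, not measured results.
verticapy_scores = {"r2": 0.61, "max_error": 12.0, "rmse": 3.1}
sklearn_scores = {"r2": 0.70, "max_error": 10.0, "rmse": 3.0}

print(failing_metrics(verticapy_scores, sklearn_scores))
```

In this made-up case only `max_error` falls outside its tolerance, since a difference of 2.0 exceeds 9% of the reference value 10.0.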

DummyTreeRegressor#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| F-statistic | 1e-06 | 1e-12 |
| Jarque-Bera (JB) | 0.006 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| Skewness | 0.0003 | 1e-12 |
| aic | 1e-05 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| f | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 0.1 | 1e-12 |
| max_error | 0.001 | 1e-12 |
| mean_absolute_error | 1e-05 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 0.0001 | 1e-12 |
| median_absolute_error | 0.001 | 1e-12 |
| ms | 0.001 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 0.1 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 1e-06 | 1e-12 |
| ss | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

ElasticNet#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| F-statistic | 1e-06 | 1e-09 |
| Jarque-Bera (JB) | 0.008 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| Skewness | 0.0005 | 1e-12 |
| aic | 7e-06 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| f | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 1e-06 | 1e-12 |
| max_error | 1e-06 | 1e-12 |
| mean_absolute_error | 1e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 1e-06 | 1e-12 |
| median_absolute_error | 1e-06 | 1e-12 |
| ms | 0.0007 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 1e-06 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 1e-06 | 1e-12 |
| ss | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

Lasso#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| F-statistic | 1e-06 | 1e-09 |
| Jarque-Bera (JB) | 0.008 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| Skewness | 0.0005 | 1e-12 |
| aic | 7e-06 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| f | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 1e-06 | 1e-12 |
| max_error | 1e-06 | 1e-12 |
| mean_absolute_error | 1e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 1e-06 | 1e-12 |
| median_absolute_error | 1e-06 | 1e-12 |
| ms | 0.0007 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 1e-06 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 1e-06 | 1e-12 |
| ss | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

LinearRegression#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| F-statistic | 1e-06 | 1e-12 |
| Jarque-Bera (JB) | 0.008 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| Skewness | 0.0005 | 1e-12 |
| aic | 4e-06 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| f | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 1e-06 | 1e-12 |
| max_error | 1e-06 | 1e-12 |
| mean_absolute_error | 1e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 1e-06 | 1e-12 |
| median_absolute_error | 0.0002 | 1e-12 |
| ms | 0.0007 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 1e-06 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 1e-06 | 1e-12 |
| ss | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

LinearSVR#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| F-statistic | 1e-06 | 1e-12 |
| Jarque-Bera (JB) | 0.01 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| Skewness | 0.0002 | 1e-12 |
| aic | 1e-06 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| f | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 0.1 | 1e-12 |
| max_error | 1e-06 | 1e-12 |
| mean_absolute_error | 1e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 1e-06 | 1e-12 |
| median_absolute_error | 1e-06 | 1e-12 |
| ms | 0.0007 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 1e-06 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 1e-06 | 1e-12 |
| ss | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

PoissonRegressor#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 2e-06 | 1e-12 |
| F-statistic | 7e-05 | 1e-12 |
| Jarque-Bera (JB) | 0.008 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 2e-06 | 1e-12 |
| Skewness | 0.0005 | 1e-12 |
| aic | 3e-06 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 0.0009 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 2e-06 | 1e-12 |
| f | 7e-05 | 1e-12 |
| intercept_ | 3e-05 | 1e-12 |
| load_model | 1e-06 | 1e-12 |
| max_error | 2e-05 | 1e-12 |
| mean_absolute_error | 2e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 1e-06 | 1e-12 |
| median_absolute_error | 0.001 | 1e-12 |
| ms | 0.0007 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 1e-06 | 1e-12 |
| r2 | 2e-06 | 1e-12 |
| r2_adj | 2e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 0.02 | 1e-12 |
| ss | 7e-05 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

RandomForestRegressor#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 0.09 | 1e-12 |
| F-statistic | 0.2 | 1e-12 |
| Jarque-Bera (JB) | 0.006 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 0.09 | 1e-12 |
| Skewness | 0.0003 | 1e-12 |
| aic | 0.07 | 1e-12 |
| bic | 0.05 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 0.09 | 1e-12 |
| f | 0.2 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 0.0008 | 1e-12 |
| max_error | 0.4 | 1e-12 |
| mean_absolute_error | 2.0 | 1e-12 |
| mean_squared_error | 0.03 | 1e-12 |
| mean_squared_log_error | 0.03 | 1e-12 |
| median_absolute_error | 0.06 | 1e-12 |
| ms | 0.2 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 0.0007 | 1e-12 |
| r2 | 0.09 | 1e-12 |
| r2_adj | 0.09 | 1e-12 |
| rmse | 0.02 | 1e-12 |
| score | 0.05 | 1e-12 |
| ss | 0.2 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

Ridge#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| F-statistic | 1e-06 | 1e-12 |
| Jarque-Bera (JB) | 0.008 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| Skewness | 0.0005 | 1e-12 |
| aic | 4e-06 | 1e-12 |
| bic | 1e-06 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| f | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 1e-06 | 1e-12 |
| max_error | 0.0002 | 1e-12 |
| mean_absolute_error | 7e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 5e-05 | 1e-12 |
| median_absolute_error | 0.0008 | 1e-12 |
| ms | 0.0007 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 1e-06 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |
| score | 1e-06 | 1e-12 |
| ss | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

XGBRegressor#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 0.4 | 1e-12 |
| F-statistic | 2.0 | 1e-12 |
| Jarque-Bera (JB) | 0.006 | 1e-12 |
| Kurtosis | 0.005 | 1e-12 |
| Prob (F-statistic) | 1e-06 | 1e-12 |
| R-squared | 0.4 | 1e-12 |
| Skewness | 0.0003 | 1e-12 |
| aic | 0.9 | 1e-12 |
| bic | 1.4 | 1e-12 |
| coef_ | 1e-06 | 1e-12 |
| df | 1e-06 | 1e-12 |
| explained_variance | 0.9 | 1e-12 |
| f | 2.0 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| load_model | 0.03 | 1e-12 |
| max_error | 0.3 | 1e-12 |
| mean_absolute_error | 0.4 | 1e-12 |
| mean_squared_error | 0.9 | 1e-12 |
| mean_squared_log_error | 0.2 | 1e-12 |
| median_absolute_error | 0.7 | 1e-12 |
| ms | 1.0 | 1e-12 |
| p_value | 1e-06 | 1e-12 |
| predict | 0.03 | 1e-12 |
| r2 | 0.4 | 1e-12 |
| r2_adj | 0.4 | 1e-12 |
| rmse | 0.7 | 1e-12 |
| score | 0.6 | 1e-12 |
| ss | 1.0 | 1e-12 |
| to_json | 1e-06 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

Classification#

DecisionTreeClassifier#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| accuracy | 0.2 | 1e-12 |
| auc | 0.3 | 1e-12 |
| confusion_matrix | 0.4 | 1e-12 |
| csi | 0.9 | 1e-12 |
| cutoff_curve | 0.03 | 1e-12 |
| f1_score | 0.5 | 1e-12 |
| lift_chart | 0.5 | 1e-12 |
| load_model | 1.0 | 1e-12 |
| log_loss | 100.0 | 1e-12 |
| markedness | 0.6 | 1e-12 |
| mcc | 0.6 | 1e-12 |
| prc_auc | 0.3 | 1e-12 |
| prc_curve | 0.04 | 1e-12 |
| precision | 0.4 | 1e-12 |
| predict | 0.06 | 1e-12 |
| predict_proba | 1.0 | 1e-12 |
| recall | 0.7 | 1e-12 |
| roc_curve | 0.1 | 1e-12 |
| score | 0.004 | 1e-12 |
| to_python | 1.0 | 1e-12 |

DummyTreeClassifier#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| accuracy | 0.01 | 1e-12 |
| auc | 0.1 | 1e-12 |
| confusion_matrix | 0.4 | 1e-12 |
| csi | 0.1 | 1e-12 |
| cutoff_curve | 0.1 | 1e-12 |
| f1_score | 0.1 | 1e-12 |
| lift_chart | 1.0 | 1e-12 |
| load_model | 0.1 | 1e-12 |
| log_loss | 1.0 | 1e-12 |
| markedness | 0.1 | 1e-12 |
| mcc | 0.01 | 1e-12 |
| prc_auc | 0.01 | 1e-12 |
| prc_curve | 0.01 | 1e-12 |
| precision | 0.1 | 1e-12 |
| predict | 0.03 | 1e-12 |
| predict_proba | 1.0 | 1e-12 |
| recall | 1.0 | 1e-12 |
| roc_curve | 0.1 | 1e-12 |
| score | 0.1 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

RandomForestClassifier#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| accuracy | 0.04 | 1e-12 |
| auc | 0.03 | 1e-12 |
| confusion_matrix | 0.4 | 1e-12 |
| csi | 0.8 | 1e-12 |
| cutoff_curve | 0.03 | 1e-12 |
| f1_score | 0.6 | 1e-12 |
| lift_chart | 0.3 | 1e-12 |
| load_model | 1.0 | 1e-12 |
| log_loss | 90000000000.0 | 1e-12 |
| markedness | 0.4 | 1e-12 |
| mcc | 0.04 | 1e-12 |
| prc_auc | 0.004 | 1e-12 |
| prc_curve | 0.004 | 1e-12 |
| precision | 0.4 | 1e-12 |
| predict | 0.07 | 1e-12 |
| predict_proba | 0.9 | 1e-12 |
| recall | 0.8 | 1e-12 |
| roc_curve | 0.03 | 1e-12 |
| score | 0.03 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

XGBClassifier#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| accuracy | 0.09 | 1e-12 |
| auc | 0.05 | 1e-12 |
| confusion_matrix | 0.7 | 1e-12 |
| csi | 0.8 | 1e-12 |
| cutoff_curve | 0.03 | 1e-12 |
| f1_score | 0.7 | 1e-12 |
| lift_chart | 0.8 | 1e-12 |
| load_model | 1.0 | 1e-12 |
| log_loss | 80.0 | 1e-12 |
| markedness | 0.8 | 1e-12 |
| mcc | 0.6 | 1e-12 |
| prc_auc | 0.08 | 1e-12 |
| prc_curve | 0.04 | 1e-12 |
| precision | 0.5 | 1e-12 |
| predict | 0.03 | 1e-12 |
| predict_proba | 3.0 | 1e-12 |
| recall | 0.9 | 1e-12 |
| roc_curve | 0.1 | 1e-12 |
| score | 0.03 | 1e-12 |
| to_json | 0.1 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

Others#

AR#

Comparison Library: statsmodels

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 0.0004 | 1e-12 |
| R-squared | 0.0004 | 1e-12 |
| aic | 0.007 | 1e-12 |
| bic | 0.02 | 1e-12 |
| explained_variance | 3e-05 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| max_error | 0.009 | 1e-12 |
| mean_absolute_error | 0.008 | 1e-12 |
| mean_squared_error | 0.005 | 1e-12 |
| mean_squared_log_error | 0.02 | 1e-12 |
| median_absolute_error | 0.08 | 1e-12 |
| mse_ | 0.005 | 1e-12 |
| phi_ | 0.1 | 1e-12 |
| predict | 0.02 | 1e-12 |
| r2 | 0.0004 | 1e-12 |
| r2_adj | 0.0004 | 1e-12 |
| rmse | 0.003 | 1e-12 |

ARIMA#

Comparison Library: statsmodels

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 0.0002 | 1e-12 |
| R-squared | 0.0002 | 1e-12 |
| aic | 0.005 | 1e-12 |
| bic | 0.02 | 1e-12 |
| explained_variance | 0.003 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| max_error | 0.03 | 1e-12 |
| mean_absolute_error | 0.02 | 1e-12 |
| mean_squared_error | 0.003 | 1e-12 |
| mean_squared_log_error | 0.03 | 1e-12 |
| median_absolute_error | 0.08 | 1e-12 |
| mse_ | 0.003 | 1e-12 |
| phi_ | 0.8 | 1e-12 |
| predict | 0.04 | 1e-12 |
| r2 | 0.0002 | 1e-12 |
| r2_adj | 0.0002 | 1e-12 |
| rmse | 0.002 | 1e-12 |

ARMA#

Comparison Library: statsmodels

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 1e-06 | 1e-12 |
| R-squared | 1e-06 | 1e-12 |
| aic | 0.006 | 1e-12 |
| bic | 0.02 | 1e-12 |
| explained_variance | 1e-06 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| max_error | 4e-06 | 1e-12 |
| mean_absolute_error | 1e-06 | 1e-12 |
| mean_squared_error | 1e-06 | 1e-12 |
| mean_squared_log_error | 1e-05 | 1e-12 |
| median_absolute_error | 0.0001 | 1e-12 |
| mse_ | 1e-06 | 1e-12 |
| phi_ | 5e-05 | 1e-12 |
| predict | 0.009 | 1e-12 |
| r2 | 1e-06 | 1e-12 |
| r2_adj | 1e-06 | 1e-12 |
| rmse | 1e-06 | 1e-12 |

KMeans#

Comparison Library: scikit-learn

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| predict | 0.6 | 1e-12 |
| to_python | 1e-06 | 1e-12 |

MA#

Comparison Library: statsmodels

| Metric | Relative Tolerance | Absolute Tolerance |
| --- | --- | --- |
| Adj. R-squared | 0.04 | 1e-12 |
| R-squared | 0.04 | 1e-12 |
| aic | 0.007 | 1e-12 |
| bic | 0.005 | 1e-12 |
| explained_variance | 0.03 | 1e-12 |
| intercept_ | 1e-06 | 1e-12 |
| max_error | 0.3 | 1e-12 |
| mean_absolute_error | 0.04 | 1e-12 |
| mean_squared_error | 0.08 | 1e-12 |
| mean_squared_log_error | 0.08 | 1e-12 |
| median_absolute_error | 0.02 | 1e-12 |
| mse_ | 0.08 | 1e-12 |
| phi_ | 1e-06 | 1e-12 |
| predict | 0.003 | 1e-12 |
| r2 | 0.04 | 1e-12 |
| r2_adj | 0.04 | 1e-12 |
| rmse | 0.04 | 1e-12 |