Logistic Regression#
Logistic Regression is a powerful algorithm employed for binary classification tasks. Despite its name, it is primarily used for classification rather than regression. Logistic Regression models the probability that a given instance belongs to a particular category and is widely utilized in various fields, including healthcare, finance, and marketing. Its simplicity, interpretability, and effectiveness make it a popular choice for predictive modeling.
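As a minimal sketch of the idea (not taken from the Vertica or Spark implementations), the model passes a weighted sum of the features through the sigmoid function to obtain a probability, then compares that probability with a threshold to assign a class. The weights, bias, and 0.5 threshold below are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Estimated probability that the instance belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical coefficients, not fitted to any real data.
weights = [0.8, -1.2]
bias = 0.1
sample = [2.0, 0.5]

p = predict_proba(sample, weights, bias)
print(f"probability={p:.3f}, class={predict(sample, weights, bias)}")
```

In a fitted model the weights and bias come from an optimizer rather than being chosen by hand.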
Vertica vs Spark#
Important
This benchmark compares the performance of Vertica’s Logistic Regression algorithm with its implementation in Apache Spark. The evaluation covers speed, accuracy, and scalability, with the aim of showing the strengths and trade-offs of each implementation for practitioners who use Logistic Regression for classification tasks.
Dataset#
For this benchmark, we created an artificial dataset from a Linear Regression model with some noise.
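The generation script is not shown here, so the following is only a sketch of one way such a dataset could be built. It assumes standard normal features, uniformly drawn coefficients, Gaussian noise, and a binary label obtained by thresholding the noisy linear score at zero. All of these choices, and the name `make_dataset`, are assumptions rather than the procedure used in the benchmark.

```python
import random


def make_dataset(n_rows, n_cols, noise_std=1.0, seed=0):
    """Build (features, label) rows from a noisy linear model.

    Assumption: the label is 1 when the noisy linear score is positive.
    """
    rng = random.Random(seed)
    # Hypothetical "true" coefficients of the underlying linear model.
    coefs = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
    rows = []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        score = sum(c * v for c, v in zip(coefs, x)) + rng.gauss(0.0, noise_std)
        rows.append((x, 1 if score > 0.0 else 0))
    return coefs, rows


coefs, rows = make_dataset(n_rows=1000, n_cols=10)
positives = sum(label for _, label in rows)
print(f"{len(rows)} rows, {len(coefs)} columns, {positives} positive labels")
```

The benchmark sizes (up to 1B rows) would need a distributed or streaming generator; this in-memory version only illustrates the structure of the data.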
Test Environment#
Vertica

Version | Instance Type | Cluster | vCPU (per node) | Memory (per node) | Deploy Mode | OS | OS Version | Processor freq. (per node) |
---|---|---|---|---|---|---|---|---|
8.0.1 | On Premise VM | 3 node cluster | 36, 2 threads per core | 755 GB | Enterprise | Red Hat Enterprise Linux | 8.7 (Ootpa) | 2.4GHz |

Spark

Version | Instance Type | Cluster | vCPU (per node) | Memory (per node) | Deploy Mode | OS | OS Version | Processor freq. (per node) |
---|---|---|---|---|---|---|---|---|
2.02 | N/A | N/A | 36, 2 threads per core | 755 GB | N/A | Red Hat Enterprise Linux | 8.7 (Ootpa) | 2.4GHz |
Comparison#
Columns | Rows | Size | Vertica 8.01 (With BFGS Optimizer): Total Time | Vertica 8.01 (With BFGS Optimizer): Number of Iterations | Vertica 8.01 (With BFGS Optimizer): Error | Vertica 8.01 (With Newton Optimizer): Total Time | Vertica 8.01 (With Newton Optimizer): Number of Iterations | Vertica 8.01 (With Newton Optimizer): Error | Spark 2.0.1 (l-bfgs): Training Time | Spark 2.0.1 (l-bfgs): Number of Iterations | Spark 2.0.1 (l-bfgs): Error |
---|---|---|---|---|---|---|---|---|---|---|---|
100 | 1M | 800MB | 14.74 | 85 | | 6.7 | 23 | | 4.52 | 41 | |
100 | 10M | 8GB | 45.15 | 42 | | 28.98 | 22 | | 12.05 | 39 | |
100 | 100M | 80GB | 36.54 | 2 | | 194.5 | 22 | | 367.27 | 39 | |
100 | 1B | 800GB | 388.89 | 2 | | 2389.08 | 22 | | 2222 | 39 | |
10 | 10M | 800MB | 3.57 | 3 | | 4.55 | 20 | | 15.38 | 35 | |
20 | 10M | 1.6GB | 27.09 | 74 | | 6.15 | 20 | | 12.34 | 36 | |
500 | 10M | 40GB | 55.37 | 3 | | 477.05 | 25 | | 63.02 | 44 | |
1598 | 10M | ~128GB | 490.95 | 3 | | 8+ hours | | | 321.24 | 48 | |
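To make the timings easier to compare, the sketch below computes the ratio of the Spark training time to the Vertica BFGS total time for each dataset. It assumes that the first value in each group of the table above is the time and that both times use the same unit. The Newton results are left out because the largest run is reported only as "8+ hours". The helper name `speedups` is hypothetical.

```python
# (columns, rows, Vertica BFGS total time, Spark l-bfgs training time),
# copied from the comparison table under the reading described above.
results = [
    (100, "1M", 14.74, 4.52),
    (100, "10M", 45.15, 12.05),
    (100, "100M", 36.54, 367.27),
    (100, "1B", 388.89, 2222.0),
    (10, "10M", 3.57, 15.38),
    (20, "10M", 27.09, 12.34),
    (500, "10M", 55.37, 63.02),
    (1598, "10M", 490.95, 321.24),
]


def speedups(rows):
    """Return (columns, rows, spark_time / vertica_time) for each dataset.

    A ratio above 1 means the Vertica BFGS run finished sooner.
    """
    return [(cols, n, spark / vertica) for cols, n, vertica, spark in rows]


for cols, n, ratio in speedups(results):
    faster = "Vertica BFGS" if ratio > 1 else "Spark"
    print(f"{cols:>5} columns, {n:>4} rows: ratio {ratio:.2f} ({faster} faster)")
```

A ratio is used rather than a difference so that the comparison does not depend on the time unit.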
Browse through the tabs to see the time comparison: