Logistic Regression#

Logistic Regression is a powerful algorithm employed for binary classification tasks. Despite its name, it is primarily used for classification rather than regression. Logistic Regression models the probability that a given instance belongs to a particular category and is widely utilized in various fields, including healthcare, finance, and marketing. Its simplicity, interpretability, and effectiveness make it a popular choice for predictive modeling.
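As a rough illustration of what "models the probability" means, the sketch below passes a linear score through the logistic (sigmoid) function and thresholds the result. The coefficients, the intercept, and the function names are hypothetical and chosen only for this example; they are not taken from Vertica or Spark.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    # Branch on the sign so exp() is only called on non-positive values,
    # which avoids overflow for scores with a large magnitude.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, intercept):
    """Probability that one instance belongs to the positive class."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict_label(features, weights, intercept, threshold=0.5):
    """Turn the probability into a binary class label."""
    return 1 if predict_proba(features, weights, intercept) >= threshold else 0


# Hypothetical coefficients, for illustration only.
weights = [0.8, -1.2]
intercept = 0.1

p = predict_proba([2.0, 0.5], weights, intercept)
label = predict_label([2.0, 0.5], weights, intercept)
print(f"probability={p:.4f}, label={label}")
```

Training a model amounts to choosing the weights and intercept that make these probabilities match the observed labels as closely as possible.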

Vertica vs Spark#

Important

Version Details
Vertica: 8.0.1
Spark: 2.02

This benchmark compares Vertica’s Logistic Regression algorithm with the implementation in Apache Spark. We evaluate speed, accuracy, and scalability to show the strengths and trade-offs of each implementation, so that practitioners can choose the one that fits their classification workload.

Dataset#

For this benchmark, we created an artificial dataset from a Linear Regression model with some noise.
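The exact generator is not described here, so the following is only a minimal sketch of one way to build such a dataset. The Gaussian features, the coefficient range, the noise level, and in particular the step that turns the noisy linear response into a binary label (thresholding at zero) are all assumptions made for illustration. The size helper assumes 8 bytes per value, which is consistent with the sizes reported in the comparison table (for example 100 columns × 1M rows = 800MB).

```python
import random


def make_dataset(n_rows, n_cols, noise_std=1.0, seed=0):
    """Generate features and binary labels from a noisy linear model."""
    rng = random.Random(seed)
    # Hypothetical "true" coefficients of the underlying linear model.
    coefs = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
    rows, labels = [], []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        # Linear response plus Gaussian noise.
        y = sum(c * v for c, v in zip(coefs, x)) + rng.gauss(0.0, noise_std)
        rows.append(x)
        # Assumption: binarize the response at zero to get a class label.
        labels.append(1 if y > 0 else 0)
    return rows, labels


def estimated_size_bytes(n_rows, n_cols, bytes_per_value=8):
    """Raw size of the feature matrix, assuming fixed-width values."""
    return n_rows * n_cols * bytes_per_value


rows, labels = make_dataset(n_rows=1000, n_cols=10, seed=42)
print(len(rows), len(rows[0]), sum(labels))
print(estimated_size_bytes(1_000_000, 100))
```

A fixed seed keeps the generated data reproducible across runs, which matters when the same dataset has to be loaded into two different systems.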

Test Environment#

| Platform | Version | Instance Type | Cluster | vCPU (per node) | Memory (per node) | Deploy Mode | OS | OS Version | Processor freq. (per node) |
|---|---|---|---|---|---|---|---|---|---|
| Vertica | 8.0.1 | On Premise VM | 3 node cluster | 36, 2 threads per core | 755 GB | Enterprise | Red Hat Enterprise Linux | 8.7 (Ootpa) | 2.4GHz |
| Spark | 2.02 | N/A | N/A | 36, 2 threads per core | 755 GB | N/A | Red Hat Enterprise Linux | 8.7 (Ootpa) | 2.4GHz |

Comparison#

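| Columns | Rows | Size | Vertica 8.0.1 (BFGS optimizer): Total Time | Vertica 8.0.1 (BFGS optimizer): Number of Iterations | Vertica 8.0.1 (BFGS optimizer): Error | Vertica 8.0.1 (Newton optimizer): Total Time | Vertica 8.0.1 (Newton optimizer): Number of Iterations | Vertica 8.0.1 (Newton optimizer): Error | Spark 2.0.1 (L-BFGS): Training Time | Spark 2.0.1 (L-BFGS): Number of Iterations | Spark 2.0.1 (L-BFGS): Error |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1M | 800MB | 14.74 | 85 | | 6.7 | 23 | | 4.52 | 41 | |
| 100 | 10M | 8GB | 45.15 | 42 | | 28.98 | 22 | | 12.05 | 39 | |
| 100 | 100M | 80GB | 36.54 | 2 | | 194.5 | 22 | | 367.27 | 39 | |
| 100 | 1B | 800GB | 388.89 | 2 | | 2389.08 | 22 | | 2222 | 39 | |
| 10 | 10M | 800MB | 3.57 | 3 | | 4.55 | 20 | | 15.38 | 35 | |
| 20 | 10M | 1.6GB | 27.09 | 74 | | 6.15 | 20 | | 12.34 | 36 | |
| 500 | 10M | 40GB | 55.37 | 3 | | 477.05 | 25 | | 63.02 | 44 | |
| 1598 | 10M | ~128GB | 490.95 | 3 | | 8+ hours | | | 321.24 | 48 | |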
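Because the three runs stop after very different numbers of iterations, total time alone can be hard to compare. The sketch below re-enters the reported times and iteration counts and derives the time per iteration and the fastest engine for each dataset. The names `ENGINES`, `RESULTS`, `time_per_iteration`, and `fastest_engine` are hypothetical helpers for this example. Times are kept in the units reported above, and the Spark column is a training time while the Vertica columns are total times, so the comparison is indicative only.

```python
ENGINES = ("Vertica BFGS", "Vertica Newton", "Spark L-BFGS")

# Each row: (columns, rows, [(time, iterations) per engine, in ENGINES order]).
# None marks the run reported only as "8+ hours" with no iteration count.
RESULTS = [
    (100, "1M", [(14.74, 85), (6.7, 23), (4.52, 41)]),
    (100, "10M", [(45.15, 42), (28.98, 22), (12.05, 39)]),
    (100, "100M", [(36.54, 2), (194.5, 22), (367.27, 39)]),
    (100, "1B", [(388.89, 2), (2389.08, 22), (2222, 39)]),
    (10, "10M", [(3.57, 3), (4.55, 20), (15.38, 35)]),
    (20, "10M", [(27.09, 74), (6.15, 20), (12.34, 36)]),
    (500, "10M", [(55.37, 3), (477.05, 25), (63.02, 44)]),
    (1598, "10M", [(490.95, 3), None, (321.24, 48)]),
]


def time_per_iteration(entry):
    """Reported time divided by iteration count, or None if not reported."""
    if entry is None:
        return None
    total, iterations = entry
    return total / iterations


def fastest_engine(entries):
    """Name of the engine with the smallest reported time for one dataset."""
    timed = [(e[0], name) for name, e in zip(ENGINES, entries) if e is not None]
    return min(timed)[1]


for columns, rows, entries in RESULTS:
    per_iter = [time_per_iteration(e) for e in entries]
    cells = ["n/a" if t is None else f"{t:.2f}" for t in per_iter]
    print(columns, rows, fastest_engine(entries), cells)
```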

Browse through the tabs to see the time comparison: