
Naive Bayes#

Naive Bayes is a probabilistic classification algorithm based on Bayes’ theorem. It is called “naive” because it assumes that the features are conditionally independent given the class. This assumption keeps the model simple and cheap to train, which makes Naive Bayes well suited to many classification tasks. For each input, the model computes the probability of every class given the feature values and predicts the most probable class.
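To make the class-probability calculation concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. It assumes a Gaussian likelihood for each float feature, and the function names (`fit_gaussian_nb`, `log_posteriors`, `predict`), the variance floor, and the toy data are all illustrative. It is not the Vertica or Spark implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption of this sketch) avoids division by zero
        # when a feature is constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / n_total, means, variances)
    return model


def log_posteriors(model, row):
    """Unnormalized log P(class | row): log prior + sum of per-feature log likelihoods.

    Summing per-feature terms is exactly where the independence assumption is used.
    """
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, row):
    """Return the class with the highest posterior score."""
    scores = log_posteriors(model, row)
    return max(scores, key=scores.get)


# Toy data: two float features, two well-separated classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))
print(predict(model, [3.1, 0.5]))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied, which matters for wide datasets.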

Vertica vs Spark#

Important

Version Details
Vertica: 11.1.0-0
Spark: 3.2.1

The goal is to compare the performance of Vertica’s Naive Bayes algorithm with the implementation in Apache Spark. The evaluation focuses on speed (training and prediction time), accuracy (accuracy and AUC of the resulting models), and scalability (single-node versus four-node clusters). The results are intended to help practitioners judge where Vertica’s Naive Bayes algorithm is a good fit and make informed algorithmic choices.

Dataset#

Size: 25 M

| No. of Rows | No. of Columns |
| --- | --- |
| 25 M | 106 |

Data type of all columns: Float

Test Environment#

| Platform | Version | Instance Type | Cluster | vCPU (per node) | Memory (per node) | Deploy Mode | OS | OS Version | Processor freq. (per node) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vertica | 11.1.0-0 | On-Premises VM | 1 node | 8 | 20393864 kB | Enterprise | Red Hat Enterprise Linux | 7.6 (Maipo) | 2.3 GHz |
| Vertica | 11.1.0-0 | On-Premises VM | 4 nodes | 8 | 20393864 kB | Enterprise | Red Hat Enterprise Linux | 7.6 (Maipo) | 2.3 GHz |
| Spark | 3.2.1 | On-Premises VM | 1 node | 8 | 20393864 kB | NA | Red Hat Enterprise Linux | 7.6 (Maipo) | 2.3 GHz |
| Spark | 3.2.1 | On-Premises VM | 4 nodes | 8 | 20393864 kB | NA | Red Hat Enterprise Linux | 7.6 (Maipo) | 2.3 GHz |

Comparison#

Time in secs#

| | Training | Prediction - 25 M | Accuracy | AUC |
| --- | --- | --- | --- | --- |
| Spark | 145.7 | 1095.79 | 150.55 | 146.58 |
| Vertica | 9.08 | 207.56 | 0.99 | 2.19 |

| Metrics | Vertica | Spark |
| --- | --- | --- |
| Accuracy | 0.85 | 0.85 |
| AUC | 0.85 | 0.77 |
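The two systems report the same accuracy but different AUC values, so it helps to see how the two metrics differ. The sketch below uses the textbook definitions: accuracy is the fraction of correct hard predictions, and AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The source does not state how Vertica or Spark compute these metrics internally, and the function names, the 0.5 threshold, and the toy data are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    so it is only suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy binary problem: true labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard labels at an assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.75
```

Accuracy depends only on the thresholded labels, while AUC depends on how the scores rank the examples. Two models can therefore agree on accuracy and still differ in AUC.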

Browse through the tabs to see the time and accuracy comparison:

Time in secs#

| | Training | Prediction - 25 M | Accuracy | AUC |
| --- | --- | --- | --- | --- |
| Spark | 69.16 | 1134.03 | 64.46 | 63.70 |
| Vertica | 4.83 | 103.9 | 0.74 | 0.78 |

| Metrics | Vertica | Spark |
| --- | --- | --- |
| Accuracy | 0.85 | 0.85 |
| AUC | 0.85 | 0.77 |
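The reported timings are easier to compare as ratios. The short script below divides each Spark time by the corresponding Vertica time for both sets of timings above. The labels "first set" and "second set" are placeholders chosen here, because the source does not label the two sets. All numbers are copied from the tables.

```python
# Timings in seconds, copied from the two "Time in secs" tables.
STEPS = ["Training", "Prediction - 25 M", "Accuracy", "AUC"]

timings = {
    "first set": {
        "Spark": [145.7, 1095.79, 150.55, 146.58],
        "Vertica": [9.08, 207.56, 0.99, 2.19],
    },
    "second set": {
        "Spark": [69.16, 1134.03, 64.46, 63.70],
        "Vertica": [4.83, 103.9, 0.74, 0.78],
    },
}


def speedups(spark_times, vertica_times):
    """Spark time divided by Vertica time for each step (>1 means Vertica was faster)."""
    return [s / v for s, v in zip(spark_times, vertica_times)]


results = {
    name: dict(zip(STEPS, speedups(t["Spark"], t["Vertica"])))
    for name, t in timings.items()
}

for name, ratios in results.items():
    for step, ratio in ratios.items():
        print(f"{name:<10} {step:<18} Spark/Vertica = {ratio:.1f}x")
```

A ratio above 1 means Vertica finished that step faster. With the reported numbers, training is roughly 16 times faster on Vertica in the first set and roughly 14 times faster in the second.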
