
XGBoost

Vertica vs Amazon Redshift | Python | PySpark

Important

Version Details
Vertica: 23.4
Amazon Redshift: Jan 2023
Amazon Sagemaker: Jan 2023
Python Native XGBoost: 3.9.15
PySpark: 3.3.1

XGBoost is a highly optimized, distributed gradient boosting library known for its efficiency, flexibility, and portability. It implements machine learning algorithms under the Gradient Boosting framework and is designed for fast, scalable training.
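For reference, the sketch below shows a minimal training run with the native Python XGBoost API. The synthetic data, split ratio, and hyperparameters are illustrative assumptions, not the configuration used in this benchmark.

```python
# Minimal sketch of native Python XGBoost training (illustrative only).
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a benchmark dataset (assumed shape and labels)
X, y = make_classification(n_samples=10_000, n_features=28, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# DMatrix is XGBoost's optimized internal data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Example hyperparameters; the benchmark's actual settings may differ
params = {
    "objective": "binary:logistic",
    "max_depth": 6,
    "eta": 0.3,
    "eval_metric": "auc",
}
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=100,
    evals=[(dtest, "test")],
    verbose_eval=10,
)
```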

This benchmark aims to assess the performance of Vertica’s XGBoost algorithm in comparison to various XGBoost implementations, including those in Spark, Dask, Redshift, and Python.

Implementations compared:

  • Amazon Redshift

  • Python

  • Dask

  • PySpark

By conducting this benchmark, we seek to gain insights into the comparative strengths and weaknesses of these implementations. Our evaluation will focus on factors such as speed, accuracy, and scalability. The results of this study will contribute to a better understanding of the suitability of Vertica’s XGBoost algorithm for diverse data science applications.
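As an illustration of one of the distributed implementations listed above, the following sketch trains XGBoost through PySpark using the xgboost.spark estimator (available in XGBoost 1.7 and later). The input path, column layout, worker count, and parameters are assumptions for illustration, not the benchmark's actual setup.

```python
# Hedged sketch: distributed XGBoost training on Spark (illustrative only).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from xgboost.spark import SparkXGBClassifier

spark = SparkSession.builder.appName("xgb-benchmark-sketch").getOrCreate()

# Assume a table of float feature columns plus a binary "label" column;
# the Parquet path is hypothetical.
df = spark.read.parquet("/path/to/training_data.parquet")
feature_cols = [c for c in df.columns if c != "label"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
train_df = assembler.transform(df)

clf = SparkXGBClassifier(
    features_col="features",
    label_col="label",
    num_workers=4,      # assumption: one worker per cluster node
    max_depth=6,
    n_estimators=100,
)
model = clf.fit(train_df)
```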

Below are the details of the machines on which the tests were carried out:

Cluster: 4 nodes
OS: Red Hat Enterprise Linux
OS Version: 8.7 (Ootpa)
RAM (per node): 755 GB
Processor freq. (per node): 2.3 GHz
Processor cores (per node): 36, 2 threads per core

Datasets

  • Higgs Boson

  • Amazon

No. of Columns: 29
Datatypes of data: Float
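As an example, the sketch below loads a Higgs-style CSV of 29 float columns into XGBoost's DMatrix format. The file path and the assumption that the label is the first column are illustrative, not a description of the benchmark's data layout.

```python
# Hedged sketch: loading a 29-column, all-float CSV into a DMatrix.
import pandas as pd
import xgboost as xgb

# Assumed layout: 1 label column followed by 28 float feature columns
columns = ["label"] + [f"f{i}" for i in range(1, 29)]
df = pd.read_csv("HIGGS.csv", header=None, names=columns)  # hypothetical path

dmatrix = xgb.DMatrix(df.drop(columns="label"), label=df["label"])
print(dmatrix.num_row(), dmatrix.num_col())
```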