
XGBoost
XGBoost is a highly optimized, distributed gradient-boosting library known for its efficiency, flexibility, and portability. It implements machine learning algorithms under the gradient boosting framework, with a focus on speed and scalability.
This benchmark aims to assess the performance of Vertica’s XGBoost algorithm in comparison to various XGBoost implementations, including those in Spark, Dask, Redshift, and Python.
Implementations to consider:

- Amazon Redshift
- Python
- Dask
- PySpark
By conducting this benchmark, we seek to gain insights into the comparative strengths and weaknesses of these implementations. Our evaluation will focus on factors such as speed, accuracy, and scalability. The results of this study will contribute to a better understanding of the suitability of Vertica’s XGBoost algorithm for diverse data science applications.
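As an illustrative sketch only (not the harness actually used in this benchmark), speed and accuracy can be measured by a small timing wrapper around any implementation's train/predict calls. The `train_fn` and `predict_fn` callables and the dummy model below are hypothetical placeholders:

```python
import time

def benchmark(train_fn, predict_fn, X_train, y_train, X_test, y_test):
    """Time one training run and score its predictions.

    train_fn/predict_fn stand in for whichever implementation
    (Vertica, Redshift, Python, Dask, PySpark) is under test;
    only wall-clock training time and accuracy are recorded.
    """
    start = time.perf_counter()
    model = train_fn(X_train, y_train)
    train_seconds = time.perf_counter() - start

    predictions = predict_fn(model, X_test)
    correct = sum(p == y for p, y in zip(predictions, y_test))
    accuracy = correct / len(y_test)
    return {"train_seconds": train_seconds, "accuracy": accuracy}

# Trivial majority-class "model", just to exercise the harness.
def dummy_train(X, y):
    return max(set(y), key=y.count)

def dummy_predict(model, X):
    return [model] * len(X)

result = benchmark(dummy_train, dummy_predict,
                   [[0.1], [0.2], [0.3]], [1, 1, 0],
                   [[0.4], [0.5]], [1, 0])
```

The same wrapper can be pointed at a real `xgboost` fit/predict pair; only the two callables change.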
Below are the details of the machines on which the tests were carried out:
| Cluster | OS | OS Version | RAM (per node) | Processor freq. (per node) | Processor cores (per node) |
|---|---|---|---|---|---|
| 4 node | Red Hat Enterprise Linux | 8.7 (Ootpa) | 755 GB | 2.3 GHz | 36, 2 threads per core |
Datasets
- Higgs Boson
- Amazon
| No. of Columns |
|---|
| 29 |

Datatype of the data: Float
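For smoke-testing any of the pipelines before loading the full data, a table with the same shape as Higgs Boson (29 float columns; in the original UCI release the first column is a 0/1 label and the remaining 28 are continuous features) can be generated synthetically. This sketch uses only the standard library, and the generator name is invented for illustration:

```python
import random

def synthetic_higgs(n_rows, n_cols=29, seed=0):
    """Generate rows shaped like the HIGGS table: the first column
    is a 0/1 label (stored as float, matching the all-float schema),
    and the remaining columns are continuous features."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        label = float(rng.randint(0, 1))
        features = [rng.gauss(0.0, 1.0) for _ in range(n_cols - 1)]
        rows.append([label] + features)
    return rows

sample = synthetic_higgs(5)
```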