VerticaPy

Python API for Vertica Data Science at Scale

Machine Learning

Tools

API Reference

verticapy.learn.cluster

Class Definition
BisectingKMeans Creates a BisectingKMeans object by using the Vertica Highly Distributed and Scalable BisectingKMeans on the data.
DBSCAN Creates a DBSCAN object by using the DBSCAN algorithm as defined by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu.
KMeans Creates a KMeans object by using the Vertica Highly Distributed and Scalable KMeans on the data.

verticapy.learn.datasets

Function Definition
load_amazon Ingests the Amazon dataset in the Vertica DB (Dataset ideal for TS and Regression).
load_cities Ingests the Cities dataset in the Vertica DB (Dataset ideal for Geospatial).
load_commodities Ingests the Commodities dataset in the Vertica DB (Dataset ideal for TS and Regression).
load_iris Ingests the Iris dataset in the Vertica DB (Dataset ideal for Classification and Clustering).
load_market Ingests the market dataset in the Vertica DB (Dataset ideal for easy exploration).
load_smart_meters Ingests the smart meters dataset in the Vertica DB (Dataset ideal for TS and Regression).
load_titanic Ingests the Titanic dataset in the Vertica DB (Dataset ideal for Classification).
load_winequality Ingests the winequality dataset in the Vertica DB (Dataset ideal for Regression and Classification).
load_world Ingests the World dataset in the Vertica DB (Dataset ideal for Geospatial).

verticapy.learn.decomposition

Class Definition
PCA Creates a PCA (Principal Component Analysis) object by using the Vertica Highly Distributed and Scalable PCA on the data.
SVD Creates an SVD (Singular Value Decomposition) object by using the Vertica Highly Distributed and Scalable SVD on the data.

verticapy.learn.ensemble

Class Definition
RandomForestClassifier Creates a RandomForestClassifier object by using the Vertica Highly Distributed and Scalable Random Forest on the data.
RandomForestRegressor Creates a RandomForestRegressor object by using the Vertica Highly Distributed and Scalable Random Forest on the data.

verticapy.learn.linear_model

Class Definition
ElasticNet Creates an ElasticNet object by using the Vertica Highly Distributed and Scalable Linear Regression on the data.
Lasso Creates a Lasso object by using the Vertica Highly Distributed and Scalable Linear Regression on the data.
LinearRegression Creates a LinearRegression object by using the Vertica Highly Distributed and Scalable Linear Regression on the data.
LogisticRegression Creates a LogisticRegression object by using the Vertica Highly Distributed and Scalable Logistic Regression on the data.
Ridge Creates a Ridge object by using the Vertica Highly Distributed and Scalable Linear Regression on the data.

verticapy.learn.metrics

Function Definition
accuracy_score Computes the Accuracy Score.
anova_table Computes the ANOVA table.
auc Computes the ROC AUC (Area Under Curve).
classification_report Computes a classification report using multiple metrics (AUC, accuracy, PRC AUC, F1...).
confusion_matrix Computes the Confusion Matrix.
critical_success_index Computes the Critical Success Index.
explained_variance Computes the Explained Variance.
f1_score Computes the F1 Score.
informedness Computes the Informedness.
log_loss Computes the Log Loss.
markedness Computes the Markedness.
matthews_corrcoef Computes the Matthews Correlation Coefficient.
max_error Computes the Max Error.
mean_absolute_error Computes the Mean Absolute Error.
mean_squared_error Computes the Mean Squared Error.
mean_squared_log_error Computes the Mean Squared Log Error.
median_absolute_error Computes the Median Absolute Error.
multilabel_confusion_matrix Computes the Multi Label Confusion Matrix.
negative_predictive_score Computes the Negative Predictive Score.
prc_auc Computes the PRC AUC (Area Under Curve).
precision_score Computes the Precision Score.
recall_score Computes the Recall Score.
r2_score Computes the R2 Score.
regression_report Computes a regression report using multiple metrics (r2, mse, max error...).
specificity_score Computes the Specificity Score.
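
Several of the classification metrics listed above can be derived from the four cells of a binary confusion matrix. The following is a minimal plain-Python sketch of the textbook definitions, not VerticaPy's implementation; the helper name binary_metrics and the example counts are hypothetical, and the sketch assumes no denominator is zero.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Textbook confusion-matrix metrics for a binary classifier.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "negative_predictive_value": npv,
        "f1": 2 * precision * recall / (precision + recall),
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "critical_success_index": tp / (tp + fp + fn),
        "matthews_corrcoef": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Made-up counts, used only to exercise the formulas.
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(m)
```

In the binary case, the squared Matthews correlation coefficient equals the product of informedness and markedness, which is a convenient sanity check on the three values.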

verticapy.learn.model_selection

Function Definition
best_k Finds the number of clusters k for k-means based on a score.
cross_validate Computes the k-fold cross-validation of an estimator.
elbow Draws an elbow curve.
lift_chart Draws a lift chart.
prc_curve Draws a precision-recall curve.
roc_curve Draws a receiver operating characteristic (ROC) curve.
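
As a reminder of what k-fold cross-validation does, the sketch below partitions row indices into k folds and uses each fold once as the test set. It is a generic illustration under simple assumptions (contiguous, unshuffled folds); the function name k_fold_indices is hypothetical, and how cross_validate actually splits the data inside Vertica may differ.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Hypothetical helper: the first n_rows % k folds get one extra row.
    """
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_rows=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

An estimator would be fitted on each train_idx and scored on the matching test_idx, and the k scores then averaged.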

verticapy.learn.naive_bayes

Class Definition
BernoulliNB i.e. NaiveBayes with param nbtype = 'bernoulli'.
CategoricalNB i.e. NaiveBayes with param nbtype = 'categorical'.
GaussianNB i.e. NaiveBayes with param nbtype = 'gaussian'.
MultinomialNB i.e. NaiveBayes with param nbtype = 'multinomial'.
NaiveBayes Creates a NaiveBayes object by using the Vertica Highly Distributed and Scalable Naive Bayes on the data.

verticapy.learn.neighbors

Class Definition
KernelDensity Creates a KernelDensity object.
KNeighborsClassifier Creates a KNeighborsClassifier object by using the k-nearest neighbors algorithm.
KNeighborsRegressor Creates a KNeighborsRegressor object by using the k-nearest neighbors algorithm.
LocalOutlierFactor Creates a LocalOutlierFactor object by using the Local Outlier Factor algorithm as defined by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander.
NearestCentroid Creates a NearestCentroid object by using the nearest centroid algorithm.

verticapy.learn.pipeline

Class Definition
Pipeline Creates a Pipeline object, sequentially applying a list of transformations and a final estimator. The intermediate steps must implement a transform method.
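
The pattern described for Pipeline (intermediate steps that transform the data, followed by a final estimator) can be sketched in plain Python as follows. All class names here (MeanCenter, SlopeThroughOrigin, MiniPipeline) are hypothetical and operate on in-memory lists; this is not VerticaPy's Pipeline class or its signature.

```python
class MeanCenter:
    """Intermediate step: subtracts the mean learned during fit."""
    def fit(self, xs):
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean_ for x in xs]


class SlopeThroughOrigin:
    """Final estimator: least-squares line y = slope * x with no intercept."""
    def fit(self, xs, ys):
        self.slope_ = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, xs):
        return [self.slope_ * x for x in xs]


class MiniPipeline:
    """Applies each intermediate step in order, then the final estimator."""
    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def fit(self, xs, ys):
        for _, step in self.steps[:-1]:
            xs = step.fit(xs).transform(xs)   # every intermediate step needs transform
        self.steps[-1][1].fit(xs, ys)
        return self

    def predict(self, xs):
        for _, step in self.steps[:-1]:
            xs = step.transform(xs)           # reuse parameters learned in fit
        return self.steps[-1][1].predict(xs)


pipe = MiniPipeline([("center", MeanCenter()), ("model", SlopeThroughOrigin())])
pipe.fit([1.0, 2.0, 3.0], [-2.0, 0.0, 2.0])
prediction = pipe.predict([4.0])
print(prediction)
```

The sketch shows why the intermediate steps must implement a transform method: the output of each one becomes the input of the next, both at fit time and at prediction time.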

verticapy.learn.preprocessing

Class / Function Definition
Balance Creates a view with an equal distribution of the input data based on the response_column.
CountVectorizer Creates a text index that counts the occurrences of each word in the data.
MinMaxScaler i.e. Normalizer with param method = 'minmax'.
Normalizer Creates a Vertica Normalizer object.
OneHotEncoder Creates a Vertica OneHotEncoder object.
RobustScaler i.e. Normalizer with param method = 'robust_zscore'.
StandardScaler i.e. Normalizer with param method = 'zscore'.
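
The three method values named above ('minmax', 'zscore', 'robust_zscore') correspond to common scaling formulas. The sketch below shows those formulas on an in-memory list; the function names are hypothetical, and the use of the sample standard deviation and of the 1.4826 consistency constant for the median absolute deviation are assumptions about the conventional definitions rather than a statement of what Vertica computes.

```python
import statistics

def minmax(xs):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def zscore(xs):
    """Center on the mean and divide by the sample standard deviation (assumed)."""
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]

def robust_zscore(xs):
    """Center on the median and divide by 1.4826 * MAD (assumed convention)."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return [(x - med) / (1.4826 * mad) for x in xs]

# Made-up sample with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(minmax(data))
print(zscore(data))
print(robust_zscore(data))
```

With an outlier such as 100.0 in the sample, the median-based variant keeps the remaining values on a comparable scale, whereas the min-max variant compresses them toward 0.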

verticapy.learn.svm

Class Definition
LinearSVC Creates a LinearSVC object by using the Vertica Highly Distributed and Scalable SVM on the data.
LinearSVR Creates a LinearSVR object by using the Vertica Highly Distributed and Scalable SVM on the data.

verticapy.learn.tree

Class Definition
DecisionTreeClassifier Single Decision Tree Classifier.
DecisionTreeRegressor Single Decision Tree Regressor.
DummyTreeClassifier This classifier memorizes the training data.
DummyTreeRegressor This regressor memorizes the training data.

verticapy.learn.tsa.models

Class Definition
SARIMAX Creates a SARIMAX object by using the Vertica Highly Distributed and Scalable Linear Regression on the data.
VAR Creates a VAR object by using the Vertica Highly Distributed and Scalable Linear Regression on the data.

verticapy.learn.tsa.tools

Function Definition
adfuller Augmented Dickey-Fuller test (time series stationarity).
durbin_watson Durbin-Watson test (autocorrelation in residuals).
het_arch Engle’s Test for Autoregressive Conditional Heteroscedasticity (ARCH).
het_breuschpagan Breusch-Pagan test for heteroscedasticity.
het_goldfeldquandt Goldfeld-Quandt homoscedasticity test.
het_white White’s Lagrange Multiplier Test for heteroscedasticity.
jarque_bera Jarque-Bera test (distribution normality).
kurtosistest Tests whether the kurtosis is different from the normal distribution.
ljungbox Ljung–Box test (whether any of a group of autocorrelations of a time series are different from zero).
mkt Mann-Kendall test (time series trend).
normaltest Tests whether a sample differs from a normal distribution.
plot_acf_pacf Draws ACF and PACF Charts.
skewtest Tests whether the skewness is different from the normal distribution.
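
To make one of the tests above concrete, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below computes it on a plain list; the function name durbin_watson_stat and the residual series are hypothetical, and this is not the signature of the durbin_watson function listed above.

```python
def durbin_watson_stat(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residual series, used only to show the range of the statistic.
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive autocorrelation

dw_alternating = durbin_watson_stat(alternating)
dw_constant = durbin_watson_stat(constant)
print(dw_alternating, dw_constant)
```

The statistic lies between 0 and 4: values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.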