
VerticaPy
Python API for Vertica Data Science at Scale
Machine Learning
Tools
API Reference
verticapy.learn.cluster
Class | Definition |
---|---|
BisectingKMeans | Creates a BisectingKMeans object by using the Vertica Highly Distributed and Scalable BisectingKMeans on the data. |
DBSCAN | Creates a DBSCAN object by using the DBSCAN algorithm as defined by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu. |
KMeans | Creates a KMeans object by using the Vertica Highly Distributed and Scalable KMeans on the data. |
verticapy.learn.datasets
Function | Definition |
---|---|
load_amazon | Ingests the Amazon dataset in the Vertica DB (Dataset ideal for TS and Regression). |
load_cities | Ingests the Cities dataset in the Vertica DB (Dataset ideal for Geospatial). |
load_commodities | Ingests the Commodities dataset in the Vertica DB (Dataset ideal for TS and Regression). |
load_iris | Ingests the Iris dataset in the Vertica DB (Dataset ideal for Classification and Clustering). |
load_market | Ingests the market dataset in the Vertica DB (Dataset ideal for easy exploration). |
load_smart_meters | Ingests the smart meters dataset in the Vertica DB (Dataset ideal for TS and Regression). |
load_titanic | Ingests the Titanic dataset in the Vertica DB (Dataset ideal for Classification). |
load_winequality | Ingests the winequality dataset in the Vertica DB (Dataset ideal for Regression and Classification). |
load_world | Ingests the World dataset in the Vertica DB (Dataset ideal for Geospatial). |
verticapy.learn.decomposition
Class | Definition |
---|---|
PCA | Creates a PCA (Principal Component Analysis) object by using the Vertica Highly Distributed and Scalable PCA on the data. |
SVD | Creates an SVD (Singular Value Decomposition) object by using the Vertica Highly Distributed and Scalable SVD on the data. |
verticapy.learn.ensemble
Class | Definition |
---|---|
RandomForestClassifier | Creates a RandomForestClassifier object by using the Vertica Highly Distributed and Scalable Random Forest on the data. |
RandomForestRegressor | Creates a RandomForestRegressor object by using the Vertica Highly Distributed and Scalable Random Forest on the data. |
verticapy.learn.linear_model
Class | Definition |
---|---|
ElasticNet | Creates an ElasticNet object by using the Vertica Highly Distributed and Scalable Linear Regression on the data. |
Lasso | Creates a Lasso object by using the Vertica Highly Distributed and Scalable Linear Regression on the data. |
LinearRegression | Creates a LinearRegression object by using the Vertica Highly Distributed and Scalable Linear Regression on the data. |
LogisticRegression | Creates a LogisticRegression object by using the Vertica Highly Distributed and Scalable Logistic Regression on the data. |
Ridge | Creates a Ridge object by using the Vertica Highly Distributed and Scalable Linear Regression on the data. |
verticapy.learn.metrics
Function | Definition |
---|---|
accuracy_score | Computes the Accuracy Score. |
anova_table | Computes the Anova Table. |
auc | Computes the ROC AUC (Area Under Curve). |
classification_report | Computes a classification report using multiple metrics (AUC, accuracy, PRC AUC, F1...). |
confusion_matrix | Computes the Confusion Matrix. |
critical_success_index | Computes the Critical Success Index. |
explained_variance | Computes the Explained Variance. |
f1_score | Computes the F1 Score. |
informedness | Computes the Informedness. |
log_loss | Computes the Log Loss. |
markedness | Computes the Markedness. |
matthews_corrcoef | Computes the Matthews Correlation Coefficient. |
max_error | Computes the Max Error. |
mean_absolute_error | Computes the Mean Absolute Error. |
mean_squared_error | Computes the Mean Squared Error. |
mean_squared_log_error | Computes the Mean Squared Log Error. |
median_absolute_error | Computes the Median Absolute Error. |
multilabel_confusion_matrix | Computes the Multi Label Confusion Matrix. |
negative_predictive_score | Computes the Negative Predictive Score. |
prc_auc | Computes the PRC AUC (Area Under Curve). |
precision_score | Computes the Precision Score. |
recall_score | Computes the Recall Score. |
r2_score | Computes the R2 Score. |
regression_report | Computes a regression report using multiple metrics (r2, mse, max error...). |
specificity_score | Computes the Specificity Score. |
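
Most of the binary classification metrics listed above can be derived from the four cells of a confusion matrix. The following is a minimal pure-Python sketch of the standard textbook definitions, not VerticaPy's implementation (which computes these in-database). The function name `binary_metrics` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return common binary classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; it does not guard against
    zero denominators.
    """
    precision = tp / (tp + fp)    # positive predictive value
    recall = tp / (tp + fn)       # sensitivity, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "negative_predictive": npv,
        "f1": 2 * precision * recall / (precision + recall),
        "critical_success_index": tp / (tp + fp + fn),
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "matthews_corrcoef": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a binary problem, the Matthews correlation coefficient has the same magnitude as the geometric mean of informedness and markedness, which is a quick sanity check on the three values.
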
verticapy.learn.model_selection
Function | Definition |
---|---|
best_k | Finds the best k for k-means based on a score. |
cross_validate | Computes the k-fold cross-validation of an estimator. |
elbow | Draws an elbow curve. |
lift_chart | Draws a lift chart. |
prc_curve | Draws a precision-recall curve. |
roc_curve | Draws a receiver operating characteristic (ROC) curve. |
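
As a rough illustration of what k-fold cross-validation means, the sketch below splits row indices into k contiguous folds and yields one train/test pair per fold. It shows the generic technique only: `k_fold_indices` is a hypothetical helper, and VerticaPy's `cross_validate` runs in-database and may partition the data differently (for example, randomly).

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Hypothetical helper for illustration. Fold sizes differ by at most
    one row when n_rows is not divisible by k.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, test
        start = stop


folds = list(k_fold_indices(n_rows=10, k=3))
for train, test in folds:
    print(f"train={train} test={test}")
```

Each row appears in exactly one test fold, so averaging a metric over the folds uses every row for evaluation once.
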
verticapy.learn.naive_bayes
Class | Definition |
---|---|
BernoulliNB | i.e. NaiveBayes with param nbtype = 'bernoulli'. |
CategoricalNB | i.e. NaiveBayes with param nbtype = 'categorical'. |
GaussianNB | i.e. NaiveBayes with param nbtype = 'gaussian'. |
MultinomialNB | i.e. NaiveBayes with param nbtype = 'multinomial'. |
NaiveBayes | Creates a NaiveBayes object by using the Vertica Highly Distributed and Scalable Naive Bayes on the data. |
verticapy.learn.neighbors
Class | Definition |
---|---|
KernelDensity | Creates a KernelDensity object. |
KNeighborsClassifier | Creates a KNeighborsClassifier object by using the k-nearest neighbors algorithm. |
KNeighborsRegressor | Creates a KNeighborsRegressor object by using the k-nearest neighbors algorithm. |
LocalOutlierFactor | Creates a LocalOutlierFactor object by using the Local Outlier Factor algorithm as defined by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander. |
NearestCentroid | Creates a NearestCentroid object by using the K Nearest Centroid Algorithm. |
verticapy.learn.pipeline
Class | Definition |
---|---|
Pipeline | Creates a Pipeline object, sequentially applying a list of transformations and a final estimator. The intermediate steps must implement a transform method. |
verticapy.learn.preprocessing
Class / Function | Definition |
---|---|
Balance | Creates a view with an equal distribution of the input data based on the response_column. |
CountVectorizer | Creates a Text Index which will count the occurrences of each word in the data. |
MinMaxScaler | i.e. Normalizer with param method = 'minmax'. |
Normalizer | Creates a Vertica Normalizer object. |
OneHotEncoder | Creates a Vertica OneHotEncoder object. |
RobustScaler | i.e. Normalizer with param method = 'robust_zscore'. |
StandardScaler | i.e. Normalizer with param method = 'zscore'. |
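
The three `method` values named above correspond to three common rescaling formulas. The sketch below is a pure-Python illustration under stated assumptions, not the in-database implementation: the function names are hypothetical, `zscore` is assumed to use the sample standard deviation, and `robust_zscore` is assumed to use the median and the median absolute deviation scaled by 1.4826.

```python
import statistics


def zscore(values):
    """(x - mean) / standard deviation (sample std is an assumption)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


def robust_zscore(values):
    """(x - median) / (1.4826 * MAD); the 1.4826 factor is an assumption."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / (1.4826 * mad) for v in values]


def minmax(values):
    """(x - min) / (max - min), mapping the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


data = [1, 2, 3, 4, 100]
print(zscore(data))
print(robust_zscore(data))
print(minmax(data))
```

On data with an outlier such as `100` above, the median-based variant keeps the typical values on a comparable scale, whereas the mean and standard deviation are pulled toward the outlier.
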
verticapy.learn.svm
Class | Definition |
---|---|
LinearSVC | Creates a LinearSVC object by using the Vertica Highly Distributed and Scalable SVM on the data. |
LinearSVR | Creates a LinearSVR object by using the Vertica Highly Distributed and Scalable SVM on the data. |
verticapy.learn.tree
Class | Definition |
---|---|
DecisionTreeClassifier | Single Decision Tree Classifier. |
DecisionTreeRegressor | Single Decision Tree Regressor. |
DummyTreeClassifier | This classifier memorizes the training data. |
DummyTreeRegressor | This regressor memorizes the training data. |
verticapy.learn.tsa.models
Class | Definition |
---|---|
SARIMAX | Creates a SARIMAX object by using the Vertica Highly Distributed and Scalable Linear Regression on the data. |
VAR | Creates a VAR object by using the Vertica Highly Distributed and Scalable Linear Regression on the data. |
verticapy.learn.tsa.tools
Method | Definition |
---|---|
adfuller | Augmented Dickey-Fuller test (time series stationarity). |
durbin_watson | Durbin-Watson test (autocorrelation in residuals). |
het_arch | Engle’s Test for Autoregressive Conditional Heteroscedasticity (ARCH). |
het_breuschpagan | Breusch-Pagan test for heteroscedasticity. |
het_goldfeldquandt | Goldfeld-Quandt homoscedasticity test. |
het_white | White’s Lagrange Multiplier Test for heteroscedasticity. |
jarque_bera | Jarque-Bera test (distribution normality). |
kurtosistest | Tests whether the kurtosis is different from the normal distribution. |
ljungbox | Ljung–Box test (whether any of a group of autocorrelations of a time series are different from zero). |
mkt | Mann-Kendall test (time series trend). |
normaltest | Tests whether a sample differs from a normal distribution. |
plot_acf_pacf | Draws ACF and PACF Charts. |
skewtest | Tests whether the skewness is different from the normal distribution. |
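
To make one of the tests above concrete, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below is a pure-Python illustration of that textbook formula; `durbin_watson_statistic` is a hypothetical name and is not the signature of VerticaPy's `durbin_watson`, which operates on data stored in Vertica.

```python
def durbin_watson_statistic(residuals):
    """Return sum((e_t - e_{t-1})^2) / sum(e_t^2) for a residual sequence.

    Hypothetical helper for illustration. Values near 2 suggest no
    first-order autocorrelation, values near 0 positive autocorrelation,
    and values near 4 negative autocorrelation.
    """
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


alternating = durbin_watson_statistic([1, -1, 1, -1])  # sign flips every step
constant = durbin_watson_statistic([1, 1, 1, 1])       # perfectly persistent
print(alternating, constant)
```
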