VerticaPy

Python API for Vertica Data Science at Scale

vDataFrame

API Reference

Analytic Functions

Method Definition
vDataFrame.analytic Adds a new vcolumn to the vDataFrame by using an advanced analytical function on one or two specific vcolumns.
vDataFrame.interpolate Computes a regular time interval vDataFrame by interpolating the missing values using different techniques.
vDataFrame.sessionize Adds a new vcolumn to the vDataFrame which will correspond to sessions.
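
To make the idea of sessionization concrete, here is a minimal plain-Python sketch. It is not VerticaPy's implementation or signature: the function name `assign_sessions` and the parameter `session_threshold` are hypothetical, and the rule assumed here is that a new session starts whenever the gap between two consecutive timestamps exceeds the threshold.

```python
from datetime import datetime, timedelta


def assign_sessions(timestamps, session_threshold):
    """Return one session id per timestamp (input must be sorted ascending)."""
    session_ids = []
    current = 0
    previous = None
    for ts in timestamps:
        # Assumed rule: a gap larger than the threshold opens a new session.
        if previous is not None and ts - previous > session_threshold:
            current += 1
        session_ids.append(current)
        previous = ts
    return session_ids


events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
]
session_ids = assign_sessions(events, timedelta(minutes=30))
print(session_ids)  # [0, 0, 1, 1]
```

In a database setting the same logic would typically be applied per user or per partition, with an explicit ordering on the timestamp column.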

Binary Operator Functions

Method Definition
vDataFrame[].add Adds the input element to the vcolumn.
vDataFrame[].div Divides the vcolumn by the input element.
vDataFrame[].mul Multiplies the vcolumn by the input element.
vDataFrame[].sub Subtracts the input element from the vcolumn.

Copy

Method Definition
vDataFrame[].add_copy Adds a copy vcolumn to the parent vDataFrame.
vDataFrame.copy Returns a copy of the vDataFrame.

Correlation & Dependency

Method Definition
vDataFrame.acf Computes the correlations of the input vcolumn and its lags.
vDataFrame.chaid Returns a CHAID (Chi-square Automatic Interaction Detector) tree.
vDataFrame.corr Computes the Correlation Matrix of the vDataFrame.
vDataFrame.corr_pvalue Computes the Correlation Coefficient of the two input vcolumns and its p-value.
vDataFrame.cov Computes the Covariance Matrix of the vDataFrame.
vDataFrame.iv_woe Computes the Information Value (IV) Table.
vDataFrame[].iv_woe Computes the Information Value (IV) / Weight Of Evidence (WOE) Table.
vDataFrame.pacf Computes the partial correlations of the input vcolumn and its lags.
vDataFrame.pivot_table_chi2 Returns the chi-squared term using the pivot table of the response vColumn against the input vcolumns.
vDataFrame.regr Computes the Regression Matrix of the vDataFrame.
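
As an illustration of what an Information Value / Weight Of Evidence table contains, the sketch below computes both quantities in plain Python for a binary response. It follows one common convention, WOE = ln(share of non-events / share of events), which is an assumption here; the library may use a different sign convention or binning. The function name `iv_woe_table` is hypothetical, and the sketch assumes every category has at least one event and one non-event.

```python
import math
from collections import Counter


def iv_woe_table(categories, response):
    """Return ({category: {"woe": ..., "iv": ...}}, total_iv) for a 0/1 response."""
    events = Counter(c for c, y in zip(categories, response) if y == 1)
    non_events = Counter(c for c, y in zip(categories, response) if y == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    table = {}
    for cat in sorted(set(categories)):
        pct_events = events[cat] / total_events
        pct_non_events = non_events[cat] / total_non_events
        # Assumed convention: WOE = ln(% non-events / % events).
        woe = math.log(pct_non_events / pct_events)
        table[cat] = {"woe": woe, "iv": (pct_non_events - pct_events) * woe}
    total_iv = sum(row["iv"] for row in table.values())
    return table, total_iv


categories = ["A", "A", "A", "A", "B", "B", "B", "B"]
response = [1, 1, 1, 0, 1, 0, 0, 0]
table, total_iv = iv_woe_table(categories, response)
print(table)
print(total_iv)
```

The total IV does not depend on the sign convention chosen for the WOE, since each term is the product of a difference and the logarithm of the matching ratio.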

Customized Features Creation

Method Definition
vDataFrame.case_when Creates a new feature by evaluating the specified conditions.
vDataFrame.eval Evaluates a customized expression.

Data Types Conversion

Method Definition
vDataFrame.astype Converts the vColumns to the input types.
vDataFrame[].astype Converts the vColumn to the input type.
vDataFrame.bool_to_int Converts all the boolean vColumns to integers.

Dealing with Missing Values

Method Definition
vDataFrame.dropna Filters the vDataFrame by removing the records where the input vColumns are missing.
vDataFrame[].dropna Filters the vDataFrame by removing the records where the vColumn is missing.
vDataFrame.fillna Fills the missing elements of the vColumns using specific rules.
vDataFrame[].fillna Fills the missing elements of the vColumn using specific rules.
vDataFrame.merge_similar_names Merges columns with similar names.

Descriptive Statistics

Method Definition
vDataFrame.aad Aggregates the vDataFrame using 'aad' (Average Absolute Deviation).
vDataFrame[].aad Aggregates the vcolumn using 'aad' (Average Absolute Deviation).
vDataFrame.agg / aggregate Aggregates the vDataFrame using the input functions.
vDataFrame[].agg / aggregate Aggregates the vcolumn using the input functions.
vDataFrame.all Aggregates the vDataFrame using 'bool_and'.
vDataFrame.any Aggregates the vDataFrame using 'bool_or'.
vDataFrame.avg / mean Aggregates the vDataFrame using 'avg' (Average).
vDataFrame[].avg / mean Aggregates the vcolumn using 'avg' (Average).
vDataFrame.count Aggregates the vDataFrame using a list of 'count' (Number of non-missing values).
vDataFrame[].count Aggregates the vcolumn using 'count' (Number of non-missing elements).
vDataFrame.count_percent Aggregates the vDataFrame using a list of 'count' (the number of non-missing values) and percent (the percent of non-missing values).
vDataFrame.describe Aggregates the vDataFrame using multiple statistical aggregations.
vDataFrame[].describe Aggregates the vcolumn using multiple statistical aggregations.
vDataFrame[].distinct Returns the vcolumn distinct categories.
vDataFrame.duplicated Returns the duplicated values.
vDataFrame.groupby Aggregates the vDataFrame by grouping the elements.
vDataFrame.kurt / kurtosis Aggregates the vDataFrame using 'kurtosis'.
vDataFrame[].kurt / kurtosis Aggregates the vcolumn using 'kurtosis'.
vDataFrame.mad Aggregates the vDataFrame using 'mad' (Median Absolute Deviation).
vDataFrame[].mad Aggregates the vcolumn using 'mad' (Median Absolute Deviation).
vDataFrame.max Aggregates the vDataFrame using 'max' (Maximum).
vDataFrame[].max Aggregates the vcolumn using 'max' (Maximum).
vDataFrame.median Aggregates the vDataFrame using 'median'.
vDataFrame[].median Aggregates the vcolumn using 'median'.
vDataFrame.min Aggregates the vDataFrame using 'min' (Minimum).
vDataFrame[].min Aggregates the vcolumn using 'min' (Minimum).
vDataFrame[].mode Returns the nth most frequent element.
vDataFrame[].nlargest Returns the n largest vcolumn elements.
vDataFrame[].nsmallest Returns the n smallest vcolumn elements.
vDataFrame.nunique Aggregates the vDataFrame using 'unique' (cardinality).
vDataFrame[].numh Computes the optimal vcolumn bar width.
vDataFrame[].nunique Aggregates the vcolumn using 'unique' (cardinality).
vDataFrame.prod / product Aggregates the vDataFrame using 'product'.
vDataFrame[].prod / product Aggregates the vcolumn using 'product'.
vDataFrame.quantile Aggregates the vDataFrame using a list of 'quantiles'.
vDataFrame[].quantile Aggregates the vcolumn using an input 'quantile'.
vDataFrame.score Computes the score using the input columns and the input method.
vDataFrame.sem Aggregates the vDataFrame using 'sem' (Standard Error of the Mean).
vDataFrame[].sem Aggregates the vcolumn using 'sem' (Standard Error of the Mean).
vDataFrame.shape Returns the number of rows and columns of the vDataFrame.
vDataFrame.skew / skewness Aggregates the vDataFrame using 'skewness'.
vDataFrame[].skew / skewness Aggregates the vcolumn using 'skewness'.
vDataFrame.std Aggregates the vDataFrame using 'std' (Standard Deviation).
vDataFrame[].std Aggregates the vcolumn using 'std' (Standard Deviation).
vDataFrame.sum Aggregates the vDataFrame using 'sum'.
vDataFrame[].sum Aggregates the vcolumn using 'sum'.
vDataFrame[].topk Returns the top-k most frequent elements and their percentages of the distribution.
vDataFrame[].value_counts Returns the top-k most frequent elements and how often they appear.
vDataFrame.var Aggregates the vDataFrame using 'variance'.
vDataFrame[].var Aggregates the vcolumn using 'variance'.
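
Three of the less familiar aggregations above ('aad', 'mad', and 'sem') are easy to confuse, so the following plain-Python sketch spells out the textbook definitions. It is not how VerticaPy computes them in-database; the function names are illustrative, and the use of the sample standard deviation for 'sem' is an assumption.

```python
import math
import statistics


def aad(values):
    """Average absolute deviation: mean of |x - mean(x)|."""
    mean = statistics.fmean(values)
    return statistics.fmean([abs(v - mean) for v in values])


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def sem(values):
    """Standard error of the mean, assuming the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(aad(data))  # 1.5
print(mad(data))  # 0.5
print(sem(data))
```

The median-based 'mad' is far less sensitive to the extreme values 2.0 and 9.0 than the mean-based 'aad', which is why the two results differ so much on this small sample.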

Display

Method Definition
vDataFrame.idisplay Displays the specified vDataFrame as an interactive table.

Encoding

Method Definition
vDataFrame[].cut Discretizes the vColumn using the input list.
vDataFrame[].decode Encodes the vColumn using a user-defined encoding.
vDataFrame[].discretize Discretizes the vColumn using the input method.
vDataFrame.get_dummies Encodes the vColumns using the One-Hot Encoding algorithm.
vDataFrame[].get_dummies Encodes the vColumn using the One-Hot Encoding algorithm.
vDataFrame[].label_encode Encodes the vColumn using a bijection from the different categories to [0, n - 1].
vDataFrame[].mean_encode Encodes the vColumn using the average of the response partitioned by the different vcolumn categories.
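
The difference between label encoding and mean encoding can be shown with a small plain-Python sketch. The helper names below are hypothetical and this is not VerticaPy's implementation; in particular, assigning labels in sorted category order is an assumption made only so that the example is deterministic.

```python
from collections import defaultdict


def label_encode(categories):
    """Map each distinct category to an integer in [0, n - 1]."""
    # Assumption: labels follow the sorted order of the categories.
    mapping = {cat: i for i, cat in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def mean_encode(categories, response):
    """Replace each category by the average response within that category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, response):
        totals[cat] += y
        counts[cat] += 1
    means = {cat: totals[cat] / counts[cat] for cat in totals}
    return [means[c] for c in categories]


colors = ["red", "blue", "red", "green"]
response = [1, 0, 0, 1]

codes, mapping = label_encode(colors)
print(codes)  # [2, 0, 2, 1]

encoded = mean_encode(colors, response)
print(encoded)  # [0.5, 0.0, 0.5, 1.0]
```

Label encoding only needs the column itself, whereas mean encoding uses the response and can therefore leak target information if it is computed on the same records that are later used for evaluation.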

Features Transformations

Method Definition
vDataFrame.abs Applies the absolute value function to the input vcolumns.
vDataFrame[].abs Applies the absolute value function to the input vcolumn.
vDataFrame.apply Applies each function of the dictionary to the input vcolumns.
vDataFrame[].apply Applies a function to the vcolumn.
vDataFrame[].apply_fun Applies a default function to the vcolumn.
vDataFrame.applymap Applies a function to all the vcolumns.
vDataFrame[].date_part Extracts a specific TS field from the vcolumn.
vDataFrame[].round Rounds the vcolumn by keeping only the input number of digits after the decimal point.
vDataFrame[].slice Slices the vcolumn using a TS rule. The vcolumn will be transformed.

Filter Columns

Method Definition
vDataFrame.drop Drops the input vcolumns from the vDataFrame.
vDataFrame[].drop Drops the vcolumn from the vDataFrame.
vDataFrame.drop_duplicates Filters the duplicates using a partition by the input vcolumns.
vDataFrame[].drop_outliers Drops the vcolumn's outliers.
vDataFrame.search Searches for elements that match the input conditions.
vDataFrame.select Returns a copy of the vDataFrame with only the selected vcolumns.

Filter Records

Method Definition
vDataFrame.at_time Filters the vDataFrame by only keeping the records at the input time.
vDataFrame.between_time Filters the vDataFrame by only keeping the records between two input times.
vDataFrame.filter Filters the vDataFrame using the input expressions.
vDataFrame.first Filters the vDataFrame by only keeping the first records.
vDataFrame.isin Checks whether specific records are in the vDataFrame.
vDataFrame[].isin Checks whether specific records are in the vcolumn.
vDataFrame.last Filters the vDataFrame by only keeping the last records.

Information

Method Definition
vDataFrame.catcol Returns the vDataFrame categorical vcolumns based on a cardinality threshold.
vDataFrame[].category Returns the vcolumn category.
vDataFrame[].ctype Returns the vcolumn DB type.
vDataFrame.current_relation Returns the current vDataFrame relation.
vDataFrame.datecol Returns all the vDataFrame vcolumns of type date.
vDataFrame.dtypes Returns the different vcolumns types.
vDataFrame[].dtype Displays and returns the vcolumn data type.
vDataFrame.empty Returns True if the vDataFrame is empty.
vDataFrame.expected_store_usage Returns the vDataFrame expected store usage.
vDataFrame.explain Provides information on how Vertica is computing the current vDataFrame relation.
vDataFrame.get_columns Returns the vDataFrame vcolumns.
vDataFrame[].get_len Returns a new vColumn that represents the length of each element.
vDataFrame.head Returns the vDataFrame head.
vDataFrame[].head Returns the vcolumn head.
vDataFrame.iloc Returns a part of the vDataFrame (delimited by an offset and a limit).
vDataFrame[].iloc Returns a part of the vcolumn (delimited by an offset and a limit).
vDataFrame.info Displays information about the different vDataFrame transformations.
vDataFrame[].isarray Returns True if the vColumn is an array, False otherwise.
vDataFrame[].isbool Returns True if the vColumn is boolean, False otherwise.
vDataFrame[].isdate Returns True if the vcolumn category is date, False otherwise.
vDataFrame[].isnum Returns True if the vcolumn is numerical, False otherwise.
vDataFrame[].isvmap Returns True if the vColumn category is VMap, False otherwise.
vDataFrame.memory_usage Returns the vDataFrame memory usage.
vDataFrame[].memory_usage Returns the vcolumn memory usage.
vDataFrame.numcol Returns the vDataFrame numerical vcolumns.
vDataFrame.tail Returns the vDataFrame tail.
vDataFrame[].tail Returns the vcolumn tail.
vDataFrame[].store_usage Returns the vcolumn expected store usage (unit: b).
vDataFrame.swap Swaps the two input vcolumns.
vDataFrame.version Returns the Vertica version.

Join, Sort, and Transform

Method Definition
vDataFrame.append Merges the vDataFrame with another vDataFrame or an input relation.
vDataFrame.cdt Returns the complete disjunctive table of the vDataFrame.
vDataFrame.flat_vmap Flattens the selected VMap. A new vDataFrame is returned.
vDataFrame.groupby Aggregates the vDataFrame by grouping its elements.
vDataFrame.join Joins the vDataFrame with another vDataFrame or an input relation.
vDataFrame.narrow Returns the narrow table of the vDataFrame using the input vcolumns.
vDataFrame.pivot Returns the pivot of the vDataFrame using the input aggregation.
vDataFrame.polynomial_comb Returns a vDataFrame containing the product combination of different input columns. This function is ideal for bivariate analysis.
vDataFrame.recommend Recommends items based on the collaborative filtering (CF) technique.
vDataFrame.sort Sorts the vDataFrame using the input vcolumns.

Management

Method Definition
vDataFrame.del_catalog Deletes the current vDataFrame catalog.
vDataFrame.load Loads a previous structure of the vDataFrame.
vDataFrame.save Saves the current structure of the vDataFrame.

Moving Windows

Method Definition
vDataFrame.cummax Adds a new vcolumn to the vDataFrame by computing the cumulative maximum of the input vcolumn.
vDataFrame.cummin Adds a new vcolumn to the vDataFrame by computing the cumulative minimum of the input vcolumn.
vDataFrame.cumprod Adds a new vcolumn to the vDataFrame by computing the cumulative product of the input vcolumn.
vDataFrame.cumsum Adds a new vcolumn to the vDataFrame by computing the cumulative sum of the input vcolumn.
vDataFrame.rolling Adds a new vcolumn to the vDataFrame by using an advanced analytical window function on one or two specific vcolumns.
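
The four cumulative functions above share one pattern: each output element combines all input elements up to and including the current one. A minimal plain-Python sketch of that pattern, using `itertools.accumulate` rather than any VerticaPy call, is shown below. The list order stands in for the ordering that a database window function would need to be given explicitly.

```python
import operator
from itertools import accumulate

values = [3, 1, 4, 1, 5]

# Each result has the same length as the input; element i summarizes values[0..i].
cumsum = list(accumulate(values))                  # running sum
cumprod = list(accumulate(values, operator.mul))   # running product
cummax = list(accumulate(values, max))             # running maximum
cummin = list(accumulate(values, min))             # running minimum

print(cumsum)   # [3, 4, 8, 9, 14]
print(cumprod)  # [3, 3, 12, 12, 60]
print(cummax)   # [3, 3, 4, 4, 5]
print(cummin)   # [3, 1, 1, 1, 1]
```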

Normalization and Global Outliers

Method Definition
vDataFrame[].clip Clips the vColumn.
vDataFrame[].fill_outliers Fills the vColumn's outliers using the input method.
vDataFrame.normalize Normalizes the input vColumns using the input method.
vDataFrame[].normalize Normalizes the vColumn using the input method.
vDataFrame.outliers Adds a new vColumn labeled with 0 and 1, where 1 means that the record is a global outlier.
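
The sketch below illustrates, in plain Python, one way the ideas in this table fit together: z-score normalization, clipping to fixed bounds, and a 0/1 outlier label. It is an illustration under stated assumptions, not VerticaPy's implementation: the function names, the z-score method, the sample standard deviation, and the threshold of 2.0 are all choices made for this example, and the library methods accept their own "input method" options.

```python
import statistics


def zscore(values):
    """Normalize values to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


def clip(values, lower, upper):
    """Bound every value to the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def flag_outliers(values, threshold):
    """Label a record 1 when its absolute z-score exceeds the threshold, else 0."""
    return [1 if abs(z) > threshold else 0 for z in zscore(values)]


data = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0, 10.0, 50.0]

flags = flag_outliers(data, threshold=2.0)
print(flags)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

clipped = clip(data, 9.0, 12.0)
print(clipped[-1])  # 12.0
```

With only ten records and the sample standard deviation, no absolute z-score can exceed 9 / sqrt(10), roughly 2.85, so a threshold of 3 would never flag anything here; that is why the example uses 2.0.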

Plotting & Graphics

Method Definition
vDataFrame.animated Draws the animated chart.
vDataFrame.bar Draws the bar chart of the input vColumns based on an aggregation.
vDataFrame[].bar Draws the bar chart of the vColumn based on an aggregation.
vDataFrame.boxplot Draws the box plot of the input vColumns.
vDataFrame[].boxplot Draws the box plot of the vColumn.
vDataFrame.bubble Draws the bubble plot of the input vColumns.
vDataFrame.contour Draws the contour plot of the input function using the 2 input vColumns.
vDataFrame.density Draws the density plot of the vColumns.
vDataFrame[].density Draws the density plot of the vColumn.
vDataFrame.hchart Draws responsive charts using the Highcharts API.
vDataFrame[].geo_plot Draws a geospatial object.
vDataFrame.heatmap Draws the heatmap of two input vColumns.
vDataFrame.hexbin Draws the hexbin of the input vColumns based on an aggregation.
vDataFrame.hist Draws the histogram of the input vColumns based on an aggregation.
vDataFrame[].hist Draws the histogram of the vColumn based on an aggregation.
vDataFrame.outliers_plot Draws the global outliers plot of one or two columns based on their ZSCORE.
vDataFrame.pie Draws the nested density pie chart of the input vColumns.
vDataFrame[].pie Draws the pie chart of the vColumn based on an aggregation.
vDataFrame.pivot_table Draws the pivot table of one or two columns based on an aggregation.
vDataFrame.plot Draws a time series plot.
vDataFrame[].plot Draws the time series of the vColumn.
vDataFrame[].range_plot Draws the range plot of the vColumn.
vDataFrame.scatter Draws the scatter plot of the input vColumns.
vDataFrame.scatter_matrix Draws the scatter matrix of the vDataFrame.
vDataFrame[].spider Draws the spider plot of the input vColumn based on an aggregation.
vDataFrame.stacked_area Draws a time series stacked area chart.

Renaming

Method Definition
vDataFrame[].rename Renames the vColumn.

Sample

Method Definition
vDataFrame.balance Balances the dataset using the input method.
vDataFrame.sample Downsamples the vDataFrame by filtering using a random vcolumn.

Serialization

Method Definition
vDataFrame.to_csv Creates a CSV file of the current vDataFrame relation.
vDataFrame.to_db Saves the vDataFrame current relation to the Vertica database.
vDataFrame.to_geopandas Converts the vDataFrame to a Geopandas DataFrame.
vDataFrame.to_json Creates a JSON file of the current vDataFrame relation.
vDataFrame.to_list Converts the vDataFrame to a Python list.
vDataFrame.to_numpy Converts the vDataFrame to a Numpy array.
vDataFrame.to_pandas Converts the vDataFrame to a pandas DataFrame.
vDataFrame.to_pickle Saves the vDataFrame to a Python pickle file.
vDataFrame.to_shp Creates a SHP file of the current vDataFrame relation.

Splitting into Train/Test

Method Definition
vDataFrame.train_test_split Creates two vDataFrames (train/test) which can be used to evaluate a model.
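
For readers unfamiliar with the technique, a train/test split can be sketched in plain Python as a seeded shuffle followed by a cut. The function `split_rows` and its parameters `test_size` and `seed` are hypothetical names for this illustration; they are not taken from the VerticaPy signature, and the in-database implementation may work differently.

```python
import random


def split_rows(rows, test_size, seed):
    """Shuffle rows reproducibly and return (train, test) lists."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train, test = split_rows(rows, test_size=0.3, seed=42)
print(len(train), len(test))  # 7 3
```

Every record ends up in exactly one of the two sets, so a model fitted on `train` can be scored on records it has not seen.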

Working with Text

Method Definition
vDataFrame.regexp Computes a new vcolumn based on regular expressions.
vDataFrame[].str_contains Verifies if the regular expression is in each of the vcolumn records. The vcolumn will be transformed.
vDataFrame[].str_count Computes the number of regular expression matches in each record of the vcolumn. The vcolumn will be transformed.
vDataFrame[].str_extract Extracts the regular expression in each record of the vcolumn. The vcolumn will be transformed.
vDataFrame[].str_replace Replaces the regular expression matches in each of the vcolumn records with an input value. The vcolumn will be transformed.
vDataFrame[].str_slice Slices the vcolumn. The vcolumn will be transformed.
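
The regular-expression operations in this table correspond to four familiar primitives: test for a match, count matches, extract a match, and replace matches. The sketch below shows them with Python's `re` module on a small list of records. It does not call VerticaPy, and the database's regular-expression dialect may differ from Python's in details, so treat it only as an illustration of the semantics.

```python
import re

records = ["Mr. Smith", "Mrs. Jones", "Dr. Who"]

# Does the pattern occur in each record?
contains = [bool(re.search(r"Mrs?\.", r)) for r in records]
print(contains)  # [True, True, False]

# How many times does the pattern match in each record?
counts = [len(re.findall(r"o", r)) for r in records]
print(counts)  # [0, 1, 1]

# Extract the first match (a capitalized title ending with a dot).
extracted = [re.search(r"[A-Z][a-z]+\.", r).group(0) for r in records]
print(extracted)  # ['Mr.', 'Mrs.', 'Dr.']

# Replace every match with an input value (here, remove the dots).
replaced = [re.sub(r"\.", "", r) for r in records]
print(replaced)  # ['Mr Smith', 'Mrs Jones', 'Dr Who']
```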

Working with Weights

Method Definition
vDataFrame.add_duplicates Duplicates the vDataFrame using the input weight.