
VerticaPy
Python API for Vertica Data Science at Scale
vDataFrame
API Reference
Analytic Functions
Method | Definition |
---|---|
vDataFrame.analytic | Adds a new vcolumn to the vDataFrame by using an advanced analytical function on one or two specific vcolumns. |
vDataFrame.interpolate | Computes a regular time interval vDataFrame by interpolating the missing values using different techniques. |
vDataFrame.sessionize | Adds a new vcolumn to the vDataFrame which will correspond to sessions. |
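
To make the idea behind sessionization concrete, here is a minimal plain-Python sketch. It does not call VerticaPy. It assumes that a session is a run of events in which no gap between consecutive timestamps exceeds a chosen threshold; the helper name `assign_sessions` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def assign_sessions(timestamps, threshold):
    """Return one session id per timestamp (timestamps must be sorted ascending).

    Assumption: a new session starts whenever the gap to the previous
    event is strictly greater than `threshold`.
    """
    session_ids = []
    current_session = 0
    previous = None
    for ts in timestamps:
        if previous is not None and ts - previous > threshold:
            current_session += 1
        session_ids.append(current_session)
        previous = ts
    return session_ids


# Hypothetical event times: gaps of 10, 80, and 15 minutes.
events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 1, 10, 45),
]
session_ids = assign_sessions(events, timedelta(minutes=30))
print(session_ids)  # [0, 0, 1, 1]
```

In the database, the same logic would typically be expressed with a window function over a time-ordered partition rather than a Python loop.
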
Binary Operator Functions
Method | Definition |
---|---|
vDataFrame[].add | Adds the input element to the vcolumn. |
vDataFrame[].div | Divides the vcolumn by the input element. |
vDataFrame[].mul | Multiplies the vcolumn by the input element. |
vDataFrame[].sub | Subtracts the input element from the vcolumn. |
Copy
Method | Definition |
---|---|
vDataFrame[].add_copy | Adds a copy vcolumn to the parent vDataFrame. |
vDataFrame.copy | Returns a copy of the vDataFrame. |
Correlation & Dependency
Method | Definition |
---|---|
vDataFrame.acf | Computes the correlations of the input vcolumn and its lags. |
vDataFrame.chaid | Returns a CHAID (Chi-square Automatic Interaction Detector) tree. |
vDataFrame.corr | Computes the Correlation Matrix of the vDataFrame. |
vDataFrame.corr_pvalue | Computes the Correlation Coefficient of the two input vcolumns and its p-value. |
vDataFrame.cov | Computes the Covariance Matrix of the vDataFrame. |
vDataFrame.iv_woe | Computes the Information Value (IV) Table. |
vDataFrame[].iv_woe | Computes the Information Value (IV) / Weight Of Evidence (WOE) Table. |
vDataFrame.pacf | Computes the partial correlations of the input vcolumn and its lags. |
vDataFrame.pivot_table_chi2 | Returns the chi-squared term using the pivot table of the response vColumn against the input vcolumns. |
vDataFrame.regr | Computes the Regression Matrix of the vDataFrame. |
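
As an illustration of "the correlations of a column and its lags", the following plain-Python sketch computes the Pearson correlation between a series and a shifted copy of itself. It does not call VerticaPy, and the estimator the library actually uses may differ; the helper names `pearson` and `lag_correlation` and the sample series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    return covariance / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def lag_correlation(series, lag):
    """Correlation between the series and the same series shifted by `lag` rows."""
    if lag == 0:
        return pearson(series, series)
    return pearson(series[lag:], series[:-lag])


series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
acf = [lag_correlation(series, k) for k in range(3)]
print(acf)
```

The value at lag 0 is always 1, since a series is perfectly correlated with itself.
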
Customized Features Creation
Method | Definition |
---|---|
vDataFrame.case_when | Creates a new feature by evaluating the specified conditions. |
vDataFrame.eval | Evaluates a customized expression. |
Data Types Conversion
Method | Definition |
---|---|
vDataFrame.astype | Converts the vColumns to the input types. |
vDataFrame[].astype | Converts the vColumn to the input type. |
vDataFrame.bool_to_int | Converts all the boolean vColumns to integers. |
Dealing with Missing Values
Method | Definition |
---|---|
vDataFrame.dropna | Filters out the vDataFrame records where the input vColumns are missing. |
vDataFrame[].dropna | Filters out the vDataFrame records where the vColumn is missing. |
vDataFrame.fillna | Fills the vColumns missing elements using specific rules. |
vDataFrame[].fillna | Fills the vColumn missing elements using specific rules. |
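
The two strategies in this table, dropping records and filling gaps, can be sketched in plain Python as follows. This is an illustration of the concepts only and does not call VerticaPy; the helper names, the use of `None` for a missing value, and the choice of the mean as the filling rule are assumptions.

```python
from statistics import mean

# Hypothetical records; None stands in for a missing (NULL) value.
rows = [
    {"age": 22.0, "fare": 7.25},
    {"age": None, "fare": 71.28},
    {"age": 26.0, "fare": None},
    {"age": 36.0, "fare": 53.1},
]


def drop_missing(rows, columns):
    """Keep only the rows where every listed column has a value."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def fill_missing_with_mean(rows, column):
    """Return new rows where missing values of `column` are replaced by its mean."""
    average = mean(r[column] for r in rows if r[column] is not None)
    return [
        {**r, column: average if r[column] is None else r[column]} for r in rows
    ]


complete = drop_missing(rows, ["age", "fare"])
filled = fill_missing_with_mean(rows, "age")
print(len(complete))     # 2
print(filled[1]["age"])  # 28.0
```

Dropping is lossless for the remaining records but shrinks the data set, while filling keeps every record at the cost of introducing estimated values.
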
Descriptive Statistics
Method | Definition |
---|---|
vDataFrame.aad | Aggregates the vDataFrame using 'aad' (Average Absolute Deviation). |
vDataFrame[].aad | Aggregates the vcolumn using 'aad' (Average Absolute Deviation). |
vDataFrame.agg / aggregate | Aggregates the vDataFrame using the input functions. |
vDataFrame[].agg / aggregate | Aggregates the vcolumn using the input functions. |
vDataFrame.all | Aggregates the vDataFrame using 'bool_and'. |
vDataFrame.any | Aggregates the vDataFrame using 'bool_or'. |
vDataFrame.avg / mean | Aggregates the vDataFrame using 'avg' (Average). |
vDataFrame[].avg / mean | Aggregates the vcolumn using 'avg' (Average). |
vDataFrame.count | Aggregates the vDataFrame using a list of 'count' (Number of non-missing values). |
vDataFrame[].count | Aggregates the vcolumn using 'count' (Number of non-missing elements). |
vDataFrame.count_percent | Aggregates the vDataFrame using a list of 'count' (the number of non-missing values) and percent (the percent of non-missing values). |
vDataFrame.describe | Aggregates the vDataFrame using multiple statistical aggregations. |
vDataFrame[].describe | Aggregates the vcolumn using multiple statistical aggregations. |
vDataFrame[].distinct | Returns the vcolumn distinct categories. |
vDataFrame.duplicated | Returns the duplicated values. |
vDataFrame.groupby | Aggregates the vDataFrame by grouping the elements. |
vDataFrame.kurt / kurtosis | Aggregates the vDataFrame using 'kurtosis'. |
vDataFrame[].kurt / kurtosis | Aggregates the vcolumn using 'kurtosis'. |
vDataFrame.mad | Aggregates the vDataFrame using 'mad' (Median Absolute Deviation). |
vDataFrame[].mad | Aggregates the vcolumn using 'mad' (Median Absolute Deviation). |
vDataFrame.max | Aggregates the vDataFrame using 'max' (Maximum). |
vDataFrame[].max | Aggregates the vcolumn using 'max' (Maximum). |
vDataFrame.median | Aggregates the vDataFrame using 'median'. |
vDataFrame[].median | Aggregates the vcolumn using 'median'. |
vDataFrame.min | Aggregates the vDataFrame using 'min' (Minimum). |
vDataFrame[].min | Aggregates the vcolumn using 'min' (Minimum). |
vDataFrame[].mode | Returns the nth most frequent element. |
vDataFrame[].nlargest | Returns the n largest vcolumn elements. |
vDataFrame[].nsmallest | Returns the n smallest vcolumn elements. |
vDataFrame.nunique | Aggregates the vDataFrame using 'unique' (cardinality). |
vDataFrame[].numh | Computes the optimal vcolumn bar width. |
vDataFrame[].nunique | Aggregates the vcolumn using 'unique' (cardinality). |
vDataFrame.prod / product | Aggregates the vDataFrame using 'product'. |
vDataFrame[].prod / product | Aggregates the vcolumn using 'product'. |
vDataFrame.quantile | Aggregates the vDataFrame using a list of 'quantiles'. |
vDataFrame[].quantile | Aggregates the vcolumn using an input 'quantile'. |
vDataFrame.score | Computes the score using the input columns and the input method. |
vDataFrame.sem | Aggregates the vDataFrame using 'sem' (Standard Error of the Mean). |
vDataFrame[].sem | Aggregates the vcolumn using 'sem' (Standard Error of the Mean). |
vDataFrame.shape | Returns the number of rows and columns of the vDataFrame. |
vDataFrame.skew / skewness | Aggregates the vDataFrame using 'skewness'. |
vDataFrame[].skew / skewness | Aggregates the vcolumn using 'skewness'. |
vDataFrame.std | Aggregates the vDataFrame using 'std' (Standard Deviation). |
vDataFrame[].std | Aggregates the vcolumn using 'std' (Standard Deviation). |
vDataFrame.sum | Aggregates the vDataFrame using 'sum'. |
vDataFrame[].sum | Aggregates the vcolumn using 'sum'. |
vDataFrame[].topk | Returns the top-k most frequent elements and their percentages of the distribution. |
vDataFrame[].value_counts | Returns the top-k most frequent elements and how often they appear. |
vDataFrame.var | Aggregates the vDataFrame using 'variance'. |
vDataFrame[].var | Aggregates the vcolumn using 'variance'. |
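
Three of the less familiar aggregations above ('aad', 'mad', and 'sem') are easy to confuse, so the sketch below computes each one in plain Python. It does not call VerticaPy. It assumes 'aad' is the mean absolute deviation around the mean, 'mad' is the median absolute deviation around the median, and 'sem' uses the sample standard deviation; the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def average_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(values) / sqrt(len(values))


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
aad = average_absolute_deviation(values)
mad = median_absolute_deviation(values)
sem = standard_error_of_mean(values)
print(aad, mad)  # 1.5 0.5
```

The median-based statistic is much less sensitive to the two large values (7 and 9) than the mean-based one, which is why it is often preferred for outlier-prone data.
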
Display
Method | Definition |
---|---|
vDataFrame.idisplay | Displays the specified vDataFrame as an interactive table. |
Encoding
Method | Definition |
---|---|
vDataFrame[].cut | Discretizes the vColumn using the input list. |
vDataFrame[].decode | Encodes the vColumn using a user-defined encoding. |
vDataFrame[].discretize | Discretizes the vColumn using the input method. |
vDataFrame.get_dummies | Encodes the input vColumns using the One-Hot Encoding algorithm. |
vDataFrame[].get_dummies | Encodes the vColumn using the One-Hot Encoding algorithm. |
vDataFrame[].label_encode | Encodes the vColumn using a bijection from the different categories to [0, n - 1]. |
vDataFrame[].mean_encode | Encodes the vColumn using the average of the response partitioned by the different vcolumn categories. |
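
The difference between label encoding and mean encoding is sketched below in plain Python. It does not call VerticaPy. The helper names and sample data are hypothetical, and assigning labels in sorted category order is an assumption (any bijection onto [0, n - 1] satisfies the description above).

```python
from collections import defaultdict


def encode_labels(categories):
    """Map each distinct category to an integer in [0, n - 1] (sorted order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def encode_means(categories, response):
    """Replace each category by the average response observed for that category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, response):
        totals[category] += y
        counts[category] += 1
    averages = {c: totals[c] / counts[c] for c in totals}
    return [averages[c] for c in categories]


embarked = ["S", "C", "S", "Q", "C"]
survived = [0, 1, 1, 0, 1]
labels, mapping = encode_labels(embarked)
means = encode_means(embarked, survived)
print(labels)  # [2, 0, 2, 1, 0]
print(means)   # [0.5, 1.0, 0.5, 0.0, 1.0]
```

Label encoding imposes an arbitrary order on the categories, whereas mean encoding ties each category to the response, so it should be fitted on training data only to avoid leakage.
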
Features Transformations
Method | Definition |
---|---|
vDataFrame.abs | Applies the absolute value function to the input vcolumns. |
vDataFrame[].abs | Applies the absolute value function to the input vcolumn. |
vDataFrame.apply | Applies each function of the dictionary to the input vcolumns. |
vDataFrame[].apply | Applies a function to the vcolumn. |
vDataFrame[].apply_fun | Applies a default function to the vcolumn. |
vDataFrame.applymap | Applies a function to all the vcolumns. |
vDataFrame[].date_part | Extracts a specific TS field from the vcolumn. |
vDataFrame[].round | Rounds the vcolumn by keeping only the input number of digits after the decimal point. |
vDataFrame[].slice | Slices the vcolumn using a TS rule. The vcolumn will be transformed. |
Filter Columns
Method | Definition |
---|---|
vDataFrame.drop | Drops the input vcolumns from the vDataFrame. |
vDataFrame[].drop | Drops the vcolumn from the vDataFrame. |
vDataFrame.drop_duplicates | Filters out the duplicates using a partition by the input vcolumns. |
vDataFrame[].drop_outliers | Drops the vcolumn's outliers. |
vDataFrame.search | Searches for elements that match the input conditions. |
vDataFrame.select | Returns a copy of the vDataFrame with only the selected vcolumns. |
Filter Records
Method | Definition |
---|---|
vDataFrame.at_time | Filters the vDataFrame by only keeping the records at the input time. |
vDataFrame.between_time | Filters the vDataFrame by only keeping the records between two input times. |
vDataFrame.filter | Filters the vDataFrame using the input expressions. |
vDataFrame.first | Filters the vDataFrame by only keeping the first records. |
vDataFrame.isin | Checks whether specific records are in the vDataFrame. |
vDataFrame[].isin | Checks whether specific records are in the vcolumn. |
vDataFrame.last | Filters the vDataFrame by only keeping the last records. |
Information
Method | Definition |
---|---|
vDataFrame.catcol | Returns the vDataFrame categorical vcolumns based on a cardinality threshold. |
vDataFrame[].category | Returns the vcolumn category. |
vDataFrame[].ctype | Returns the vcolumn DB type. |
vDataFrame.current_relation | Returns the current vDataFrame relation. |
vDataFrame.datecol | Returns all the vDataFrame vcolumns of type date. |
vDataFrame.dtypes | Returns the different vcolumns types. |
vDataFrame[].dtype | Displays and returns the vcolumn data type. |
vDataFrame.empty | Returns True if the vDataFrame is empty. |
vDataFrame.expected_store_usage | Returns the vDataFrame expected store usage. |
vDataFrame.explain | Provides information on how Vertica is computing the current vDataFrame relation. |
vDataFrame.get_columns | Returns the vDataFrame vcolumns. |
vDataFrame.head | Returns the vDataFrame head. |
vDataFrame[].head | Returns the vcolumn head. |
vDataFrame.iloc | Returns a part of the vDataFrame (delimited by an offset and a limit). |
vDataFrame[].iloc | Returns a part of the vcolumn (delimited by an offset and a limit). |
vDataFrame.info | Displays information about the different vDataFrame transformations. |
vDataFrame[].isdate | Returns True if the vcolumn category is date, False otherwise. |
vDataFrame[].isnum | Returns True if the vcolumn is numerical, False otherwise. |
vDataFrame.memory_usage | Returns the vDataFrame memory usage. |
vDataFrame[].memory_usage | Returns the vcolumn memory usage. |
vDataFrame.numcol | Returns the vDataFrame numerical vcolumns. |
vDataFrame.tail | Returns the vDataFrame tail. |
vDataFrame[].tail | Returns the vcolumn tail. |
vDataFrame[].store_usage | Returns the vcolumn expected store usage (unit: b). |
vDataFrame.swap | Swaps the two input vcolumns. |
vDataFrame.version | Returns the Vertica version. |
Join, Sort, and Transform
Method | Definition |
---|---|
vDataFrame.append | Merges the vDataFrame with another vDataFrame or an input relation. |
vDataFrame.cdt | Returns the complete disjunctive table of the vDataFrame. |
vDataFrame.groupby | Aggregates the vDataFrame by grouping its elements. |
vDataFrame.join | Joins the vDataFrame with another vDataFrame or an input relation. |
vDataFrame.narrow | Returns the narrow table of the vDataFrame using the input vcolumns. |
vDataFrame.pivot | Returns the pivot of the vDataFrame using the input aggregation. |
vDataFrame.polynomial_comb | Returns a vDataFrame containing the product combination of different input columns. This function is ideal for bivariate analysis. |
vDataFrame.recommend | Recommends items based on the collaborative filtering (CF) technique. |
vDataFrame.sort | Sorts the vDataFrame using the input vcolumns. |
Management
Method | Definition |
---|---|
vDataFrame.del_catalog | Deletes the current vDataFrame catalog. |
vDataFrame.load | Loads a previous structure of the vDataFrame. |
vDataFrame.save | Saves the current structure of the vDataFrame. |
Moving Windows
Method | Definition |
---|---|
vDataFrame.cummax | Adds a new vcolumn to the vDataFrame by computing the cumulative maximum of the input vcolumn. |
vDataFrame.cummin | Adds a new vcolumn to the vDataFrame by computing the cumulative minimum of the input vcolumn. |
vDataFrame.cumprod | Adds a new vcolumn to the vDataFrame by computing the cumulative product of the input vcolumn. |
vDataFrame.cumsum | Adds a new vcolumn to the vDataFrame by computing the cumulative sum of the input vcolumn. |
vDataFrame.rolling | Adds a new vcolumn to the vDataFrame by using an advanced analytical window function on one or two specific vcolumns. |
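
The cumulative and rolling computations listed above can be illustrated with the Python standard library. This sketch does not call VerticaPy; the helper `rolling_mean`, its window convention (the current row plus the preceding rows), and the sample data are assumptions.

```python
import operator
from itertools import accumulate

sales = [3, 1, 4, 1, 5]

# Each cumulative result has one value per input row.
cumulative_sum = list(accumulate(sales))
cumulative_max = list(accumulate(sales, max))
cumulative_min = list(accumulate(sales, min))
cumulative_product = list(accumulate(sales, operator.mul))


def rolling_mean(values, window):
    """Mean over the current row and up to `window - 1` preceding rows."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result


rolling = rolling_mean(sales, 2)
print(cumulative_sum)  # [3, 4, 8, 9, 14]
print(cumulative_max)  # [3, 3, 4, 4, 5]
print(rolling)         # [3.0, 2.0, 2.5, 2.5, 3.0]
```

A cumulative aggregate is simply a rolling aggregate whose window starts at the first row, which is why both families appear in the same section.
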
Normalization and Global Outliers
Method | Definition |
---|---|
vDataFrame[].clip | Clips the vColumn. |
vDataFrame[].fill_outliers | Fills the vColumn's outliers using the input method. |
vDataFrame.normalize | Normalizes the input vColumns using the input method. |
vDataFrame[].normalize | Normalizes the vColumn using the input method. |
vDataFrame.outliers | Adds a new vColumn labeled with 0 and 1, where 1 means that the record is a global outlier. |
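
The following plain-Python sketch shows one way normalization, clipping, and outlier flagging relate to each other. It does not call VerticaPy. It assumes z-score normalization with the population standard deviation and a caller-supplied z-score threshold; the helper names, the threshold, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]


def clip_values(values, lower, upper):
    """Bound every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def flag_outliers(values, threshold):
    """Label a record 1 when its absolute z-score exceeds `threshold`, else 0."""
    return [1 if abs(z) > threshold else 0 for z in z_scores(values)]


values = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 12.0, 100.0]
normalized = z_scores(values)
clipped = clip_values(values, 10.0, 13.0)
flags = flag_outliers(values, threshold=2.0)
print(flags)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With only ten records, a single extreme value cannot reach an absolute z-score of 3, because it inflates the standard deviation it is measured against. That is why this small example uses a threshold of 2.
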
Plotting & Graphics
Method | Definition |
---|---|
vDataFrame.animated | Draws the animated chart. |
vDataFrame.bar | Draws the bar chart of the input vColumns based on an aggregation. |
vDataFrame[].bar | Draws the bar chart of the vColumn based on an aggregation. |
vDataFrame.boxplot | Draws the box plot of the input vColumns. |
vDataFrame[].boxplot | Draws the box plot of the vColumn. |
vDataFrame.bubble | Draws the bubble plot of the input vColumns. |
vDataFrame.contour | Draws the contour plot of the input function using the 2 input vColumns. |
vDataFrame.density | Draws the density plot of the vColumns. |
vDataFrame[].density | Draws the density plot of the vColumn. |
vDataFrame.hchart | Draws responsive charts using the Highcharts API. |
vDataFrame[].geo_plot | Draws a geospatial object. |
vDataFrame.heatmap | Draws the heatmap of two input vColumns. |
vDataFrame.hexbin | Draws the hexbin of the input vColumns based on an aggregation. |
vDataFrame.hist | Draws the histogram of the input vColumns based on an aggregation. |
vDataFrame[].hist | Draws the histogram of the vColumn based on an aggregation. |
vDataFrame.outliers_plot | Draws the global outliers plot of one or two columns based on their z-scores. |
vDataFrame.pie | Draws the nested density pie chart of the input vColumns. |
vDataFrame[].pie | Draws the pie chart of the vColumn based on an aggregation. |
vDataFrame.pivot_table | Draws the pivot table of one or two columns based on an aggregation. |
vDataFrame.plot | Draws a time series plot. |
vDataFrame[].plot | Draws the time series of the vColumn. |
vDataFrame[].range_plot | Draws the range plot of the vColumn. |
vDataFrame.scatter | Draws the scatter plot of the input vColumns. |
vDataFrame.scatter_matrix | Draws the scatter matrix of the vDataFrame. |
vDataFrame[].spider | Draws the spider plot of the input vColumn based on an aggregation. |
vDataFrame.stacked_area | Draws a time series stacked area chart. |
Renaming
Method | Definition |
---|---|
vDataFrame[].rename | Renames the vColumn. |
Sample
Method | Definition |
---|---|
vDataFrame.balance | Balances the dataset using the input method. |
vDataFrame.sample | Downsamples the vDataFrame by filtering using a random vcolumn. |
Serialization
Method | Definition |
---|---|
vDataFrame.to_csv | Creates a CSV file of the current vDataFrame relation. |
vDataFrame.to_db | Saves the vDataFrame current relation to the Vertica database. |
vDataFrame.to_geopandas | Converts the vDataFrame to a Geopandas DataFrame. |
vDataFrame.to_json | Creates a JSON file of the current vDataFrame relation. |
vDataFrame.to_list | Converts the vDataFrame to a Python list. |
vDataFrame.to_numpy | Converts the vDataFrame to a Numpy array. |
vDataFrame.to_pandas | Converts the vDataFrame to a pandas DataFrame. |
vDataFrame.to_pickle | Saves the vDataFrame to a Python pickle file. |
vDataFrame.to_shp | Creates a SHP file of the current vDataFrame relation. |
Splitting into Train/Test
Method | Definition |
---|---|
vDataFrame.train_test_split | Creates two vDataFrames (train/test) that can be used to evaluate a model. |
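
The idea of a train/test split is sketched below in plain Python. It does not call VerticaPy, and the library may split the data differently (for example, inside the database). The helper name `split_rows`, its parameters, and the seeded shuffle are assumptions made for illustration.

```python
import random


def split_rows(rows, test_size=0.33, seed=0):
    """Shuffle a copy of `rows` and return (train, test) with disjoint records."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train, test = split_rows(rows, test_size=0.3, seed=42)
print(len(train), len(test))  # 7 3
```

Every record lands in exactly one of the two sets, so a model fitted on `train` can be scored on `test` without having seen those records.
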
Working with Text
Method | Definition |
---|---|
vDataFrame.regexp | Computes a new vcolumn based on regular expressions. |
vDataFrame[].str_contains | Verifies if the regular expression is in each of the vcolumn records. The vcolumn will be transformed. |
vDataFrame[].str_count | Computes the regular expression count match in each record of the vcolumn. The vcolumn will be transformed. |
vDataFrame[].str_extract | Extracts the regular expression in each record of the vcolumn. The vcolumn will be transformed. |
vDataFrame[].str_replace | Replaces the regular expression matches in each record of the vcolumn with an input value. The vcolumn will be transformed. |
vDataFrame[].str_slice | Slices the vcolumn. The vcolumn will be transformed. |
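
The operations in this table map onto familiar regular-expression primitives. The sketch below uses Python's `re` module as a stand-in and does not call VerticaPy; Vertica's regular-expression dialect may differ from Python's, and the sample records and patterns are hypothetical.

```python
import re

records = ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"]
title = r"M[a-z]+\."  # a capital M, lowercase letters, then a literal dot


def first_match(pattern, text):
    """Return the first substring matching `pattern`, or None if there is none."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


contains = [re.search(r"Mrs?\.", r) is not None for r in records]  # pattern present?
counts = [len(re.findall(r"n", r)) for r in records]               # matches per record
extracted = [first_match(title, r) for r in records]               # first match
replaced = [re.sub(title, "[title]", r) for r in records]          # substitute matches
sliced = [r[:6] for r in records]                                  # fixed-position slice

print(contains)   # [True, True, False]
print(extracted)  # ['Mr.', 'Mrs.', 'Miss.']
```

Note that each operation produces one output value per input record, which matches the "vcolumn will be transformed" behavior described in the table.
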
Working with Weights
Method | Definition |
---|---|
vDataFrame.add_duplicates | Duplicates the vDataFrame using the input weight. |
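
Duplicating records by weight can be pictured with this short plain-Python sketch. It does not call VerticaPy. It assumes integer weights, where each record is repeated as many times as its weight and a weight of 0 removes it; the helper name and the sample data are hypothetical.

```python
def duplicate_by_weight(rows):
    """Repeat each value `weight` times; `rows` holds (value, weight) pairs."""
    return [value for value, weight in rows for _ in range(weight)]


weighted = [("a", 2), ("b", 0), ("c", 3)]
expanded = duplicate_by_weight(weighted)
print(expanded)  # ['a', 'a', 'c', 'c', 'c']
```

Expanding weights this way lets unweighted algorithms treat the data as if each record had been observed `weight` times.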