BALANCE
Returns a view with an equal distribution of the input data based on the dependent variable.
Important: Before using a machine learning function, be aware that all the ongoing transactions might be committed.
Behavior Type
Syntax
BALANCE ( 'output_view', 'input_relation', 'response_column', 'balance_method' [USING PARAMETERS [sampling_ratio=value] ])
Arguments
output_view |
The name of the View where Vertica saves the balanced data from the chosen |
input_relation |
The table or view that contains the data the function uses to create a more balanced data set. |
response_column |
The dependent variable: a VARCHAR or INT column in the |
balance_method |
The imbalanced processing method to use. Selects data from the minority and majority classes. Valid values
|
Parameters
sampling_ratio=value |
The desired ratio between the majority class and the minority class. This value has no effect when used with balance method Default: 1.0 |
Privileges
To use BALANCE, you must either be a superuser or have CREATE privileges for the schema of the output view and SELECT privileges for the input table or view. There are no privileges needed on the function itself.
See GRANT (Schema) and GRANT (Table).
Examples
The following example shows how you can use the BALANCE function:
=> CREATE TABLE backyard_bugs (id identity, bug_type int, finder varchar(20)); CREATE TABLE => COPY backyard_bugs FROM STDIN; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 1|Ants >> 1|Beetles >> 3|Ladybugs >> 3|Ants >> 3|Beetles >> 3|Caterpillars >> 2|Ladybugs >> 3|Ants >> 3|Beetles >> 1|Ladybugs >> 3|Ladybugs >> \. => SELECT bug_type, COUNT(bug_type) FROM backyard_bugs GROUP BY bug_type; bug_type | COUNT ----------+------- 2 | 1 1 | 3 3 | 7 (3 rows) => SELECT BALANCE('backyard_bugs_balanced', 'backyard_bugs', 'bug_type', 'under_sampling'); BALANCE -------------------------- Finished in 1 iteration (1 row) => SELECT bug_type, COUNT(bug_type) FROM backyard_bugs_balanced GROUP BY bug_type; ----------+------- 2 | 1 1 | 2 3 | 1 (3 rows)