DETECT_OUTLIERS
Returns the outliers in a data set based on the outlier threshold. The output is a view containing the outliers.
Important: Before using a machine learning function, be aware that all the ongoing transactions might be committed.
Behavior Type
Syntax
DETECT_OUTLIERS ( 'output_table', 'input_relation','input_columns', 'outlier_method' USING PARAMETERS outlier_threshold=value [, exclude_columns='col1, col2, ... coln',] [partition_columns='col1, col2, ... coln'])
Arguments
output_table |
The name of the table where Vertica saves the outliers from the chosen |
input_relation |
The table or view that contains outlier data. |
input_columns |
The columns of input_relation to be used for determining outliers. The input_columns argument supports the use of a wildcard (*) character in place of column names. |
outlier_method |
The outlier method to use: Valid Values:
|
Parameters
outlier_threshold=value |
(Optional) The value beyond which a data point becomes an outlier. Default value: 3.0 |
exclude_columns='col1, col2, ... col' |
(Optional) The columns from input_relation which you want to exclude from the input_columns argument. |
partition_columns='col1, col2, ... coln' |
(Optional) A comma-separated list of column names from the input_relation which defines the partitions. The function will detect outliers among each partition separately. The default value is empty. |
Privileges
To use DETECT_OUTLIERS, you must either be a superuser or have CREATE privileges for the schema and SELECT privileges for the table.
See GRANT (Schema) and GRANT (Table).
Examples
The following example shows how you can use the DETECT_OUTLIERS function:
=> CREATE TABLE baseball_roster (id identity, last_name varchar(30), hr int, avg float); CREATE TABLE => COPY baseball_roster FROM STDIN; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> Polo|7|.233 >> Gloss|45|.170 >> Gus|12|.345 >> Gee|1|.125 >> Laus|3|.095 >> Hilltop|16|.222 >> Wicker|78|.333 >> Scooter|0|.121 >> Hank|999999|.8888 >> Popup|35|.378 >> \. => SELECT * FROM baseball_roster; id | last_name | hr | avg ----+-----------+--------+-------- 3 | Gus | 12 | 0.345 4 | Gee | 1 | 0.125 6 | Hilltop | 16 | 0.222 10 | Popup | 35 | 0.378 1 | Polo | 7 | 0.233 7 | Wicker | 78 | 0.333 9 | Hank | 999999 | 0.8888 2 | Gloss | 45 | 0.17 5 | Laus | 3 | 0.095 8 | Scooter | 0 | 0.121 (10 rows) => SELECT DETECT_OUTLIERS('baseball_outliers', 'baseball_roster', '*', 'robust_zscore' USING PARAMETERS outlier_threshold=3.0, exclude_columns='id'); DETECT_OUTLIERS -------------------------- Detected 2 outliers (1 row) => SELECT * FROM baseball_outliers; id | last_name | hr | avg ----+-----------+------------+------------- 7 | Wicker | 78 | 0.333 9 | Hank | 999999 | 0.8888 (2 rows)