LINEAR_REG

Executes linear regression on an input relation and returns a linear regression model.

Syntax

LINEAR_REG ( 'model-name', 'input-relation', 'response-column', 'predictor-columns'
             [ USING PARAMETERS
                   [exclude_columns='excluded-columns']
                   [, optimizer='optimizer-method']
                   [, regularization='regularization-method']
                   [, epsilon=epsilon-value]
                   [, max_iterations=iterations]
                   [, lambda=lambda-value]
                   [, alpha=alpha-value] ] )
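As a rough illustration of how the pieces of the syntax fit together, the Python sketch below composes a `LINEAR_REG` statement string and enforces a few of the rules from the parameter table (`CGD` requires `L1` or `ENet`, `lambda` must be non-negative, `alpha` must be in [0,1]) before the statement is sent. The helpers `quote` and `build_linear_reg_sql` are hypothetical; they are not part of Vertica or any Vertica client library, and comparing parameter values case-insensitively is an assumption.

```python
from typing import Mapping, Optional, Union

Value = Union[str, int, float]


def quote(value: str) -> str:
    """Return value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_linear_reg_sql(model_name: str, input_relation: str,
                         response_column: str, predictor_columns: str,
                         params: Optional[Mapping[str, Value]] = None) -> str:
    """Compose a LINEAR_REG call (hypothetical helper).

    params maps parameter names (optimizer, lambda, ...) to values.
    """
    params = dict(params or {})
    regularization = str(params.get("regularization", "None")).upper()
    optimizer = params.get("optimizer")

    # Client-side checks mirroring the documented rules (case-insensitivity assumed).
    if (optimizer is not None and str(optimizer).upper() == "CGD"
            and regularization not in ("L1", "ENET")):
        raise ValueError("optimizer='CGD' requires regularization='L1' or 'ENet'")
    if float(params.get("lambda", 0)) < 0:
        raise ValueError("lambda must be non-negative")
    if "alpha" in params and not 0 <= float(params["alpha"]) <= 1:
        raise ValueError("alpha must be in [0, 1]")

    # The four positional arguments are always passed as string literals.
    args = ", ".join(quote(a) for a in
                     (model_name, input_relation, response_column, predictor_columns))
    # String-valued parameters are quoted; numeric ones are emitted as-is.
    settings = [f"{name}={quote(v) if isinstance(v, str) else v}"
                for name, v in params.items()]
    # No comma precedes USING PARAMETERS, matching the syntax above.
    using = " USING PARAMETERS " + ", ".join(settings) if settings else ""
    return f"SELECT LINEAR_REG({args}{using});"
```

For example, `build_linear_reg_sql('myLinearRegModel', 'faithful', 'eruptions', 'waiting', {'optimizer': 'BFGS'})` produces the same statement as the one under Examples. Parameters are passed as a dictionary rather than as keyword arguments because `lambda` is a reserved word in Python.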

Arguments

model-name	Identifies the model to create, where model-name conforms to the conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation	The table or view that contains the training data for building the model. If the input relation is defined in Hive, use `SYNC_WITH_HCATALOG_SCHEMA` to sync the `hcatalog` schema, and then run the machine learning function.
response-column	Name of the input column that represents the dependent variable or outcome. All values in this column must be numeric; otherwise the model is invalid.
predictor-columns	Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter `exclude_columns` must include response-column and any columns that are invalid as predictor columns. All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid. All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can train with a BOOLEAN predictor column and, during prediction, supply a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
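The BOOLEAN handling described above can be sketched in Python as follows. `to_training_value` and `predict` are hypothetical helpers written for illustration, not Vertica internals; they assume the coefficient layout described under Model Attributes (intercept first, then one coefficient per predictor).

```python
from typing import Sequence, Union

Number = Union[bool, int, float]


def to_training_value(value: Number) -> float:
    """Convert a predictor value to FLOAT as described: False -> 0.0, True -> 1.0."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    raise TypeError(f"predictor values must be numeric or BOOLEAN, got {type(value).__name__}")


def predict(coeffs: Sequence[float], row: Sequence[Number]) -> float:
    """Linear prediction; coeffs[0] is the intercept, then one coefficient per predictor.

    Like the documented prediction step, no type check is applied to row.
    """
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))
```

Because `predict` does not check types, a FLOAT value of 0 or 1 yields the same prediction as the BOOLEAN it replaces; any other FLOAT value would be a value the model never saw during training.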

Parameter Settings

Parameter name	Set to…
`exclude_columns`	Comma-separated list of columns from predictor‑columns to exclude from processing.
`optimizer`	The optimizer method used to train the model, one of the following: `Newton`, `BFGS`, or `CGD`. If you select `CGD`, regularization-method must be set to `L1` or `ENet`; otherwise the function returns an error. Default: `CGD` if regularization-method is set to `L1` or `ENet`, otherwise `Newton`.
`regularization`	The method of regularization, one of the following: `None` (default), `L1`, `L2`, or `ENet`.
`epsilon`	The accuracy the algorithm must reach to be considered converged. Training stops when this accuracy is reached or when `max_iterations` is reached, whichever happens first. Default: 1e-6
`max_iterations`	The maximum number of iterations the algorithm performs. If the accuracy set by `epsilon` has not been reached by then, training stops anyway. Default: 100
`lambda`	The regularization parameter value. Must be non-negative. Default: 1
`alpha`	The ENet mixture parameter, which defines how much L1 versus L2 regularization to apply. Setting this parameter without ENet regularization produces a warning. Valid values: [0,1], where 1 is equivalent to L1 and 0 is equivalent to L2.
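To show how `regularization`, `lambda`, `alpha`, `epsilon`, and `max_iterations` interact, here is a minimal coordinate-descent sketch for elastic-net linear regression in Python with NumPy. It assumes a glmnet-style objective, (1/2n)*||y - b0 - Xb||^2 + lambda*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2), with an unpenalized intercept, and it treats `epsilon` as a bound on the largest coefficient change per pass. Vertica's exact objective scaling and convergence test are not documented here, so both are assumptions, and whether its `CGD` optimizer uses this exact update is not confirmed. With `alpha=1` the penalty reduces to L1 and with `alpha=0` to L2, matching the parameter table; the function name `enet_coordinate_descent` is hypothetical.

```python
import numpy as np


def soft_threshold(u: float, t: float) -> float:
    """S(u, t) = sign(u) * max(|u| - t, 0), the proximal operator of t*|b|."""
    return float(np.sign(u) * max(abs(u) - t, 0.0))


def enet_coordinate_descent(X, y, lam=1.0, alpha=0.5, epsilon=1e-6, max_iterations=100):
    """Minimize (1/2n)||y - b0 - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).

    Returns (intercept, coefficients, iterations). The intercept is not penalized.
    Assumed stopping rule: largest parameter change in a pass <= epsilon.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b = np.zeros(p)
    b0 = y.mean()
    residual = y - b0                      # invariant: residual == y - b0 - X @ b
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * sum of x_ij^2 per column

    for iteration in range(1, max_iterations + 1):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual (feature j added back).
            rho = X[:, j] @ residual / n + col_sq[j] * old
            denom = col_sq[j] + lam * (1.0 - alpha)   # L2 part shrinks the update
            b[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            residual -= X[:, j] * (b[j] - old)
            max_change = max(max_change, abs(b[j] - old))
        # Re-center the unpenalized intercept so the residual has mean zero.
        shift = residual.mean()
        b0 += shift
        residual -= shift
        max_change = max(max_change, abs(shift))
        if max_change <= epsilon:
            return b0, b, iteration
    return b0, b, max_iterations
```

The third return value plays the role of the `iterations` model attribute: it reaches `max_iterations` only if the `epsilon` criterion was not met earlier. A large `lambda` with `alpha=1` drives coefficients exactly to zero, which is the characteristic effect of L1 regularization.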

Model Attributes

Attribute	Description
`data`	The data for the function, including: `coeffNames`: names of the coefficients, starting with the intercept and followed by the predictor names in the order specified in the call; `coeff`: vector of estimated coefficients, in the same order as `coeffNames`; `stdErr`: vector of the standard errors of the coefficients, in the same order as `coeffNames`; `zValue` (for logistic regression): vector of z-values of the coefficients, in the same order as `coeffNames`; `tValue` (for linear regression): vector of t-values of the coefficients, in the same order as `coeffNames`; `pValue`: vector of p-values of the coefficients, in the same order as `coeffNames`.
`regularization`	The type of regularization used when training the model.
`lambda`	The regularization parameter. Higher values enforce stronger regularization. This value must be non-negative.
`alpha`	The elastic net mixture parameter.
`iterations`	The number of iterations actually performed before the algorithm converged or reached `max_iterations`.
`skippedRows`	The number of rows of the input relation that were skipped because they contained an invalid value.
`processedRows`	The total number of rows in the input relation minus `skippedRows`.
`callStr`	The values of all input arguments specified when the function was called.
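For ordinary least squares without regularization, the `coeff`, `stdErr`, and `tValue` vectors and the row counts can be reproduced with the classical formulas, as in the NumPy sketch below. It assumes that a row is invalid when any value is NULL (`None`) or NaN, and the function name `fit_ols_with_stats` is hypothetical; whether Vertica computes these statistics identically, particularly for regularized models, is not confirmed here. `pValue` is omitted to keep the sketch NumPy-only; the usual two-sided value is 2*(1 - F(|t|)), where F is the CDF of Student's t-distribution with n - k degrees of freedom (n processed rows, k coefficients including the intercept).

```python
import math

import numpy as np


def fit_ols_with_stats(rows, response_index):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    rows: sequence of tuples; a row is skipped if any value is None or NaN (assumption).
    Returns a dict loosely modeled on the documented model attributes.
    """
    valid = [r for r in rows
             if all(v is not None and not (isinstance(v, float) and math.isnan(v))
                    for v in r)]
    skipped = len(rows) - len(valid)
    data = np.asarray(valid, dtype=float)
    y = data[:, response_index]
    X = np.delete(data, response_index, axis=1)
    A = np.column_stack([np.ones(len(y)), X])   # intercept column first, as in coeffNames
    n, k = A.shape
    coeff, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coeff
    sigma2 = residuals @ residuals / (n - k)    # unbiased residual variance estimate
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return {
        "coeff": coeff,
        "stdErr": std_err,
        "tValue": coeff / std_err,              # t-value = estimate / standard error
        "skippedRows": skipped,
        "processedRows": len(valid),            # total rows minus skipped rows
    }
```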

Examples

=> SELECT LINEAR_REG('myLinearRegModel', 'faithful', 'eruptions', 'waiting' 
                      USING PARAMETERS optimizer='BFGS');
         LINEAR_REG
----------------------------
 Finished in 10 iterations

(1 row)


Version 9.2.x Documentation