
verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory#

class verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory(name: str = None, overwrite_model: bool = False, base_model: TimeSeriesModelBase | None = None)#

This model is built on top of multiple base models, one per category. Refer to the documentation of the base models for complete examples.

Important

This is still Beta.

Parameters#

name: str, optional

Name of the model. The model is stored in the database.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

base_model: TimeSeriesModelBase

The user should provide a base model which will be used for each category. It can be one of:

- ARIMA
- ARMA
- AR
- MA

Attributes#

Many attributes are created during the fitting phase.

distinct: list

This provides a sequential list of the categories used to build the different models.

ts: str

The name of the timestamp column.

y: str

The column name used for building the model.

_is_already_stored: bool

This tells us whether a model is stored in the Vertica database.

_get_model_names: list

This returns the list of names of the models created.
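
For instance, once the model from the Examples section below has been fitted, these attributes can be read directly from the model object. A minimal sketch (the values shown in the comments are illustrative, not actual output):

# Inspect the fitted attributes (values in comments are illustrative only)
model.distinct   # e.g. ['PERNAMBUCO', 'SERGIPE'] - one submodel per category
model.ts         # 'date'   - the timestamp column used for training
model.y          # 'number' - the response column used for training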

Examples#

The following examples provide a basic understanding of usage.

Initialization#

For this example, we will use a subset of the amazon dataset.

import verticapy.datasets as vpd

amazon_full = vpd.load_amazon()
[Output: a preview of the amazon dataset with columns date (Date), state (Varchar(32)), and number (Integer); each row gives the value of number for one Brazilian state and month, e.g. 1998-01-01, ACRE, 0. Rows: 1-100 | Columns: 3]

For simplicity in this example, we reduce the number of states:

amazon = amazon_full[(amazon_full["state"] == "PERNAMBUCO") | (amazon_full["state"] == "SERGIPE")]

Now we can set up a base model that will be created for each unique state in the dataset. For this example, we use ARIMA.

from verticapy.machine_learning.vertica.tsa import ARIMA

base_model = ARIMA(order = (2, 1, 2))

Finally, we can initialize our multiple models in one go:

from verticapy.machine_learning.vertica.tsa.ensemble import TimeSeriesByCategory

model = TimeSeriesByCategory(base_model = base_model)

Model Fitting#

We can now fit the model:

model.fit(amazon, ts = "date", y = "number", by = "state")

Important

To train a model, you can directly use the vDataFrame or the name of the relation stored in the database. The test set is optional and is only used to compute the test metrics. In verticapy, we don’t work using X matrices and y vectors. Instead, we work directly with lists of predictors and the response name.
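
As a hedged illustration of this note, the same fit could also be expressed against the name of a stored relation instead of the vDataFrame; the table name below is hypothetical and assumes the subset was saved to the database first:

# Equivalent fit using the name of a stored relation (hypothetical table name)
model.fit("public.amazon_subset", ts = "date", y = "number", by = "state")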

Plots#

We can conveniently plot the predictions on a line plot to observe the efficacy of our model. We need to provide idx, which represents the submodel number.

model.plot(idx = 0, npredictions = 5)

Note

You can find the name of each category through the distinct attribute. The sequential list of categories corresponds to idx = 0, 1, ...:

model.distinct
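
A minimal sketch building on this note: since the i-th entry of model.distinct corresponds to idx = i, you can loop over the categories and plot each submodel in turn (the loop below is illustrative):

# idx = i corresponds to the category model.distinct[i]
for i, category in enumerate(model.distinct):
    print(f"Submodel {i} was trained on category: {category}")
    model.plot(idx = i, npredictions = 5)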

__init__(name: str = None, overwrite_model: bool = False, base_model: TimeSeriesModelBase | None = None) None#

Must be overridden in the child class

Methods

__init__([name, overwrite_model, base_model])

Must be overridden in the child class

contour([nbins, chart])

Draws the model's contour plot.

deploySQL([vdf, ts, y, start, npredictions, ...])

Returns the SQL code needed to deploy the model.

does_model_exists(name[, raise_error, ...])

Checks whether the model is stored in the Vertica database.

drop()

Drops the model from the Vertica database.

export_models(name, path[, kind])

Exports machine learning models.

features_importance([idx, show, chart])

Computes the input submodel's features importance.

fit(input_relation, ts, y, by[, ...])

Trains the model.

get_attributes([attr_name])

Returns the model attributes.

get_match_index(x, col_list[, str_check])

Returns the matching index.

get_params()

Returns the parameters of the model.

get_plotting_lib([class_name, chart, ...])

Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.

get_vertica_attributes([attr_name])

Returns the model Vertica attributes.

import_models(path[, schema, kind])

Imports machine learning models.

plot([idx, vdf, ts, y, start, npredictions, ...])

Draws the input submodel.

predict([vdf, ts, y, start, npredictions, ...])

Predicts using the input relation.

register(registered_name[, raise_error])

Registers the model and adds it to in-DB Model versioning environment with a status of 'under_review'.

regression_report([metrics, start, ...])

Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).

report([metrics, start, npredictions, method])

Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).

score([metric, start, npredictions, method])

Computes the model score.

set_params([parameters])

Sets the parameters of the model.

summarize()

Summarizes the model.

to_binary(path)

Exports the model to the Vertica Binary format.

to_pmml(path)

Exports the model to PMML.

to_python([return_proba, ...])

Returns the Python function needed for in-memory scoring without using built-in Vertica functions.

to_sql([X, return_proba, ...])

Returns the SQL code needed to deploy the model without using built-in Vertica functions.

to_tf(path)

Exports the model to the Frozen Graph format (TensorFlow).

Attributes