Machine Learning Model Versioning

In-DB Model Versioning for Machine Learning in Production

OpenText™ Vertica™ database is trusted by many companies worldwide to store and analyze big data at scale. Running machine learning inside such a database comes with many advantages—including enhanced performance, security benefits, and ease of maintenance. Vertica-ML has been offering distributed machine learning in Vertica database for several years. We constantly add new ML algorithms and tools to empower Vertica users to run highly performant ML algorithms inside the database.

We also envision Vertica as a model storage and management system in an ML ecosystem. To support this goal, we have begun to add support for importing ML models trained by other libraries outside Vertica. All models archived in a Vertica database—whether trained internally or trained externally and then imported—are treated as first-class citizens of the database and enjoy similar security and management capabilities available to other database objects. Now, by adding in-database model versioning, we aim to facilitate MLOps for models archived inside Vertica.

What is Model Versioning?

In a professional production line, there is usually a computer program that uses a trained ML model for prediction. Because of the non-stationary nature of data in many applications, the model in production usually needs to be replaced regularly with newly trained models, sometimes several times a day. Additionally, for technical, business, and regulatory purposes, previously in-production models should be archived and able to return to production if necessary. Model versioning provides the infrastructure to help meet these requirements and decreases the complexity of model management.

In most cases, a data scientist trains many models before proposing one for production. Model versioning, added in Vertica 12.0SP4, provides an infrastructure to register candidate models for production, and then to track and manage their status in the database. The versioning infrastructure supports a collaborative environment where multiple users can submit candidate models for individual applications, which are identified by their registered_name. Within the model versioning environment, all the registered models under a given registered_name—regardless of their creator, algorithm type, or production status—are considered different versions of the same model.

The following post explains the new in-database model versioning tools and gives an example of how data scientists and ML engineers can use the added tools to manage trained models, including what models are in production and for which applications. To better understand the workflow, we first need to explore the MLSUPERVISOR role, which was added in Vertica 12.0SP3.

The MLSUPERVISOR Role

The MLSUPERVISOR role is a predefined role to which all the ML model management privileges of DBADMIN are delegated. In other words, users granted the MLSUPERVISOR role can manage all models in the V_CATALOG.MODELS table on behalf of a dbadmin. By granting this role to trusted users, dbadmin can free themselves from the responsibilities of model management tasks.

Model Versioning in Vertica

To understand how in-DB model versioning works, let's imagine a Vertica customer with two data scientists, DS1 and DS2, and two ML applications, application-I and application-II.

Application-I

After training and evaluating many models, the first data scientist, DS1, decides model "RF_model", which she trained using the in-DB rf_classifier algorithm, is a good candidate for application-I. She registers the model by calling the register_model function as follows:

Registering schema1.RF_model by DS1

-- user DS1 registers the schema1.RF_model model to model_app_I
=> SELECT register_model('schema1.RF_model', 'model_app_I');
                           REGISTER_MODEL                        
-------------------------------------------------------------------------
Model [schema1.RF_model] is registered as [model_app_I], version [1]
(1 row)

The register_model function takes two arguments: first the name of the model which is going to be registered and second the REGISTERED_NAME. This REGISTERED_NAME can represent a group of models for a higher-level application, where each model in the group has a unique version number. Registering a model makes it appear in the v_catalog.registered_models table.

Let's assume that there was previously no model registered with the registered_name 'model_app_I'; hence, the registered version of the 'schema1.RF_model' model is set to 1. Registering a model means proposing it for production. Therefore, it is important to increase the security of the model. For this purpose, the owner of a model is changed to dbadmin, and the previous owner will be given only USAGE privilege on the model. When a user has USAGE privilege on a model, the user is able to observe the model in the v_catalog.models and v_catalog.registered_models tables, retrieve the model's information using the get_model_summary and get_model_attribute functions, and apply the model for in-DB scoring using a proper transformation function.

The status of a model immediately after registration is "under_review". There are five valid values for the status of models appearing in the registered_models table: under_review, staging, declined, production, and archived. The following diagram shows different possible statuses of a model and valid transitions between them.

Only users with MLSUPERVISOR role (or DBADMIN) can change the status of a registered model. Let's assume there is an ML supervisor in our example that approves the recently registered model and changes its status to "staging". Models with this status might be targeted for A/B testing or otherwise verified. After this phase, the ML supervisor can change the status of the model to "production". The change_model_status function should be used to change the status of a registered model:

Changing the status of (model_app_I, 1) by supervisor

-- supervisor changes the status of (registered_name, registered_version)=(model_app_I, 1) to staging
=> SELECT change_model_status('model_app_I', 1, 'staging');
                          CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_I] - version [1] is changed to [staging]
(1 row)
-- supervisor changes the status of (model_app_I, 1) to production
=> SELECT change_model_status('model_app_I', 1, 'production');
                           CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_I] - version [1] is changed to [production]
(1 row)
=> SELECT * FROM REGISTERED_MODELS;
  registered_name | registered_version |   status   |        registered_time        |      model_id     | schema_name | model_name |   model_type  |   category
------------------+--------------------+------------+-------------------------------+-------------------+-------------+------------+---------------+----------------
 model_app_I      |                  1 | PRODUCTION | 2023-03-23 09:52:00.082166-04 | 45035996273714020 | schema1     | RF_model   | RF_CLASSIFIER | VERTICA_MODELS
(1 row)

At this point, the second data scientist, DS2, has trained an in-DB model of type xgb_classifier for application-I and registers it to 'model_app_I'. The model will have an initial status of "under_review" and a version number of 2:

Registering schema2.XGB_model by DS2

-- user DS2 registers the schema2.XGB_model model to model_app_I
=> SELECT register_model('schema2.XGB_model', 'model_app_I');
                            REGISTER_MODEL                        
-------------------------------------------------------------------------
Model [schema2.XGB_model] is registered as [model_app_I], version [2]
(1 row)

Once again, the supervisor can change the status of this model to staging and then to production. Among the models with the same registered_name, only one of them can have the status of "production". Therefore, changing the status of the new model to production will automatically change the status of the previous one to archived:

Changing the status of (model_app_I, 2) by supervisor

-- supervisor changes the status of (model_app_I, 2) to staging
=> SELECT change_model_status('model_app_I', 2, 'staging');
                           CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_I] - version [2] is changed to [staging]
(1 row)
-- supervisor changes the status of (model_app_I, 2) to production
=> SELECT change_model_status('model_app_I', 2, 'production');
                            CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_I] - version [2] is changed to [production]
(1 row)
=> SELECT * FROM REGISTERED_MODELS;
  registered_name | registered_version |   status   |        registered_time        |      model_id     | schema_name | model_name |   model_type   |   category
------------------+--------------------+------------+-------------------------------+-------------------+-------------+------------+----------------+----------------
 model_app_I      |                  2 | PRODUCTION | 2023-03-24 05:29:25.990626-02 | 45035996273714020 | schema2     | XGB_model  | XGB_CLASSIFIER | VERTICA_MODELS
 model_app_I      |                  1 | ARCHIVED   | 2023-03-23 09:52:00.082166-04 | 45035996273850350 | schema1     | RF_model   | RF_CLASSIFIER  | VERTICA_MODELS
(2 rows)

Application-II

Both data scientists in our scenario can register models for multiple applications asynchronously. All registered models are assigned a version number automatically. The versions are strictly increasing integer numbers, and a pair of (registered_name, registered_version) is always guaranteed to be unique. Let's assume that application-II is a regression problem. First, DS2 proposes a model that has been trained in Python, converted into PMML, and then imported into Vertica:

Registering schema2.py_model by DS2

-- user DS2 registers the schema2.py_model model to model_app_II
=> SELECT register_model('schema2.py_model', 'model_app_II');
                            REGISTER_MODEL                        
-------------------------------------------------------------------------
Model [schema2.py_model] is registered as [model_app_II], version [1]
(1 row)

Then, supervisor puts the registered model into production for 'model_app_II':

Changing the status of (model_app_II, 1) by supervisor

-- supervisor changes the status of (model_app_II, 1) to staging
=> SELECT change_model_status('model_app_II', 1, 'staging');
                          CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_II] - version [1] is changed to [staging]
(1 row)
-- supervisor changes the status of (model_app_II, 1) to production
=> SELECT change_model_status('model_app_II', 1, 'production');
                           CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_II] - version [1] is changed to [production]
(1 row)

Meanwhile, DS1 submits a model trained in Vertica to application-II by registering it to 'model_app_II':

Registering schema1.linreg by DS1

-- user DS1 registers the schema1.linreg model to model_app_II
=> SELECT register_model('schema1.linreg', 'model_app_II');
                            REGISTER_MODEL                        
-------------------------------------------------------------------------
Model [schema1.linreg] is registered as [model_app_II], version [2]
(1 row)

Subsequently, DS2 registers a model to 'model_app_II' that is trained outside of Vertica using TensorFlow and then imported into Vertica:

Registering schema2.tf_model by DS2

-- user DS2 registers the schema2.tf_model model to model_app_II
=> SELECT register_model('schema2.tf_model', 'model_app_II');
                            REGISTER_MODEL                        
-------------------------------------------------------------------------
Model [schema2.tf_model] is registered as [model_app_II], version [3]
(1 row)

Let's say that supervisor only changes the status of DS1's model for now:

Changing the status of (model_app_II, 2) by supervisor

-- supervisor changes the status of (model_app_II, 2) to staging
=> SELECT change_model_status('model_app_II', 2, 'staging');
                          CHANGE_MODEL_STATUS                       
-----------------------------------------------------------------------------
The status of model [model_app_II] - version [2] is changed to [staging]
(1 row)
=> SELECT * FROM REGISTERED_MODELS WHERE registered_name = 'model_app_II';
  registered_name | registered_version |    status    |        registered_time        |      model_id     | schema_name | model_name |      model_type       |   category
------------------+--------------------+--------------+-------------------------------+-------------------+-------------+------------+-----------------------+----------------
 model_app_II     |                  3 | UNDER_REVIEW | 2023-03-25 10:29:25.970567-02 | 45035996273853740 | schema2     | tf_model   | TF_MODEL              | TENSORFLOW
 model_app_II     |                  2 | STAGING      | 2023-03-25 08:19:41.190625-03 | 45035996273853815 | schema1     | linreg     | LINEAR_REGRESSION     | VERTICA_MODELS
 model_app_II     |                  1 | PRODUCTION   | 2023-03-21 09:42:10.082135-01 | 45035996273853890 | schema2     | py_model   | PMML_REGRESSION_MODEL | PMML
(3 rows)

The various elements of our in-DB model versioning infrastructure and the status of the registered_models table at the end of the above example are depicted in the following figure:

A dbadmin or any user with the MLSUPERVISOR role can use the DROP MODEL statement to drop any registered model regardless of the model's status. In our example, let's say the supervisor drops the (model_app_II, 2) model, whose corresponding schema_name and model_name can be found in the REGISTERED_MODELS table.

Changing the status of (model_app_II, 2) by supervisor

-- supervisor drops (model_app_II, 2)
=> DROP MODEL schema1.linreg;
DROP MODEL
=> SELECT * FROM REGISTERED_MODELS WHERE registered_name = 'model_app_II';
  registered_name | registered_version |    status    |        registered_time        |      model_id     | schema_name | model_name |      model_type       |   category
------------------+--------------------+--------------+-------------------------------+-------------------+-------------+------------+-----------------------+----------------
 model_app_II     |                  3 | UNDER_REVIEW | 2023-03-25 10:29:25.970567-02 | 45035996273853740 | schema2     | tf_model   | TF_MODEL              | TENSORFLOW
 model_app_II     |                  1 | PRODUCTION   | 2023-03-21 09:42:10.082135-01 | 45035996273853890 | schema2     | py_model   | PMML_REGRESSION_MODEL | PMML
(2 rows)

Model Registration History

For monitoring and debugging purposes, it helps to have access to the registration history; for example, it would be helpful to know who has changed the status of a registered model and when. This information is available from the v_monitor.model_status_history table. For the sake of security, only superusers or users to whom superusers have granted sufficient privileges can query this table.

The content of this table for 'model_app_II' from the above example would be similar to the following:

Content of the MODEL_STATUS_HISTORY table

-- dbadmin queries the model_status_history table
=> SELECT * FROM model_status_history WHERE registered_name = 'model_app_II' ORDER BY registered_version;
 registered_name     | registered_version |  new_status  |  old_status  |       status_change_time      |    operator_id    | operator_name |      model_id     | schema_name | model_name
---------------------+--------------------+--------------+------------- +-------------------------------+-------------------+---------------+-------------------+-------------+-------------
model_app_II         |                  1 | UNDER_REVIEW | UNREGISTERED | 2023-03-21 09:42:10.082135-01 | 45035996273964824 | DS2           | 45035996273853890 | schema2     | py_model
model_app_II         |                  1 | STAGING      | UNDER_REVIEW | 2023-03-21 10:05:23.012172-05 | 45035996273704962 | supervisor    | 45035996273853890 | schema2     | py_model
model_app_II         |                  1 | PRODUCTION   | STAGING      | 2023-03-21 10:35:41.185485-02 | 45035996273704962 | supervisor    | 45035996273853890 | schema2     | py_model
model_app_II         |                  2 | UNDER_REVIEW | UNREGISTERED | 2023-03-25 08:19:56.190625-03 | 45035996273931856 | DS1           | 45035996273853815 |             |
model_app_II         |                  2 | STAGING      | UNDER_REVIEW | 2023-03-25 11:15:13.746265-05 | 45035996273704962 | supervisor    | 45035996273853815 |             |
model_app_II         |                  3 | UNDER_REVIEW | UNREGISTERED | 2023-03-25 10:29:25.970567-02 | 45035996273964824 | DS2           | 45035996273853740 | schema1     | tf_model
(6 rows)

Note Note that the contents of the schema_name and model_name columns are empty for (model_app_II, 2), which we previously dropped. These columns will be empty for any registered model that has been dropped. You can create a new model with the same name on the same schema as a dropped model and register it to the same registered_name, but its model_id and registered_version will be different from the previous model.

It's worth mentioning that the content of the v_monitor.model_status_history table is created based on a Data-Collector named "v_internal.dc_registered_model_status_history". If needed, dbadmin can revise the retention policy properties of this Data-Collector.

Notes

There are a few further points to highlight:

  • Unregistered models don't appear in the registered_models table.

  • If there is a model currently in production for a given registered_name, changing the status of a new model to production under the registered_name automatically changes the status of the previously in-production model to archived.

  • Only dbadmin and users with MLSUPERVISOR role have full privileges (USAGE/ALTER/DROP) over the registered models. Of course, any required privilege can be later delegated to any other user.

  • A model will not appear in the registered_models table after it is dropped, regardless of its previous status.

  • The history listed in the model_status_history table includes models that have been unregistered or dropped. However, the schema_name and model_name columns are empty for the dropped models.

  • Each existing model can be uniquely identified by either its [schema_name.]model_name or the combination of its REGISTERED_NAME and REGISTERED_VERSION.

Takeaway

In-DB model versioning is available in Vertica 12.0SP4 and provides a mechanism for registering candidate models, and then tracking and managing their status in the database. The model versioning tools should decrease the complexity of model management in a production environment. For more information, you can visit the online documentation: Model Versioning.

Please note that the status of a registered model is simply a tag, and does not enforce any usage limit on the model. It would be the application's responsibility to take advantage of the provided in-DB model versioning tools and enforce any extra desired constraints.