Managing Depot Caching

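You can control depot caching in several ways:

  • Set gateway parameters that determine whether the depot caches queried data, loaded data, or both.
  • Configure how the depot fetches queried data from communal storage.
  • Pin tables and table partitions to reduce their exposure to eviction.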

You can monitor depot activity and settings with several V_MONITOR system tables, or with the Management Console.

Depot Gateway Parameters

Vertica depots can cache two types of data:

  • Queried data: The depot facilitates query execution by fetching queried data from communal storage and caching it in the depot. The cached data remains available until it is evicted to make room for fresher data, or for data that is fetched for more recent queries.
  • Loaded data: The depot expedites load operations such as COPY by temporarily caching data until it is uploaded to communal storage.

By default, depots are configured to cache both types of data.

Two configuration parameters determine whether a depot caches queried or loaded data:

Parameter          Settings
UseDepotForReads   Boolean:
                     • 1 (default): Search the depot for the queried data; if not found, fetch the data from communal storage.
                     • 0: Bypass the depot and get queried data from communal storage.
UseDepotForWrites  Boolean:
                     • 1 (default): Write loaded data to the depot, then upload files to communal storage.
                     • 0: Bypass the depot and write directly to communal storage.

Both parameters can be set at session, user and database levels.

If set at the session or user level, these parameters can be used to segregate read and write activity on the depots of different subclusters. For example, parameters UseDepotForReads and UseDepotForWrites might be set as follows for users joe and rhonda:

=> SHOW USER joe ALL;
          name           | setting
-------------------------+---------
 UseDepotForReads        | 1
 UseDepotForWrites       | 0
(2 rows)
=> SHOW USER rhonda ALL;
          name           | setting
-------------------------+---------
 UseDepotForReads        | 0
 UseDepotForWrites       | 1
(2 rows)

Given these user settings, when joe connects to a Vertica subcluster, his sessions use the current depot only to process queries; all load operations write directly to communal storage. Conversely, rhonda's sessions use the depot only to process load operations; all queries must fetch their data from communal storage.
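
The user settings shown above could be produced with statements along the following lines. This is a sketch, assuming that users joe and rhonda already exist and that you have the privileges needed to alter them; it uses the same ALTER USER...SET PARAMETER form that appears elsewhere in this section.

```sql
-- joe: use the depot for queries, bypass it for load operations
ALTER USER joe SET PARAMETER UseDepotForReads = 1;
ALTER USER joe SET PARAMETER UseDepotForWrites = 0;

-- rhonda: bypass the depot for queries, use it for load operations
ALTER USER rhonda SET PARAMETER UseDepotForReads = 0;
ALTER USER rhonda SET PARAMETER UseDepotForWrites = 1;

-- Confirm the user-level settings
SHOW USER joe ALL;
SHOW USER rhonda ALL;
```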

Depot Fetching

If a depot is enabled to cache queried data (UseDepotForReads = 1), you can configure how it fetches data from communal storage with configuration parameter DepotOperationsForQuery. This parameter has three settings:

  • ALL (default): Fetch file data from communal storage. If necessary, displace existing files by evicting them from the depot.
  • FETCHES: Fetch file data from communal storage only if space is available; otherwise, read the queried data directly from communal storage.
  • NONE: Do not fetch file data to the depot; read the queried data directly from communal storage.

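You can set fetching behavior at four levels, in ascending levels of precedence:

  • Database: ALTER DATABASE...SET PARAMETER
  • User: ALTER USER...SET PARAMETER
  • Session: ALTER SESSION...SET PARAMETER
  • Individual query: DEPOT_FETCH hint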

For example, you can set DepotOperationsForQuery at the database level as follows:

=> ALTER DATABASE default SET PARAMETER DepotOperationsForQuery = FETCHES;
ALTER DATABASE

This setting applies to all database depots unless overridden at other levels. For example, the following ALTER USER statement specifies fetching behavior for a depot when it processes queries from user joe:

=> ALTER USER joe SET PARAMETER DepotOperationsForQuery = ALL;
ALTER USER

Finally, joe can override his own DepotOperationsForQuery setting by including the DEPOT_FETCH hint in individual queries:

SELECT /*+DEPOT_FETCH(NONE)*/ count(*) FROM bar;
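
DepotOperationsForQuery can also be set for a single session, between the user level and the per-query hint. The following is a minimal sketch, assuming the standard ALTER SESSION and SHOW CURRENT statements behave for this parameter as they do for other configuration parameters:

```sql
-- Session level: applies to the current session only
ALTER SESSION SET PARAMETER DepotOperationsForQuery = NONE;

-- Check the value currently in effect
SHOW CURRENT DepotOperationsForQuery;

-- Remove the session-level override so the user or database setting applies again
ALTER SESSION CLEAR PARAMETER DepotOperationsForQuery;
```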

Evicting Depot Data

In general, Vertica evicts data from the depot as needed to provide room for new data and to expedite request processing. Before writing new data to the depot, Vertica evaluates it as follows:

  • Data fetched from communal storage: Vertica sizes the download and evicts data from the depot accordingly.
  • Data uploaded from a DML operation such as COPY: Vertica cannot estimate the total size of the upload before it is complete, so it sizes individual buffers and evicts data from the depot as needed.

In both cases, Vertica assesses existing depot data and determines which objects to evict from the depot as follows, in descending order of precedence (most to least vulnerable):

  1. The least recently used unpinned object is evicted for any new object, pinned or unpinned.
  2. The least recently used pinned object is evicted for a new pinned object.

Pinning Depot Objects

You pin database objects to depots to reduce their exposure to eviction. Two object types can be pinned: tables and table partitions, with SET_DEPOT_PIN_POLICY_TABLE and SET_DEPOT_PIN_POLICY_PARTITION, respectively. These functions can specify objects for pinning on a subcluster depot, or on all database depots.

Pinning a table or a partition affects depot retention of fetched (queried) data and uploaded (newly loaded) data. If too much depot space is claimed by pinned objects, the depot might be unable to handle load operations on unpinned objects. In this case, set configuration parameter UseDepotForWrites to 0, so load operations are routed directly to communal storage for processing. Otherwise, load operations are liable to return with an error.

As a best practice, consider pinning only the objects that are most active in DML operations and queries.
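
The following sketch shows what pinning might look like in practice. The table names public.store_orders and public.sales_fact and the partition key range are hypothetical, and the argument lists shown (object name for tables; object name plus minimum and maximum partition key values for partitions) are assumptions to verify against the function reference for your Vertica version.

```sql
-- Pin a table on all database depots (hypothetical table name)
SELECT SET_DEPOT_PIN_POLICY_TABLE('public.store_orders');

-- Pin a range of partitions of another table (hypothetical table and key range)
SELECT SET_DEPOT_PIN_POLICY_PARTITION('public.sales_fact', '2024-01-01', '2024-03-31');

-- If pinned objects leave too little depot space for loads,
-- route this session's load operations directly to communal storage
ALTER SESSION SET PARAMETER UseDepotForWrites = 0;
```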

Overriding Pin Policies

You can override pin policies through configuration parameter DepotOperationsForQuery, which can be set at session, user, and database levels. You can also override pin policies for a given query with the hint DEPOT_FETCH.

Monitoring the Depot

You can monitor depot activity and settings with several V_MONITOR system tables.

You can also use the Management Console to monitor depot activity. For details, see Monitoring Depot Activity in MC.

System table        Shows
DATA_READS          All storage locations that a query reads to obtain data.
DEPOT_EVICTIONS     Details about objects that were evicted from the depot.
DEPOT_FETCH_QUEUE   Pending depot requests for queried file data to fetch from communal storage.
DEPOT_FILES         Objects that are cached in database depots.
DEPOT_PIN_POLICIES  Objects (tables and table partitions) that are pinned to database depots.
DEPOT_SIZES         Depot caching capacity per node.
DEPOT_UPLOADS       Details about depot uploads to communal storage.
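
As a starting point, the system tables listed above can be queried directly. This sketch selects all columns rather than assuming specific column names, which can differ between versions:

```sql
-- Depot capacity on each node
SELECT * FROM v_monitor.depot_sizes;

-- Objects currently pinned to database depots
SELECT * FROM v_monitor.depot_pin_policies;

-- Objects that were evicted from the depot
SELECT * FROM v_monitor.depot_evictions;

-- Fetch requests still waiting on communal storage
SELECT * FROM v_monitor.depot_fetch_queue;
```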