Managing Depot Caching

You can control depot caching in several ways:

  • Configure gateway parameters that determine whether the depot caches queried and loaded data.
  • Control how the depot fetches queried data from communal storage.
  • Manage eviction of depot data, including pinning policies that protect objects from eviction.
  • Warm the depots of new and restarted nodes.

You can monitor depot activity and settings with several V_MONITOR system tables, or with the Management Console.

Depot caching is supported only on primary shard subscriber nodes.

Depot Gateway Parameters

Vertica depots can cache two types of data:

  • Queried data: The depot facilitates query execution by fetching queried data from communal storage and caching it in the depot. The cached data remains available until it is evicted to make room for fresher data, or for data that is fetched for more recent queries.
  • Loaded data: The depot expedites load operations such as COPY by temporarily caching data until it is uploaded to communal storage.

By default, depots are configured to cache both types of data.

Two configuration parameters determine whether a depot caches queried or loaded data:

UseDepotForReads (Boolean)
  • 1 (default): Search the depot for the queried data; if it is not found there, fetch the data from communal storage.
  • 0: Bypass the depot and get queried data directly from communal storage.

UseDepotForWrites (Boolean)
  • 1 (default): Write loaded data to the depot, then upload files to communal storage.
  • 0: Bypass the depot and write loaded data directly to communal storage.

Both parameters can be set at the session, user, and database levels.

If set at the session or user level, these parameters can be used to segregate read and write activity on the depots of different subclusters. For example, the parameters UseDepotForReads and UseDepotForWrites might be set as follows for users joe and rhonda:

=> SHOW USER joe ALL;
          name           | setting
-------------------------+---------
 UseDepotForReads        | 1
 UseDepotForWrites       | 0
(2 rows)
=> SHOW USER rhonda ALL;
          name           | setting
-------------------------+---------
 UseDepotForReads        | 0
 UseDepotForWrites       | 1
(2 rows)

Given these user settings, when joe connects to a Vertica subcluster, his session only uses the current depot to process queries; all load operations are uploaded to communal storage. Conversely, rhonda's sessions only use the depot to process load operations; all queries must fetch their data from communal storage.
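Settings like these are created with ALTER USER…SET PARAMETER statements. For example, joe's settings might have been created as follows (a sketch; rhonda's settings would mirror these with the values swapped):

=> ALTER USER joe SET PARAMETER UseDepotForReads = 1;
ALTER USER
=> ALTER USER joe SET PARAMETER UseDepotForWrites = 0;
ALTER USER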

Depot Fetching

If a depot is enabled to cache queried data (UseDepotForReads = 1), you can configure how it fetches data from communal storage with configuration parameter DepotOperationsForQuery. This parameter has three settings:

  • ALL (default): Fetch file data from communal storage; if necessary, make room by evicting existing files from the depot.
  • FETCHES: Fetch file data from communal storage only if space is available; otherwise, read the queried data directly from communal storage.
  • NONE: Do not fetch file data to the depot; read the queried data directly from communal storage.

You can set fetching behavior at four levels, in ascending order of precedence:

  • Database
  • User
  • Session
  • Query, with the DEPOT_FETCH hint

For example, you can set DepotOperationsForQuery at the database level as follows:

=> ALTER DATABASE default SET PARAMETER DepotOperationsForQuery = FETCHES;
ALTER DATABASE

This setting applies to all database depots unless overridden at other levels. For example, the following ALTER USER statement specifies fetching behavior for a depot when it processes queries from user joe:

=> ALTER USER joe SET PARAMETER DepotOperationsForQuery = ALL;
ALTER USER

Finally, joe can override his own DepotOperationsForQuery setting by including the DEPOT_FETCH hint in individual queries:

SELECT /*+DEPOT_FETCH(NONE)*/ count(*) FROM bar;
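Between the user and query levels, joe could also change the setting for just his current session (a sketch, assuming the usual ALTER SESSION syntax for configuration parameters):

=> ALTER SESSION SET DepotOperationsForQuery = FETCHES;
ALTER SESSION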

Evicting Depot Data

In general, Vertica evicts data from the depot as needed to make room for new data and expedite request processing. Before writing new data to the depot, Vertica evaluates it as follows:

  • Data fetched from communal storage: Vertica sizes the download and evicts data from the depot accordingly.
  • Data uploaded from a DML operation such as COPY: Vertica cannot estimate the total size of the upload before it is complete, so it sizes individual buffers and evicts data from the depot as needed.

In both cases, Vertica assesses existing depot data and determines which objects to evict as follows, in descending order of precedence (most to least vulnerable):

  1. The least recently used unpinned object is evicted for any new object, pinned or unpinned.
  2. The least recently used pinned object is evicted for a new pinned object.
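To observe this behavior, you can query the DEPOT_EVICTIONS system table described under Monitoring the Depot below (a minimal probe; the available columns depend on your Vertica version):

=> SELECT * FROM V_MONITOR.DEPOT_EVICTIONS LIMIT 10;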

Pinning Depot Objects

You can set depot pinning policies on database objects to reduce their exposure to eviction. Pinning policies can be set on individual subclusters or on the entire database, at different levels of granularity (table, projection, and partition):

Pinning of… Supported by…
Tables SET_DEPOT_PIN_POLICY_TABLE
Projections SET_DEPOT_PIN_POLICY_PROJECTION
Partitions SET_DEPOT_PIN_POLICY_PARTITION
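For example, a table might be pinned on the depots of one subcluster as follows (a sketch; the table and subcluster names are placeholders for your own objects):

=> SELECT SET_DEPOT_PIN_POLICY_TABLE('public.trades', 'analytics_cluster');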

Usage Guidelines

Pinning one or more objects on a depot affects its retention of fetched (queried) data and uploaded (newly loaded) data. If too much depot space is claimed by pinned objects, the depot might be unable to handle load operations on unpinned objects. In this case, set configuration parameter UseDepotForWrites to 0 so that load operations are routed directly to communal storage for processing. Otherwise, load operations are liable to fail with an error.
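For example, to route all load operations in the database directly to communal storage, using the database-level syntax shown earlier:

=> ALTER DATABASE default SET PARAMETER UseDepotForWrites = 0;
ALTER DATABASE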

To minimize contention over depot usage, consider the following guidelines:

  • Pin only those objects that are most active in DML operations and queries.
  • Minimize the size of pinned data by setting policies at the smallest effective level—for example, pin only the data of a table's active partition.
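For example, rather than pinning all of a table, you might pin only a range of its partition keys (a sketch; the table name and range values are placeholders, and the function expects minimum and maximum partition key values):

=> SELECT SET_DEPOT_PIN_POLICY_PARTITION('public.trades', '2024-01-01', '2024-12-31', 'analytics_cluster');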

Depot Warming

On startup, the depots of new nodes are empty, while the depots of restarted nodes often contain stale data that must be refreshed. When depot warming is enabled, a node that is undergoing startup preemptively loads its depot with frequently queried and pinned data. When the node completes startup and begins to execute queries, its depot already contains much of the data it needs to process those queries. This reduces the need to fetch data from communal storage, and expedites query performance accordingly.

Fetching data to a warming depot can delay node startup.

By default, depot warming is enabled (EnableDepotWarmingFromPeers = 1). A node executes depot warming as follows:

  1. The node checks configuration parameter PreFetchPinnedObjectsToDepotAtStartup. If enabled (set to 1), the node:
    • Gets from the database catalog a list of all objects that are pinned on this node's subcluster.
    • Queues the pinned objects for fetching and calculates their total size.
  2. The node checks configuration parameter EnableDepotWarmingFromPeers. If enabled (set to 1, default), the node:
    • Identifies a peer node in the same subcluster whose depot contents it can copy.
    • After taking into account all pinned objects, calculates how much space remains available in the warming depot.
    • Gets from the peer node a list of the most recently used objects that can fit in the depot.
    • Queues the objects for fetching.
  3. If BackgroundDepotWarming is enabled (set to 1, default), the node loads queued objects into its depot while it is warming, and continues to do so in the background after the node becomes active and starts executing queries. Otherwise (BackgroundDepotWarming = 0), node activation is deferred until the depot fetches and loads all queued objects.
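All three parameters can be changed with the database-level syntax shown earlier. For example, to keep warming enabled but defer node activation until the depot is fully loaded (trading a longer startup, as noted above, for warmer first queries):

=> ALTER DATABASE default SET PARAMETER BackgroundDepotWarming = 0;
ALTER DATABASE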

Monitoring the Depot

You can monitor depot activity and settings with several V_MONITOR system tables.

You can also use the Management Console to monitor depot activity. For details, see Monitoring Depot Activity in MC.

System table… Shows…
DATA_READS All storage locations that a query reads to obtain data.
DEPOT_EVICTIONS Details about objects that were evicted from the depot.
DEPOT_FETCH_QUEUE Pending depot fetch requests for queried file data from communal storage.
DEPOT_FILES Objects that are cached in database depots.
DEPOT_PIN_POLICIES Objects (tables and table partitions) that are pinned to database depots.
DEPOT_SIZES Depot caching capacity per node.
DEPOT_UPLOADS Details about depot uploads to communal storage.
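For example, a quick check of per-node depot capacity (a minimal probe; SELECT * is used because column names can vary by version):

=> SELECT * FROM V_MONITOR.DEPOT_SIZES;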