Vertica Eon Mode: Caching

Posted March 5, 2018 by Soniya Shah, Information Developer

This blog post was co-authored by Pratibha Rana and Ben Vandiver.

Running every query directly against the data in shared storage would result in poor performance and subject the shared storage to heavy load. Eon Mode in Vertica introduces a cache called the depot that avoids reading frequently used data from shared storage. The depot is a disk cache that holds entire data files copied from the shared storage. Vertica never modifies a storage file once it is written, so the cache only needs to handle adds and drops, never invalidation. The eviction policy is a simple least-recently-used (LRU) scheme, on the assumption that past access is a good predictor of future need. The cache is write-through, since newly added files are likely to be referenced by queries, and the file compaction mechanism (mergeout) also places its output files in the cache.

Usage

Let's look at an example to better understand how the cache is used in practice. Consider a four-node, four-shard database. The database has a shared storage location and a local depot on each of the nodes. The storage_locations table gives the following details. Note that the communal storage is not attached to any node, and the depot locations have a maximum size attached to them.

=> SELECT node_name, location_path, location_usage, sharing_type, max_size FROM storage_locations;
 node_name |             location_path               | location_usage | sharing_type |  max_size
-----------+-----------------------------------------+----------------+--------------+-------------
 node01    | /scratch_b/DBD/node01_data              | DATA,TEMP      | NONE         |           0
           | s3://nimbusdb/fxing/dbd                 | DATA           | COMMUNAL     |           0
 node02    | /scratch_b/DBD/node02_data              | DATA,TEMP      | NONE         |           0
 node03    | /scratch_b/DBD/node03_data              | DATA,TEMP      | NONE         |           0
 node04    | /scratch_b/DBD/node04_data              | DATA,TEMP      | NONE         |           0
 node01    | /scratch_b/qa/vertica/DBD/node01_depot  | DEPOT          | NONE         | 64424509440
 node02    | /scratch_b/qa/vertica/DBD/node02_depot  | DEPOT          | NONE         | 64424509440
 node03    | /scratch_b/qa/vertica/DBD/node03_depot  | DEPOT          | NONE         | 64424509440
 node04    | /scratch_b/qa/vertica/DBD/node04_depot  | DEPOT          | NONE         | 64424509440
(9 rows)

When a table is created and data is loaded, the data is copied both to the communal storage and to the depot. The files in the depot can be inspected through the system table vs_depot_lru, and the files on S3 can be viewed in the S3 bucket.

=> SELECT node_name, storage_type, size FROM vs_depot_lru WHERE storage_type='ROS_BUNDLE';
 node_name | storage_type | size
-----------+--------------+------
 node01    | ROS_BUNDLE   |  535
 node01    | ROS_BUNDLE   |  561
 node04    | ROS_BUNDLE   |  561
 node04    | ROS_BUNDLE   |  562
 node02    | ROS_BUNDLE   |  570
 node02    | ROS_BUNDLE   |  535
 node03    | ROS_BUNDLE   |  562
 node03    | ROS_BUNDLE   |  570
(8 rows)

The S3 bucket also shows the data files:

[14:30:55][release@ip-10-0-19-189: ~ ]$ aws s3 ls --recursive s3://nimbusdb/fxing/dbd/
2018-02-21 13:58:03        562 fxing/dbd/4ec/0232fd1fcb77a1d9b750534797ef78f900d000000001ce11_0.gt
2018-02-21 13:58:03        561 fxing/dbd/572/02f291228ff4ee4eb27c4da4069e9e3e00a000000001ce11_0.gt
2018-02-21 13:58:03        570 fxing/dbd/a3a/0238836c1100ea08a7b87d089c6e354800c000000001ce11_0.gt
2018-02-21 13:58:03        535 fxing/dbd/d91/02bcb3da2b7d42fb5f91113100230c8200b000000001ce11_0.gt

There are various system tables that can help you monitor the state of the depot. The vs_depot_lru table shows which files currently reside in the depot, and the depot_uploads table shows the files that were uploaded to the shared location.
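As a quick sanity check, you can also aggregate vs_depot_lru per node to see how much each node is currently caching. This is only a minimal sketch that assumes the columns shown above (node_name and size); adjust it if your version exposes different columns.

=> -- Summarize how many files and bytes each node currently holds in its depot
=> SELECT node_name, COUNT(*) AS cached_files, SUM(size) AS cached_bytes
   FROM vs_depot_lru
   GROUP BY node_name
   ORDER BY node_name;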
For the data load executed above, the depot_uploads table shows the following state:

=> SELECT node_name, file_size_bytes, source_file, destination_file FROM depot_uploads;
-[ RECORD 1 ]----+---------------------------------------------------------------------------------------------------
node_name        | node01
file_size_bytes  | 561
source_file      | /scratch_b/qa/vertica/DBD/node01_depot/249/02f291228ff4ee4eb27c4da4069e9e3e00a000000001ce11_0.gt
destination_file | s3://nimbusdb/fxing/dbd/572/02f291228ff4ee4eb27c4da4069e9e3e00a000000001ce11_0.gt
-[ RECORD 2 ]----+---------------------------------------------------------------------------------------------------
node_name        | node02
file_size_bytes  | 535
source_file      | /scratch_b/qa/vertica/DBD/node02_depot/745/02bcb3da2b7d42fb5f91113100230c8200b000000001ce11_0.gt
destination_file | s3://nimbusdb/fxing/dbd/d91/02bcb3da2b7d42fb5f91113100230c8200b000000001ce11_0.gt
-[ RECORD 3 ]----+---------------------------------------------------------------------------------------------------
node_name        | node04
file_size_bytes  | 562
source_file      | /scratch_b/qa/vertica/DBD/node04_depot/737/0232fd1fcb77a1d9b750534797ef78f900d000000001ce11_0.gt
destination_file | s3://nimbusdb/fxing/dbd/4ec/0232fd1fcb77a1d9b750534797ef78f900d000000001ce11_0.gt
-[ RECORD 4 ]----+---------------------------------------------------------------------------------------------------
node_name        | node03
file_size_bytes  | 570
source_file      | /scratch_b/qa/vertica/DBD/node03_depot/241/0238836c1100ea08a7b87d089c6e354800c000000001ce11_0.gt
destination_file | s3://nimbusdb/fxing/dbd/a3a/0238836c1100ea08a7b87d089c6e354800c000000001ce11_0.gt

The overall size statistics are recorded in the vs_depot_size table:

=> SELECT node_name, max_size_in_bytes, current_size_in_bytes FROM vs_depot_size;
 node_name | max_size_in_bytes | current_size_in_bytes
-----------+-------------------+-----------------------
 node01    | 64424509440       |                  1096
 node02    | 64424509440       |                  1105
 node03    | 64424509440       |                  1132
 node04    | 64424509440       |                  1123
(4 rows)

When a query is run, data that resides in the depot is read from there and the rest is read directly from the shared location. To diagnose performance issues, the dc_file_reads table can be used to check where a particular file was read from.
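To watch depot usage over time, it can be convenient to express the current size as a percentage of the configured maximum. The following is a minimal sketch that assumes only the vs_depot_size columns shown above:

=> -- Depot utilization per node, as a percentage of the configured maximum
=> SELECT node_name,
          current_size_in_bytes,
          max_size_in_bytes,
          ROUND(100 * current_size_in_bytes / max_size_in_bytes, 2) AS pct_used
   FROM vs_depot_size
   ORDER BY node_name;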
=> SELECT node_name, storageid, storage_path FROM dc_file_reads;
 node_name |                    storageid                     |              storage_path
-----------+--------------------------------------------------+----------------------------------------
 node03    | 0238836c1100ea08a7b87d089c6e354800c000000001ce11 | /scratch_b/qa/vertica/DBD/node03_depot
 node01    | 02f291228ff4ee4eb27c4da4069e9e3e00a000000001ce11 | /scratch_b/qa/vertica/DBD/node01_depot
 node02    | 02bcb3da2b7d42fb5f91113100230c8200b000000001ce11 | /scratch_b/qa/vertica/DBD/node02_depot
 node04    | 0232fd1fcb77a1d9b750534797ef78f900d000000001ce11 | /scratch_b/qa/vertica/DBD/node04_depot
(4 rows)

If you clear the depot and then run the query again, the dc_file_reads table shows that the files were read directly from S3. They were also fetched back into the depot for subsequent reads, as shown by the depot_fetches table:

=> SELECT node_name, storageid, storage_path FROM dc_file_reads;
 node_name |                    storageid                     |      storage_path
-----------+--------------------------------------------------+-------------------------
 node01    | 02f291228ff4ee4eb27c4da4069e9e3e00a000000001d05f | s3://nimbusdb/fxing/dbd
 node03    | 0238836c1100ea08a7b87d089c6e354800c000000001d05f | s3://nimbusdb/fxing/dbd
 node04    | 0232fd1fcb77a1d9b750534797ef78f900d000000001d05f | s3://nimbusdb/fxing/dbd
 node02    | 02bcb3da2b7d42fb5f91113100230c8200b000000001d05f | s3://nimbusdb/fxing/dbd
(4 rows)

=> SELECT * FROM depot_fetches;
-[ RECORD 1 ]----+----------------------------------------------------------------------------------------------------
node_name        | node04
storage_id       | 0232fd1fcb77a1d9b750534797ef78f900d000000001d05f
file_size_bytes  | 562
source_file      | s3://nimbusdb/fxing/dbd/1c4/0232fd1fcb77a1d9b750534797ef78f900d000000001d05f_0.gt
destination_file | /scratch_b/qa/vertica/DBD/node04_depot/327/0232fd1fcb77a1d9b750534797ef78f900d000000001d05f_0.gt
reason           |
-[ RECORD 2 ]----+----------------------------------------------------------------------------------------------------
node_name        | node03
storage_id       | 0238836c1100ea08a7b87d089c6e354800c000000001d05f
file_size_bytes  | 570
source_file      | s3://nimbusdb/fxing/dbd/245/0238836c1100ea08a7b87d089c6e354800c000000001d05f_0.gt
destination_file | /scratch_b/qa/vertica/DBD/node03_depot/831/0238836c1100ea08a7b87d089c6e354800c000000001d05f_0.gt
reason           |
-[ RECORD 3 ]----+----------------------------------------------------------------------------------------------------
node_name        | node02
storage_id       | 02bcb3da2b7d42fb5f91113100230c8200b000000001d05f
file_size_bytes  | 535
source_file      | s3://nimbusdb/fxing/dbd/ced/02bcb3da2b7d42fb5f91113100230c8200b000000001d05f_0.gt
destination_file | /scratch_b/qa/vertica/DBD/node02_depot/335/02bcb3da2b7d42fb5f91113100230c8200b000000001d05f_0.gt
reason           |
-[ RECORD 4 ]----+----------------------------------------------------------------------------------------------------
node_name        | node01
storage_id       | 02f291228ff4ee4eb27c4da4069e9e3e00a000000001d05f
file_size_bytes  | 561
source_file      | s3://nimbusdb/fxing/dbd/572/02f291228ff4ee4eb27c4da4069e9e3e00a000000001ce11_0.gt
destination_file | /scratch_b/qa/vertica/DBD/node01_depot/249/02f291228ff4ee4eb27c4da4069e9e3e00a000000001ce11_0.gt
reason           |
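To see at a glance how much of a workload is being served from the depot versus the shared location, a small sketch along the following lines can help. It assumes only the dc_file_reads columns used above:

=> -- Count file reads per node, grouped by where the file was read from
=> SELECT node_name,
          storage_path,
          COUNT(*) AS file_reads
   FROM dc_file_reads
   GROUP BY node_name, storage_path
   ORDER BY node_name, storage_path;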

Knobs and Meta Functions

The depot also comes with a variety of knobs and meta functions to adjust its behavior on a case-by-case basis.

Meta functions:

1. Clear_data_depot(table_name): Deletes all the data in the depot for the given table. Deletes all the data if no table name is given.
2. Alter_location_size(path, node, new_size): Use this function to change the size of the depot if the workload changes. As a side effect, the least recently used files are deleted if the new size is smaller.
3. Finish_fetching_files(): If some files needed by a query are not in the depot, they are queued to be fetched by an asynchronous service. This function waits for all the files in the queue to be downloaded.

Configuration parameters:

1. UseDepotForReads: Always check whether files are in the depot and read them from there.
2. UseDepotForWrites: Write to the depot during loads and upload files to shared storage.
3. UsePeerToPeerDataTransfer: Send data to all the subscribers for a shard at data load time.

Example invocations of the meta functions are sketched below.
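As a rough illustration of how these meta functions might be called from vsql, here is a hedged sketch. The table name, the depot path, the node name, and the size value are illustrative only, and the exact argument formats can vary by Vertica version:

=> -- Drop all cached data for one table from the depot (hypothetical table name)
=> SELECT clear_data_depot('public.lineitem');

=> -- Resize node01's depot (path, node, and size here are placeholder values)
=> SELECT alter_location_size('/scratch_b/qa/vertica/DBD/node01_depot', 'node01', '80G');

=> -- Block until all queued background fetches have landed in the depot
=> SELECT finish_fetching_files();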

Performance

The purpose of the depot is to make sure that performance does not suffer because the data lives on shared storage. We ran some performance tests to confirm the benefits of the depot. The experiments showed that, on average, TPC-H queries ran 3.5 times faster against a warm depot than against a cold depot. The following configuration was used to conduct the experiments:
              | Catalog             | Data | Temp           | Depot
 TPCH_NIMBUS  | EBS – single volume | S3   | Instance Store | Instance Store (60 GB/node)
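If you want to reproduce a similar warm-versus-cold comparison on your own data, one way to sketch it (using the clear_data_depot function described above and a hypothetical table, here public.lineitem) is to time the same query before and after the depot is populated:

=> -- Empty the depot so the next run reads everything from communal storage (cold)
=> SELECT clear_data_depot();

=> \timing on
=> SELECT COUNT(*) FROM public.lineitem;   -- cold run: files fetched from S3 into the depot
=> SELECT COUNT(*) FROM public.lineitem;   -- warm run: files served from the depot
=> \timing off

=> -- Confirm where the files were read from on each run
=> SELECT node_name, storage_path, COUNT(*) AS file_reads
   FROM dc_file_reads
   GROUP BY node_name, storage_path;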