Creating Storage Policies for Low-Priority Data

If some of your data is in a partitioned table, you can move less-queried partitions to less-expensive storage such as HDFS. The data is still accessible in queries, just at a slower speed. In this scenario, the faster storage is often referred to as "hot storage," and the slower storage is referred to as "cold storage."

Suppose you have a table named messages (containing social-media messages) that is partitioned by the year and month of the message's timestamp. You can list the partitions in the table by querying the PARTITIONS system table.

=> SELECT partition_key, projection_name, node_name, location_label FROM partitions
   ORDER BY partition_key;
partition_key | projection_name |    node_name     | location_label
--------------+-----------------+------------------+----------------
201309        | messages_b1     | v_vmart_node0001 |
201309        | messages_b0     | v_vmart_node0003 |
201309        | messages_b1     | v_vmart_node0002 |
201309        | messages_b1     | v_vmart_node0003 |
201309        | messages_b0     | v_vmart_node0001 |
201309        | messages_b0     | v_vmart_node0002 |
201310        | messages_b0     | v_vmart_node0002 |
201310        | messages_b1     | v_vmart_node0003 |
201310        | messages_b0     | v_vmart_node0001 |
. . .
201405        | messages_b0     | v_vmart_node0002 |
201405        | messages_b1     | v_vmart_node0003 |
201405        | messages_b1     | v_vmart_node0001 |
201405        | messages_b0     | v_vmart_node0001 |
(54 rows)

Next, suppose you find that most queries on this table access only the latest month or two of data. You might decide to move the older data to cold storage in an HDFS-based storage location. After you move the data, it is still available for queries, but with lower query performance.

To move partitions to the HDFS storage location, supply the lowest and highest partition key values to be moved in the SET_OBJECT_STORAGE_POLICY function call. The following example shows how to move data between two dates. In this example:

=> SELECT SET_OBJECT_STORAGE_POLICY('messages','coldstorage', '201309', '201403', 'true');

The partitions within the specified range are moved to the HDFS storage location labeled coldstorage the next time the Tuple Mover runs. This location name now displays in the PARTITIONS system table's location_label column.

=> SELECT partition_key, projection_name, node_name, location_label 
   FROM partitions ORDER BY partition_key;
partition_key | projection_name |    node_name     | location_label
--------------+-----------------+------------------+----------------
201309        | messages_b0     | v_vmart_node0003 | coldstorage
201309        | messages_b1     | v_vmart_node0001 | coldstorage
201309        | messages_b1     | v_vmart_node0002 | coldstorage
201309        | messages_b0     | v_vmart_node0001 | coldstorage   
. . .
201403        | messages_b0     | v_vmart_node0002 | coldstorage
201404        | messages_b0     | v_vmart_node0001 |
201404        | messages_b0     | v_vmart_node0002 |
201404        | messages_b1     | v_vmart_node0001 |
201404        | messages_b1     | v_vmart_node0002 |
201404        | messages_b0     | v_vmart_node0003 |
201404        | messages_b1     | v_vmart_node0003 |
201405        | messages_b0     | v_vmart_node0001 |
201405        | messages_b1     | v_vmart_node0002 |
201405        | messages_b0     | v_vmart_node0002 |
201405        | messages_b0     | v_vmart_node0003 |
201405        | messages_b1     | v_vmart_node0001 |
201405        | messages_b1     | v_vmart_node0003 |
(54 rows)

After your initial data move, you can move additional data to the HDFS storage location periodically. You can move individual partitions or a range of partitions from the "hot" storage to the "cold" storage location using the same method:

=> SELECT SET_OBJECT_STORAGE_POLICY('messages', 'coldstorage', '201404', '201404', 'true');

=> SELECT projection_name, node_name, location_label 
   FROM PARTITIONS WHERE PARTITION_KEY = '201404';
 projection_name |    node_name     | location_label
-----------------+------------------+----------------
 messages_b0     | v_vmart_node0002 | coldstorage
 messages_b0     | v_vmart_node0003 | coldstorage
 messages_b1     | v_vmart_node0003 | coldstorage
 messages_b0     | v_vmart_node0001 | coldstorage
 messages_b1     | v_vmart_node0002 | coldstorage
 messages_b1     | v_vmart_node0001 | coldstorage
(6 rows)

Moving Partitions to a Table Stored on HDFS

Another method of moving partitions from hot storage to cold storage is to move the partitions' data to a separate table in the other storage location. This method breaks the data into two tables, one containing hot data and the other containing cold data. Use this method if you want to prevent queries from inadvertently accessing data stored in cold storage. To query the older data, you must explicitly query the cold table.

To move partitions:

  1. Create a new table whose schema matches that of the existing partitioned table.
  2. Set the storage policy of the new table to use the HDFS-based storage location.
  3. Use the MOVE_PARTITIONS_TO_TABLE function to move a range of partitions from the hot table to the cold table. The partitions migrate when the Tuple Mover next runs.

The following example demonstrates these steps. You first create a table named cold_messages. You then assign it the HDFS-based storage location named coldstorage, and, finally, move a range of partitions.

=> CREATE TABLE cold_messages LIKE messages INCLUDING PROJECTIONS;
=> SELECT SET_OBJECT_STORAGE_POLICY('cold_messages', 'coldstorage');
=> SELECT MOVE_PARTITIONS_TO_TABLE('messages','201309','201403','cold_messages');