Duplicating a Subcluster

Subclusters have many settings you can tune to get them to work just the way you want. After you have tuned a subcluster, you may want additional subclusters that are configured the same way. For example, suppose you have a subcluster that you have tuned to perform analytics workloads. To improve query throughput, you can create several more subclusters configured exactly like it. Instead of creating the new subclusters and then manually configuring them from scratch, you can duplicate the existing subcluster (called the source subcluster) to a new subcluster (the target subcluster).

When you create a new subcluster based on another subcluster, Vertica copies most of the source subcluster's settings, at both the node level and the subcluster level. The sections below list the settings that Vertica copies and those it does not.

After you duplicate a subcluster, the target is not connected to the source in any way. Any changes you make to the source subcluster's settings after duplication are not copied to the target. The subclusters are completely independent after duplication.

Requirements for the Target Subcluster

You must have a set of hosts in your database cluster that you will use as the target of the subcluster duplication. Vertica forms these hosts into a target subcluster that receives most of the settings of the source subcluster. The hosts for the target subcluster must meet the following requirements:

  • They must be part of your database cluster but not part of your database. For example, you can use hosts you have dropped from a subcluster or whose subcluster you have removed. Vertica returns an error if you attempt to duplicate a subcluster onto one or more nodes that are currently participating in the database.

    If you want to duplicate the settings of a subcluster to another existing subcluster, first remove the target subcluster (see Removing Subclusters). Then duplicate the source subcluster onto the hosts of the now-removed target subcluster.

  • The number of nodes you supply for the target subcluster must equal the number of nodes in the source subcluster. When duplicating the subcluster, Vertica performs a 1:1 copy of some node-level settings from each node in the source subcluster to a corresponding node in the target.
  • The RAM and disk allocation for the hosts in the target subcluster should be at least equal to that of the source nodes. Technically, your target nodes can have less RAM or disk space than the source nodes. However, you will usually see performance issues in the new subcluster because the settings copied from the source subcluster are tuned for the source's resources, not the target's.

You can duplicate a subcluster even if some of the nodes in the source subcluster or hosts in the target are down. If nodes in the target are down, they use the catalog Vertica copied from the source node when they recover.
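The node-count requirement above is easy to check before you start. The following is a minimal sketch, not part of Vertica: the helper names count_hosts and check_node_counts are hypothetical, and it assumes you already know the number of nodes in the source subcluster (for example, by counting that subcluster's rows in a system table such as SUBCLUSTERS).

```shell
# Count the entries in a comma-separated host list (the format passed to -s).
count_hosts() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Succeed only when the number of target hosts equals the number of source nodes.
# $1: comma-separated target hosts
# $2: number of nodes in the source subcluster (assumed to be known already)
check_node_counts() {
    target_count=$(count_hosts "$1")
    if [ "$target_count" -ne "$2" ]; then
        echo "mismatch: $target_count target hosts, $2 source nodes" >&2
        return 1
    fi
    echo "ok: $target_count hosts"
}

check_node_counts "10.11.12.13,10.11.12.14,10.11.12.15" 3
```

The check only compares counts. It does not verify that the hosts are part of the cluster or that they are not already participating in the database.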

Duplication of Subcluster-level Settings

The following table lists the subcluster-level settings that Vertica copies from the source subcluster to the target.

Setting Type Setting Details
Basic subcluster settings

Whether the subcluster is a primary or secondary subcluster.

Large cluster settings

The number of control nodes in the subcluster.

Resource pool settings
  • Vertica creates a new resource pool for every subcluster-specific resource pool in the source subcluster.

    Duplicating a subcluster can fail due to subcluster-specific resource pools. If creating the subcluster-specific resource pools leaves less than 25% of the total memory free for the general pool, Vertica stops the duplication and reports an error.

  • Subcluster-specific resource pool cascade settings are copied from the source subcluster and are applied to the newly-created resource pool for the target subcluster.
  • Subcluster-level overrides on global resource pool settings, such as MEMORYSIZE, are copied from the source subcluster. See Managing Workload Resources in an Eon Mode Database for more information.
  • Grants on resource pools are copied from the source subcluster.
Connection load balancing settings

If the source subcluster is part of a subcluster-based load balancing group (you created the load balancing group using CREATE LOAD BALANCE GROUP...WITH SUBCLUSTER), Vertica adds the new subcluster to the group. See Creating Connection Load Balance Groups.

However, Vertica does not create network addresses for the nodes in the target subcluster. Load balancing policies cannot direct connections to the new subcluster until you create network addresses for its nodes. See Creating Network Addresses for the steps you must take.

Storage policy settings

Table and table partition pinning policies are copied from the source to the target subcluster. See Pinning Depot Objects for more information. Any existing storage policies on the target subcluster are dropped before the policies are copied from the source.
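The 25% rule for resource pools described in the table above can be estimated with simple arithmetic. The sketch below is a simplified model, not Vertica's actual calculation: it assumes every subcluster-specific pool's MEMORYSIZE is expressed as a percentage and ignores memory reserved by other pools. The function names are hypothetical.

```shell
# Estimate the percentage of memory left for the general pool.
# $1: space-separated MEMORYSIZE percentages of the subcluster-specific
#     pools, without the % sign (assumption: all sizes are percentages).
general_pool_headroom() {
    total=0
    for pct in $1; do
        total=$((total + pct))
    done
    echo $((100 - total))
}

# Fail when the estimate drops below the 25% threshold named in the text.
check_headroom() {
    free=$(general_pool_headroom "$1")
    if [ "$free" -lt 25 ]; then
        echo "only ${free}% left for the general pool" >&2
        return 1
    fi
    echo "${free}% left for the general pool"
}

check_headroom "70"
```

With a single pool sized at 70%, the estimate leaves 30% for the general pool, which is above the threshold. Adding a second pool of 10% would drop the estimate to 20% and the check would fail.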

Vertica does not copy the following subcluster settings:

Setting Type Setting Details
Basic subcluster settings
  • Subcluster name (you must provide a new name for the target subcluster).
  • If the source is the default subcluster, the setting is not copied to the target. Your Vertica database has a single default subcluster. If Vertica copied this value, the source subcluster could no longer be the default.
Connection load balancing settings

Address-based load balancing groups are not duplicated for the target subcluster.

For example, suppose you created a load balancing group for the source subcluster by adding the network addresses of all of the subcluster's nodes. In this case, Vertica does not create a load balancing group for the target subcluster because it does not duplicate the network addresses of the source nodes (see the next section). Because it does not copy the addresses, it cannot create an address-based group.

Duplication of Node-level Settings

When Vertica duplicates a subcluster, it maps each node in the source subcluster to a node in the target subcluster. Then it copies relevant node-level settings from each individual source node to the corresponding target node.

For example, suppose you have a three-node subcluster consisting of nodes named node01, node02, and node03. The target subcluster has nodes named node04, node05, and node06. In this case, Vertica copies the settings from node01 to node04, from node02 to node05, and from node03 to node06.
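The pairing in this example can be written out as a short sketch. It assumes nodes are paired by their position in each list, which matches the example but is not a statement about how Vertica chooses the pairs internally. The function name map_nodes is hypothetical.

```shell
# Pair each source node with a target node by position.
# $1: space-separated source node names
# $2: space-separated target node names
# The body runs in a subshell so it can reuse the positional parameters.
map_nodes() (
    targets=$2
    for src in $1; do
        # Take the first remaining target, then drop it from the list.
        set -- $targets
        [ $# -gt 0 ] || { echo "not enough target nodes" >&2; exit 1; }
        echo "$src -> $1"
        shift
        targets=$*
    done
    # Leftover targets mean the two subclusters differ in size.
    [ -z "$targets" ] || { echo "unused target nodes: $targets" >&2; exit 1; }
)

map_nodes "node01 node02 node03" "node04 node05 node06"
```

The function fails if the two lists differ in length, mirroring the requirement that the target has the same number of nodes as the source.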

The node-level settings that Vertica copies from the source nodes to the target nodes are:

Setting Type Setting Details
Configuration parameters

Vertica copies the value of configuration parameters that you have set at the node level in the source node to the target node. For example, suppose you set CompressCatalogOnDisk on the source node using the statement:

ALTER NODE node01 SET CompressCatalogOnDisk = 0;

If you then duplicate the subcluster containing node01, Vertica copies the setting to the corresponding target node.

Eon Mode settings
  • Shard subscriptions are copied from the source node to the target.
  • Whether the node is the participating primary node for the shard.
Storage location settings

The DATA, TEMP, DEPOT, and USER storage location paths on the source node are duplicated on the target node. When duplicating node-specific paths (such as DATA or DEPOT) the path names are adjusted for the new node name. For example, suppose node 1 has a depot path of /vertica/depot/vmart/v_vmart_node0001_depot. If Vertica duplicates node 1 to node 4, it adjusts the path to /vertica/depot/vmart/v_vmart_node0004_depot.

The directories for these storage locations on the target node must be empty. They must also have the correct file permissions to allow Vertica to read and write to them.

Vertica does not duplicate a storage location if it cannot access its directory on the target node or if the directory is not empty. In this case, the target node will not have the location defined after the duplication process finishes. Admintools does not warn you if any locations were not duplicated.

If you find that storage locations have not been duplicated on one or more target nodes, you must fix the issues with the directories on the target nodes. Then re-run the duplication command.

Large cluster settings

Control node assignments are copied from the source node to the target node:

  • If the source node is a control node, then the target node is made into a control node.
  • If the source node depends on a control node, then the target node becomes a dependent of the corresponding control node in the new subcluster.
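Because storage locations that fail the directory checks are skipped without a warning, it can help to verify the target directories before duplicating. The sketch below is one way to do that, under two assumptions: the node name appears literally in the path, as in the depot example in the table above, and the check runs on the target host as the operating system account that runs Vertica. The function names adjust_path and dir_ready are hypothetical.

```shell
# Rewrite a node-specific storage path for the target node.
# $1: path on the source node, $2: source node name, $3: target node name
adjust_path() {
    printf '%s\n' "$1" | sed "s/$2/$3/g"
}

# Succeed only if the directory exists, is readable and writable by the
# current user, and contains no entries (including hidden ones).
dir_ready() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ] && [ -z "$(ls -A "$1")" ]
}

adjust_path /vertica/depot/vmart/v_vmart_node0001_depot v_vmart_node0001 v_vmart_node0004
```

You could then run dir_ready on each adjusted path on the corresponding target host and fix any directory that fails before running the duplication.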

Vertica does not copy the following node-level settings:

Setting Type Setting Details
Connection load balancing settings

Network addresses are not copied. The target node's network addresses do not depend on the settings of the source node. Therefore, Vertica cannot determine what the target node's addresses should be.

Depot settings

Depot-related configuration parameters that can be set at the node level (such as FileDeletionServiceInterval) are not copied from the source node to the target node.

Using admintools to Duplicate a Subcluster

To duplicate a subcluster, you use the same admintools db_add_subcluster tool that you use to create a new subcluster (see Creating Subclusters). In addition to the required options to create a subcluster (the list of hosts, name for the new subcluster, database name, and so on), you also pass the --like option with the name of the source subcluster you want to duplicate.

When you use the --like option, you cannot use the --is-secondary or --control-set-size options. Vertica determines whether the new subcluster is secondary and the number of control nodes it contains based on the source subcluster. If you supply these options along with the --like option, admintools returns an error.
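The option rules above can be captured in a small wrapper that assembles the command line and rejects the two options that conflict with --like. This is a sketch only: build_duplicate_cmd is a hypothetical helper, it prints the command instead of running it, and it uses only the options shown in this section.

```shell
# Print an admintools command that duplicates a subcluster.
# $1: database name, $2: comma-separated target hosts,
# $3: source subcluster, $4: name for the new subcluster,
# remaining arguments: extra options passed through unchanged (such as -p).
build_duplicate_cmd() (
    db=$1; hosts=$2; source_sc=$3; target_sc=$4
    shift 4
    for opt in "$@"; do
        case $opt in
            --is-secondary*|--control-set-size*)
                echo "cannot combine $opt with --like" >&2
                exit 1 ;;
        esac
    done
    cmd="admintools -t db_add_subcluster -d $db -s $hosts --like=$source_sc -c $target_sc"
    [ $# -eq 0 ] || cmd="$cmd $*"
    echo "$cmd"
)

build_duplicate_cmd verticadb 10.11.12.13,10.11.12.14,10.11.12.15 \
    analytics_1 analytics_2 -p mypassword
```

Printing the command first lets you review it before running it. Rejecting the conflicting options up front simply surfaces the same error that admintools would report.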

The following examples demonstrate duplicating a three-node subcluster named analytics_1. The first example examines some of the settings in the analytics_1 subcluster:

  • An override of the global TM resource pool's memory size.
  • Its own resource pool named analytics_pool.
  • Its membership in a subcluster-based load balancing group named analytics.

=> SELECT name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
 name | subcluster_name | memorysize 
------+-----------------+------------
 tm   | analytics_1     | 0%
(1 row)

=> SELECT name, subcluster_name, memorysize, plannedconcurrency 
      FROM resource_pools WHERE subcluster_name IS NOT NULL;
      name      | subcluster_name | memorysize | plannedconcurrency 
----------------+-----------------+------------+--------------------
 analytics_pool | analytics_1     | 70%        | 8
(1 row)

=> SELECT * FROM LOAD_BALANCE_GROUPS;
   name    |   policy   |  filter   |    type    | object_name 
-----------+------------+-----------+------------+-------------
 analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_1
(1 row)

The following example calls the admintools db_add_subcluster tool to duplicate the analytics_1 subcluster onto a set of three hosts to create a subcluster named analytics_2.

$ admintools -t db_add_subcluster -d verticadb \
             -s 10.11.12.13,10.11.12.14,10.11.12.15 \
             -p mypassword --like=analytics_1 -c analytics_2

Creating new subcluster 'analytics_2'
Adding new hosts to 'analytics_2'
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
 Warning when creating depot location for node: v_verticadb_node0007 
 WARNING: Target node v_verticadb_node0007 is down, so depot size has been 
          estimated from depot location on initiator. As soon as the node comes 
		  up, its depot size might be altered depending on its disk size
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
 Warning when creating depot location for node: v_verticadb_node0008 
 WARNING: Target node v_verticadb_node0008 is down, so depot size has been 
          estimated from depot location on initiator. As soon as the node comes
		  up, its depot size might be altered depending on its disk size
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
 Warning when creating depot location for node: v_verticadb_node0009 
 WARNING: Target node v_verticadb_node0009 is down, so depot size has been 
          estimated from depot location on initiator. As soon as the node comes 
		  up, its depot size might be altered depending on its disk size
Cloning subcluster properties
NOTICE: Nodes in subcluster analytics_1 have network addresses, you 
might need to configure network addresses for nodes in subcluster 
analytics_2 in order to get load balance groups to work correctly. 

	Replicating configuration to all nodes
	Generating new configuration information and reloading spread
	Starting nodes: 
		v_verticadb_node0007 (10.11.12.81)
		v_verticadb_node0008 (10.11.12.209)
		v_verticadb_node0009 (10.11.12.186)
	Starting Vertica on all nodes. Please wait, databases with a large catalog 
	     may take a while to initialize.
	Checking database state for newly added nodes
	Node Status: v_verticadb_node0007: (DOWN) v_verticadb_node0008: 
	             (DOWN) v_verticadb_node0009: (DOWN) 
	Node Status: v_verticadb_node0007: (INITIALIZING) v_verticadb_node0008: 
	             (INITIALIZING) v_verticadb_node0009: (INITIALIZING) 
	Node Status: v_verticadb_node0007: (UP) v_verticadb_node0008: 
	             (UP) v_verticadb_node0009: (UP) 
Syncing catalog on verticadb with 2000 attempts.
	Multi-node DB add completed
Nodes added to subcluster analytics_2 successfully.
Subcluster added to verticadb successfully.

Re-running the queries in the first part of the example shows that the settings from analytics_1 have been duplicated in analytics_2:

=> SELECT name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
 name | subcluster_name | memorysize 
------+-----------------+------------
 tm   | analytics_1     | 0%
 tm   | analytics_2     | 0%
(2 rows)

=> SELECT name, subcluster_name, memorysize, plannedconcurrency 
       FROM resource_pools WHERE subcluster_name IS NOT NULL;
      name      | subcluster_name | memorysize |  plannedconcurrency 
----------------+-----------------+------------+--------------------
 analytics_pool | analytics_1     | 70%        | 8
 analytics_pool | analytics_2     | 70%        | 8
(2 rows)

=> SELECT * FROM LOAD_BALANCE_GROUPS;
   name    |   policy   |  filter   |    type    | object_name 
-----------+------------+-----------+------------+-------------
 analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_2
 analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_1
(2 rows)

As noted earlier, even though the analytics_2 subcluster is part of the analytics load balancing group, its nodes do not have network addresses defined for them. Until you define network addresses for the nodes, Vertica cannot redirect client connections to them.
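Defining the missing addresses means running one CREATE NETWORK ADDRESS statement per target node. The sketch below generates those statements from a list of node names and IP addresses. The helper name network_address_sql and the address naming scheme (the node name plus _addr) are hypothetical, and the IP addresses are the ones shown in the "Starting nodes" output above; substitute the addresses that clients should actually use.

```shell
# Print one CREATE NETWORK ADDRESS statement per "node ip" line on stdin.
network_address_sql() {
    while read -r node ip; do
        # Skip blank lines.
        [ -n "$node" ] || continue
        # Hypothetical naming scheme for the address: <node name>_addr
        echo "CREATE NETWORK ADDRESS ${node}_addr ON $node WITH '$ip';"
    done
}

network_address_sql <<'EOF'
v_verticadb_node0007 10.11.12.81
v_verticadb_node0008 10.11.12.209
v_verticadb_node0009 10.11.12.186
EOF
```

Review the generated statements, then run them in vsql as a user who is allowed to create network addresses. See Creating Network Addresses for the full procedure.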