High Availability with Fault Groups

Use fault groups to reduce the risk of correlated failures inherent in your physical environment. A correlated failure occurs when two or more nodes fail as a result of a single failure event. Such failures typically involve a shared resource: for example, a loss of shared power, a networking problem, or a storage failure.

Vertica minimizes the risk of correlated failures by letting you define fault groups on your cluster. Vertica then uses the fault groups to distribute data segments across the cluster, so the database continues running if a single failure event occurs.
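
As a rough illustration of why placement matters, the following Python sketch checks whether every data segment would still have a live copy after any one fault group fails. The node names, group names, and the survives_single_group_failure helper are all hypothetical. This is not Vertica's placement algorithm, only a small model of the property that fault groups are meant to preserve.

```python
from typing import Dict, List


def survives_single_group_failure(
    node_group: Dict[str, str],
    segment_copies: Dict[str, List[str]],
) -> bool:
    """Return True if every segment keeps a live copy when any one fault group fails."""
    for failed in set(node_group.values()):
        for nodes in segment_copies.values():
            # A segment is lost if every node holding a copy sits in the failed group.
            if all(node_group[n] == failed for n in nodes):
                return False
    return True


# Hypothetical layout: four nodes on two racks, one fault group per rack.
node_group = {"node1": "rack_a", "node2": "rack_a", "node3": "rack_b", "node4": "rack_b"}

# Copies of each segment are placed on different racks.
spread = {"seg1": ["node1", "node3"], "seg2": ["node2", "node4"]}

# Both copies of each segment share a rack, so losing that rack loses the segment.
clustered = {"seg1": ["node1", "node2"], "seg2": ["node3", "node4"]}

print(survives_single_group_failure(node_group, spread))     # True
print(survives_single_group_failure(node_group, clustered))  # False
```

In the first placement no rack holds every copy of a segment, so the loss of either rack leaves the data available. In the second placement, a single rack failure removes all copies of a segment.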

If your entire cluster depends on a single network switch, that switch is a single point of failure. Fault groups cannot protect against single points of failure.

Vertica supports complex, hierarchical fault groups of different shapes and sizes. You can integrate fault groups with elastic cluster and large cluster arrangements to add cluster flexibility and reliability.

Making Vertica Aware of Cluster Topology with Fault Groups

You can also use fault groups to make Vertica aware of the topology of the cluster on which your Vertica database is running. Making Vertica aware of your cluster's topology is required when you use terrace routing.

Terrace routing is a feature that can reduce the buffer requirements of large queries. Use terrace routing when you run large queries on clusters with a large number of nodes, a combination that would otherwise require excessive buffer space.

For more information, see Terrace Routing.

Automatic Fault Groups

When you configure a cluster of 120 nodes or more, Vertica automatically creates fault groups around control nodes. Control nodes are a subset of cluster nodes that manage spread (control messaging). Vertica places nodes that share a control node in the same fault group. See Large Cluster in the Administrator's Guide for details.

User-Defined Fault Groups

Define your own fault groups if:

  • Your cluster layout has the potential for correlated failures.
  • You want to influence which cluster hosts manage control messaging.

Example Cluster Topology

The following diagram provides an example of hierarchical fault groups configured on a single cluster:

  • Fault group FG-A contains nodes only.
  • Fault group FG-B (parent) contains child fault groups FG-C and FG-D. Each child fault group also contains nodes.
  • Fault group FG-E (parent) contains child fault groups FG-F and FG-G. The parent fault group FG-E also contains nodes.
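
One way to reason about a hierarchy like this is to write it down as a nested structure and ask which nodes belong to each group, directly or through its children. The Python sketch below does that for the groups listed above. The node names and node counts are invented for illustration, and the nodes_in helper is hypothetical rather than part of Vertica.

```python
from typing import Dict, List

# Group names follow the example above; node names and counts are hypothetical.
FAULT_GROUPS: Dict[str, Dict[str, List[str]]] = {
    "FG-A": {"nodes": ["node01", "node02"], "children": []},
    "FG-B": {"nodes": [], "children": ["FG-C", "FG-D"]},
    "FG-C": {"nodes": ["node03", "node04"], "children": []},
    "FG-D": {"nodes": ["node05", "node06"], "children": []},
    "FG-E": {"nodes": ["node07"], "children": ["FG-F", "FG-G"]},
    "FG-F": {"nodes": ["node08"], "children": []},
    "FG-G": {"nodes": ["node09"], "children": []},
}


def nodes_in(group: str) -> List[str]:
    """Collect the group's own nodes plus the nodes of all descendant groups."""
    entry = FAULT_GROUPS[group]
    result = list(entry["nodes"])
    for child in entry["children"]:
        result.extend(nodes_in(child))
    return result


print(nodes_in("FG-A"))  # ['node01', 'node02']
print(nodes_in("FG-B"))  # ['node03', 'node04', 'node05', 'node06']
print(nodes_in("FG-E"))  # ['node07', 'node08', 'node09']
```

Listing the members of a parent group this way shows the full set of nodes that a failure at the parent level could take down together.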

How to Create Fault Groups

Before you define fault groups, you must have a thorough knowledge of your physical cluster layout. Fault groups require careful planning.

To define fault groups, create an input file that describes your cluster arrangement. Then pass the file to a script supplied by Vertica. The script returns the SQL statements you need to run. See Fault Groups in the Administrator's Guide for details.
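
To give a feel for the kind of SQL such a script might return, the Python sketch below turns a small group layout into statements. It is not the Vertica-supplied script and does not read that script's input file format. The layout, the group and node names, and the fault_group_sql helper are hypothetical, and the statement forms (CREATE FAULT GROUP, ALTER FAULT GROUP ... ADD NODE, ALTER FAULT GROUP ... ADD FAULT GROUP) are assumed here and should be checked against the SQL reference for your Vertica version.

```python
from typing import Dict, List


def fault_group_sql(groups: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Build CREATE/ALTER FAULT GROUP statements for a group hierarchy."""
    # Create every group first so that later ALTER statements can reference it.
    statements = [f"CREATE FAULT GROUP {name};" for name in groups]
    for name, entry in groups.items():
        for node in entry.get("nodes", []):
            statements.append(f"ALTER FAULT GROUP {name} ADD NODE {node};")
        for child in entry.get("children", []):
            statements.append(f"ALTER FAULT GROUP {name} ADD FAULT GROUP {child};")
    return statements


# Hypothetical layout: one parent group with two child groups of one node each.
layout = {
    "parent_fg": {"children": ["rack1_fg", "rack2_fg"]},
    "rack1_fg": {"nodes": ["v_exampledb_node0001"]},
    "rack2_fg": {"nodes": ["v_exampledb_node0002"]},
}

for statement in fault_group_sql(layout):
    print(statement)
```

Review any generated statements against your physical layout before you run them, because a fault group that does not match the real topology gives no protection.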