RAID Storage for Vertica

Storage Options for Vertica

Storage options for Vertica include:

  • Internal hardware storage array
  • SAN (storage area network)
  • NAS (network-attached storage)
  • DAS (direct-attached storage) enclosure

Each storage option should present a single local file system to each host in the cluster. In addition, the storage must provide sufficient bandwidth, following the guidelines in the Vertica Hardware Guide in the Vertica Knowledge Base.

Vertica operates with high concurrency in a scale-out, shared-nothing cluster architecture. Some cluster-based environments can use shared storage devices, but with Vertica, shared storage can greatly limit performance.

This document helps you understand the price-performance tradeoffs among the different RAID storage options. Choosing shared storage is an important decision because some shared storage devices can significantly reduce Vertica performance.

RAID Storage

RAID Examples

The following figure highlights the major differences among the RAID 0, RAID 1+0, RAID 5, and RAID 5+0 architectures. Each RAID architecture groups individual disks into a single logical volume that is presented to the operating system. All RAID options except RAID 0 provide some level of disk protection.
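To see how parity-based RAID (RAID 5 and RAID 5+0) survives a single disk failure, the following Python sketch XORs the data blocks of one stripe to produce a parity block, then rebuilds a lost block from the surviving blocks. It is a simplified illustration under stated assumptions: a single stripe, tiny hypothetical 4-byte blocks, and no parity rotation (real RAID 5 arrays distribute parity across all disks in the set).

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be non-empty and the same length")
    # zip(*blocks) yields one tuple of byte values per column position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical four-disk RAID 5 set: three data blocks plus parity.
data = [b"VERT", b"ICA!", b"RAID"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1], then rebuild it from the
# surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print(rebuilt)  # b'ICA!'
```

Rebuilding a lost block requires reading every surviving disk in the set, which is why parity RAID carries the rebuild-time and rebuild-performance penalties noted for RAID 5.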

RAID is not a form of data backup. RAID increases the availability of a storage array in the event of a disk failure. In any server or storage system, mechanical disks with moving parts are likely to fail before a CPU or memory chip does. Make sure to back up your data regularly as described in Backing Up and Restoring the Database in the Vertica documentation.

Selecting RAID Storage for Vertica

Each RAID storage option offers a different balance of cost, performance, and availability. None of the RAID options can optimize all three capabilities. The following table describes each option and what capabilities it does and doesn't provide.

Note: Hardware options can involve extensive combinations of disks, RAID controllers, servers, and operating systems. Always consult the Resources for Configuring Hardware and Virtual Machines page for the most up-to-date recommendations.

RAID 0

Advantages:
  • Cost
  • I/O performance

Disadvantages:
  • No data protection

Comments: Do not use RAID 0 for your Vertica cluster. Choose another RAID option.

RAID 1+0

Advantages:
  • Data protection with disk failure
  • Overall performance
  • Disk rebuild times
  • Write performance

RAID 5

Advantages:
  • Data protection with disk failure
  • Usable disk capacity
  • Cost compared to same-capacity drives

Disadvantages:
  • Write performance
  • Disk rebuild times
  • Reduced performance during the parity rebuild after a new disk is installed
  • Slight risk of data corruption if parity becomes corrupted
  • Possibility of multiple disk failures in the same RAID set

Comments: If you choose RAID 5, keep a hot spare disk available in case you need to replace a failing or failed disk in the RAID set.

RAID 5+0

Advantages:
  • Data protection with disk failure
  • RAID 0 striping
  • Balance of performance, cost, availability, and storage capacity

Disadvantages:
  • Not supported on all hardware RAID arrays
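Usable capacity is one of the main tradeoffs among these options. The following Python sketch estimates usable capacity for each RAID level from the disk count and disk size. It assumes identical disks, ignores file-system formatting and controller metadata overhead, and models RAID 5+0 as equal-sized RAID 5 groups striped together; the function name, the `groups` parameter, and the disk sizes in the example are hypothetical.

```python
def usable_capacity_gb(level: str, disks: int, disk_gb: int, groups: int = 2) -> int:
    """Estimate usable capacity in GB for a RAID set of identical disks.

    Ignores formatting and controller metadata overhead.
    """
    if disks < 1 or disk_gb <= 0:
        raise ValueError("disks and disk_gb must be positive")
    if level == "0":
        return disks * disk_gb  # striping only: no redundancy
    if level == "1+0":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return disks // 2 * disk_gb  # every block is mirrored
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb  # one disk's worth of parity
    if level == "5+0":
        if groups < 2 or disks % groups or disks // groups < 3:
            raise ValueError("RAID 5+0 needs 2+ equal RAID 5 groups of 3+ disks each")
        return (disks - groups) * disk_gb  # one disk's worth of parity per group
    raise ValueError(f"unsupported RAID level: {level}")


# Hypothetical example: 22 disks of 600 GB each.
for level in ("0", "1+0", "5", "5+0"):
    print(f"RAID {level:>3}: {usable_capacity_gb(level, 22, 600):,} GB usable")
```

With these hypothetical disks, RAID 1+0 yields 6,600 GB versus 12,600 GB for RAID 5: half the raw capacity goes to mirrors, which is the capacity cost of RAID 1+0's stronger write performance and rebuild times.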

Hardware or Software RAID?

For your Vertica cluster, use hardware RAID with a dedicated controller. A hardware RAID controller manages a collection of disks and presents them to the host as a single large volume.

In software RAID, the operating system performs the RAID tasks on the host CPUs and manages each individual disk, which places additional load on the CPUs. This overhead can be as high as 40% on a Vertica host, and software RAID also offers lower performance than a dedicated hardware RAID controller. For these reasons, software RAID is not a good choice for Vertica.

In virtual environments where the size of the disk devices may be limited, using software RAID may be appropriate.

Recommended RAID Settings for Vertica

The next figure shows the recommended sizing and volume configuration for a Vertica node configured as an HPE ProLiant DL380 Gen9 24-SFF server.

Figure: Recommended RAID 1+0 volume configuration for a Vertica node

Each node contains two internal mirrored disks that hold the operating system, the Vertica software in the /opt directory, and the Vertica catalog. The operating system disks can be either 300 GB or 600 GB, mirrored with RAID 1.

The /data volume, where the Vertica database files reside, is typically 6–10 TB in capacity and is built from up to 24 disks in RAID 1+0. I/O performance on this volume is important: it should deliver at least 60–80 MB/sec of throughput per CPU core. For example, a 24-core node should deliver 1.4 GB/sec to 1.9 GB/sec of I/O read and write throughput.
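The per-core guideline is simple arithmetic. The following Python sketch computes the target throughput range for a given core count and checks a measured value (for example, a figure you read from a vioperf run) against the lower bound. The function names and the 1,600 MB/sec measurement are hypothetical, and the sketch assumes decimal units (1 GB = 1,000 MB), which matches the 24-core example.

```python
def target_throughput_gb_s(cores: int, low_mb: int = 60, high_mb: int = 80) -> tuple[float, float]:
    """Return the (low, high) I/O throughput target in GB/sec for a node."""
    if cores < 1:
        raise ValueError("cores must be positive")
    # 60-80 MB/sec per core, converted to GB/sec with 1 GB = 1,000 MB.
    return cores * low_mb / 1000, cores * high_mb / 1000


def meets_minimum(measured_mb_s: float, cores: int, low_mb: int = 60) -> bool:
    """True if measured throughput reaches at least low_mb MB/sec per core."""
    return measured_mb_s >= cores * low_mb


low, high = target_throughput_gb_s(24)
print(f"24 cores: {low:.2f}-{high:.2f} GB/sec")  # 24 cores: 1.44-1.92 GB/sec
print(meets_minimum(1600, 24))  # True: 1,600 MB/sec >= 24 x 60 MB/sec
```

The exact bounds, 1.44 and 1.92 GB/sec, are what the text rounds to 1.4 GB/sec and 1.9 GB/sec.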

Cache Setting

For the HPE P840 PCIe card, set the read cache to 10% and set the write cache to 90%. For other types of storage, always run vioperf, the Vertica I/O performance test, to validate each configuration.
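The 10%/90% recommendation splits the controller cache between reads and writes. The following Python sketch only does the arithmetic, showing what that split means in megabytes. The function name and the 4,096 MB cache size in the example are hypothetical; check your controller's specifications for its actual cache size, and still validate the resulting configuration with vioperf.

```python
def cache_split_mb(total_cache_mb: int, read_pct: int = 10) -> tuple[float, float]:
    """Split a controller cache into (read, write) portions in MB."""
    if not 0 <= read_pct <= 100:
        raise ValueError("read_pct must be between 0 and 100")
    read_mb = total_cache_mb * read_pct / 100
    # Whatever is not reserved for reads goes to the write cache.
    return read_mb, total_cache_mb - read_mb


read_mb, write_mb = cache_split_mb(4096)  # hypothetical 4,096 MB cache
print(f"read: {read_mb:.1f} MB, write: {write_mb:.1f} MB")  # read: 409.6 MB, write: 3686.4 MB
```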

Block Size

Vertica performs best with large-block sequential reads using a block size of 256 KB to 512 KB. Small block sizes, such as 8–32 KB, work well for transactional workloads but are not suitable for a Vertica database.
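Block size matters because, at a fixed throughput, the number of I/O operations the storage must serve grows as the block size shrinks. The following Python sketch makes that concrete for 1,440 MB/sec, the lower bound for a 24-core node under the 60 MB/sec-per-core guideline. The function name is hypothetical, and the sketch assumes 1 MB = 1,024 KB; decimal units shift the exact figures slightly but not the ratios between block sizes.

```python
def iops_needed(throughput_mb_s: float, block_kb: int) -> float:
    """I/O operations per second needed to sustain a throughput at a block size."""
    if block_kb <= 0:
        raise ValueError("block_kb must be positive")
    return throughput_mb_s * 1024 / block_kb  # assumes 1 MB = 1,024 KB


for block_kb in (8, 32, 256, 512):
    print(f"{block_kb:>3} KB blocks: {iops_needed(1440, block_kb):>9,.0f} IOPS")
```

At 8 KB blocks the storage must sustain 184,320 operations per second, 32 to 64 times as many as at 256–512 KB for the same throughput.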

For More Information

Vertica Best Practices Documents

For additional Vertica best practices documents, visit our Big Data and Analytics Knowledge Base.