Ask any CIO what their top priorities are, and cloud deployment is likely to be at the top of the list. While the reasons for deploying internal applications on the cloud are beyond the scope of this post, it is fair to ask why private cloud deployment is a viable option for a Big Data implementation, and what impact it will have on the deployment, maintenance and performance of the system.
A strong argument for private clouds is the savings in time and capital provided by consolidating all applications onto a single, industry-standard server configuration. This enables fast procurement and shortens the time needed to scale out the infrastructure.
- Vertica runs on industry-standard x86 hardware, and works with all DAS, SAN and NAS solutions in the marketplace.
- Vertica is a massively parallel processing (MPP) database, scaling out horizontally through the addition of virtual servers rather than increased hardware per virtual server.
All modern virtualization frameworks provide the ability to quickly deploy a pre-configured VM, or template, into the system. This template encodes the results of tuning, security audits and vendor best practices, ensuring that the new virtual server will work seamlessly in the new environment and reducing support costs for both the customer and the supplying vendor.
- Vertica can assist you with building your own templates.
- Every node in a Vertica database provides the same functionality, so only one template is needed.
- The Vertica database will remotely install on new nodes, rebalance the data throughout the cluster, and bring the new nodes on-line automatically when they are ready.
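The rebalancing step in the last bullet can be illustrated with a toy model. This is a minimal sketch, not Vertica's segmentation algorithm: it assumes rows are placed on nodes by hashing a key modulo the node count, and the names `assign_node` and `rebalance` are hypothetical.

```python
import hashlib
from collections import Counter

def assign_node(key: str, node_count: int) -> int:
    """Map a row key to a node index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def rebalance(keys, old_count, new_count):
    """Return (rows per node after the resize, number of rows that moved)."""
    placement = Counter()
    moved = 0
    for key in keys:
        before = assign_node(key, old_count)
        after = assign_node(key, new_count)
        placement[after] += 1
        if before != after:
            moved += 1
    return placement, moved

# Grow a hypothetical 3-node cluster to 4 nodes.
keys = [f"row-{i}" for i in range(10_000)]
placement, moved = rebalance(keys, old_count=3, new_count=4)
print(sorted(placement.items()))
print(f"{moved} of {len(keys)} rows moved")
```

After the resize, every node holds roughly the same number of rows. Plain modulo hashing moves most rows when the node count changes, so real systems generally use placement schemes that move less data.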
Another benefit of virtualization is the ease of maintenance. Is a server sending warning signals? Migrate its workload to another virtual server, pull the malfunctioning hardware, and replace it with new hardware.
In addition to the built-in migration services provided by virtualization vendors, Vertica provides simple migration facilities to replace a faulty node with a fresh node. Because Vertica does not use specialized nodes, any available virtual server in the server pool can be used.
There is no free lunch, and the price for improvements in procurement, deployment and maintenance is slower execution on a given hardware configuration. Most cloud deployments see between 15% and 30% performance degradation, depending on the application’s profile.
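To put that range in context, the arithmetic below estimates how many virtual nodes it would take to match a bare-metal cluster. It is a rough sketch that assumes throughput scales linearly with node count, which is an idealization, and it uses the two ends of the range quoted above. The function name `nodes_to_match` is hypothetical.

```python
import math

def nodes_to_match(bare_metal_nodes: int, degradation: float) -> int:
    """Virtual nodes needed to equal bare-metal throughput.

    Assumes each virtual node delivers (1 - degradation) of a physical
    node's throughput and that throughput scales linearly with node count.
    """
    if not 0 <= degradation < 1:
        raise ValueError("degradation must be in [0, 1)")
    return math.ceil(bare_metal_nodes / (1 - degradation))

for d in (0.15, 0.30):
    print(f"{d:.0%} degradation: {nodes_to_match(10, d)} virtual nodes for 10 physical")
```

Under these assumptions, a 10-node physical cluster needs 12 virtual nodes at 15% degradation and 15 at 30%.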
HP Vertica was built for virtualization. Virtualization’s weaknesses are offset by Vertica’s strengths. For example, one of the weaknesses of a virtual infrastructure is lower I/O throughput than large DAS arrays provide. Vertica employs aggressive compression routines to minimize the size of the data on disk, greatly reducing the I/O requirements of the storage network.
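To see why compression lowers I/O, consider run-length encoding, a technique commonly used in column stores. The sketch below is a generic illustration and makes no claim about which routines Vertica applies to a given column. The helper names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# A sorted column with few distinct values compresses very well.
column = ["DE"] * 400 + ["FR"] * 350 + ["US"] * 250
encoded = rle_encode(column)

print(encoded)  # [('DE', 400), ('FR', 350), ('US', 250)]
print(len(column), "->", len(encoded), "entries")
assert rle_decode(encoded) == column
```

A thousand values shrink to three pairs here, so far fewer bytes have to cross the storage network. The gain depends on the data: an unsorted or high-cardinality column would compress much less.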
Columnar databases have a natural I/O advantage. In a column store, data for each column in a table is stored separately, so only the data needed to answer the question must be scanned, rather than the full row. The benefit is greatest with wide tables, because Vertica only needs to materialize the columns specified in the query.
Due to its architecture, Vertica is CPU-bound rather than memory- or I/O-bound. Most virtual infrastructures are compute-heavy, which makes them a good match for Vertica.
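The effect on a wide table can be estimated with simple arithmetic. The sketch below assumes fixed 8-byte values and ignores compression, indexes and caching. The table shape (one million rows, 50 columns, 2 of them queried) and the function names are hypothetical.

```python
def bytes_scanned_row_store(row_count, column_count, bytes_per_value=8):
    """A row store reads every column of every row it scans."""
    return row_count * column_count * bytes_per_value

def bytes_scanned_column_store(row_count, columns_in_query, bytes_per_value=8):
    """A column store reads only the columns the query references."""
    return row_count * columns_in_query * bytes_per_value

rows, total_columns, queried_columns = 1_000_000, 50, 2
row_bytes = bytes_scanned_row_store(rows, total_columns)
col_bytes = bytes_scanned_column_store(rows, queried_columns)

print(f"row store:    {row_bytes / 1e6:.0f} MB")
print(f"column store: {col_bytes / 1e6:.0f} MB")
print(f"reduction:    {row_bytes / col_bytes:.0f}x")
```

With these numbers the row layout scans 400 MB and the column layout 16 MB, a 25x reduction that grows as the table gets wider or the query touches fewer columns.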
Vertica can also provide best practices, health checks, and other services to ensure that your configuration is optimized and fully supported.
How Does Vertica Enhance Private Cloud Deployments?
Vertica offers additional improvements for cloud deployments above and beyond those provided by your virtualization product.
Elastic Cluster—You can scale your cluster up or down to meet the needs of your database. The most common case is to add nodes to your database cluster to accommodate more data and provide better query performance. However, you can scale down your cluster if you find that it is overprovisioned or if you need to divert hardware for other uses. Visit our online documentation for additional information on Elastic Clusters.
Tiered Storage Support—Most virtual infrastructures make use of storage pools. The idea is to have pools of disks for different workload profiles: SSDs or fast hard drives for high-performance applications, and slower disks for less critical workloads. Vertica’s storage locations let you place data on the pool that suits its workload. Visit our online documentation for additional information on Storage Locations.
Fast Backup and Restore—Vertica stores data in highly compressed files on disk. When doing a backup or restore, Vertica moves these highly compressed files over the network to the backup storage location. This provides an immense reduction in bandwidth on the storage networks. Visit our online documentation for additional information on Vertica’s backup and recovery features.
Fast Data Copying—To make moving data between databases simple and fast, Vertica employs the same mechanisms for moving tables between databases as it does for backup and recovery: highly compressed data files are copied between the databases. Each node in the Vertica cluster sends copies of its data to the remote database in parallel, enabling movement of several terabytes per minute in large clusters. Visit our online documentation for additional information on fast data copy.
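Why parallel copying scales with cluster size can be shown with a back-of-the-envelope estimate. This sketch assumes data is spread evenly across nodes and that the network has no shared bottleneck. The function name `copy_time_minutes` and the sample figures are hypothetical, not measured Vertica numbers.

```python
def copy_time_minutes(compressed_tb: float, node_count: int,
                      per_node_gb_per_s: float) -> float:
    """Estimate copy time when every node streams its share in parallel.

    Assumes an even data distribution and no shared network bottleneck,
    so aggregate throughput is node_count * per_node_gb_per_s.
    """
    if node_count <= 0 or per_node_gb_per_s <= 0:
        raise ValueError("node_count and per_node_gb_per_s must be positive")
    aggregate_gb_per_s = node_count * per_node_gb_per_s
    seconds = compressed_tb * 1000 / aggregate_gb_per_s  # 1 TB = 1000 GB
    return seconds / 60

# Hypothetical example: 60 TB of compressed data, 1 GB/s per node.
for nodes in (40, 80):
    print(f"{nodes} nodes: {copy_time_minutes(60, nodes, 1.0):.1f} minutes")
```

In this model, doubling the node count halves the copy time, which is why the aggregate rate grows with cluster size.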
While slower performance may hinder some cloud-based deployments, the HP Vertica Analytics Platform implements a number of design features and architectural decisions that complement today’s private cloud environments. Learn more about how HP Vertica handles data faster and more reliably than any other database within public and virtualized enterprise cloud environments.