Master Blog Series: Getting Started with Vertica

Posted August 20, 2018 by Soniya Shah, Information Developer

This post was authored by Soniya Shah.

Are you a new Vertica user? If so, you’re probably wondering where to start. We’re here to help you on your big data analytics journey, from understanding Vertica terminology to making the most of your resources. If you find yourself asking questions like “What does the Tuple Mover do?” or “Why are projections useful?”, this blog is for you. This post highlights documents our team has written that build a basic understanding of Vertica architecture concepts and show how you can implement and use the features that accompany those concepts.

Understanding Vertica Epochs. This document explains what epochs (logical timestamps for data in Vertica) are and how they work. Vertica has a few different types of epochs, each of which is important in marking when operations happened in your Vertica database. Read this document to better understand how the COMMIT of DML transactions (INSERT, UPDATE, MERGE, COPY, and DELETE) affects how epochs move in Vertica.

Best Practices for Projection Optimization. We’ve already mentioned projections in this article. But what are they? And how does Vertica work with projections? These questions and more are answered in this document, which provides helpful tips and tasks covering the ins and outs of projections. Plus, you’ll read more about the interaction between projections and epochs!

Tuple Mover Best Practices describes what the WOS and ROS are and explains the mergeout and moveout operations in detail, with helpful illustrations along the way. Understand how to troubleshoot issues, get your questions answered, and learn more about the best practices of the Tuple Mover and all its components.

Deletes in Vertica: The FAQs gives you the rundown of the most common questions we get about Vertica deletes. This document starts off with the basics, explaining concepts like the AHM, replay deletes, and delete vectors.
This section is great for understanding how Vertica handles deletes, and it breaks down the basic jargon around them. Some useful diagrams provide a conceptual understanding of the delete lifecycle and how deletes are closely linked to the work of the Tuple Mover. Plus, there’s even more information about projection design considerations and how deletes factor into them.

Understanding the Vertica Replay Delete Algorithms provides a more in-depth understanding of the delete lifecycle in Vertica. You’ve probably gotten the sense that projections are pretty important in Vertica. And they are, especially when considering how to design them. This document focuses on projection design and on configuring the Vertica database to improve replay delete performance. We also give you an under-the-hood look at the algorithms that run behind the replay delete functionality and how Vertica uses those algorithms to optimize your deletes.

Looking for more?

If you’ve read through our top five and are still looking for more Vertica knowledge, don’t worry, we have you covered! We recommend these for users who are feeling more comfortable with Vertica terms and have a basic understanding of the architecture.

K-Safety Best Practices goes through an overview of K-safety, data safety, and node dependencies. This is hugely important when considering recovery features in Vertica. K-safety measures fault tolerance in your database cluster. K-safety enforces the projection requirements Vertica uses to make sure your data is safe, even if a node goes down. We highly recommend reading this for an understanding of how you can keep your data safe.

Resource Management. Vertica is all about making the most of your resources: we want you to run fast, with the freedom to do what you want with your data to meet your workload needs. All loads and queries that run against the database take up system resources. Sometimes, those resources could be used more efficiently.
Read through this blog post to better understand the Vertica Resource Manager, resource pools, and built-in queries. Plus, we walk you through an example that shows you how to allocate resources for batch loads.

Troubleshooting ROS Pushback is a case-based document that goes through eight detailed scenarios. It explains each case and provides tangible solutions, filled with examples and real-life situations you may find yourself in.

Vertica Partitions: The FAQs explains how partitions can make the data management lifecycle easier and help improve query performance. Drawing on concepts such as the Tuple Mover, this document walks through everything from partition basics to repartitioning and reorganizing. Read this to better understand partitions, see visualizations of partitioning, and manage your data using partitions.