Understanding ROS and WOS: A Hybrid Data Storage Model

Posted October 7, 2015 by Sarah Lemaire, Manager, Vertica Documentation

Modern Database Analytics

Vertica is unique in many ways, one of which can be seen in its data storage model. In this blog, we’ll go over each storage component and show you how to take advantage of Vertica’s storage options to optimize your database for writing and reading.

Vertica storage model

To understand the Vertica storage model, you first need to understand these three elements:

  • ROS (Read-Optimized Store)
  • WOS (Write-Optimized Store)
  • Tuple Mover

WOS and ROS

Vertica uses two distinct structures for storing data: ROS and WOS. By having two storage structures, Vertica can be optimized for both loads (writes) and reads.

WOS – Write Optimized Store

In Vertica, WOS stands for Write Optimized Store. The WOS is an in-memory data storage structure optimized for low-latency data loading. It is designed to efficiently support INSERT, UPDATE, DELETE, and most COPY operations.

When you load data, it first goes into WOS (unless you use the COPY DIRECT statement, discussed later). Records in WOS are stored without compression or indexing to support faster loading. Because the data in WOS is sorted only when queried, it is not optimized for reading.

ROS – Read Optimized Store

ROS on the other hand, is structured for fast reads. ROS stands for Read Optimized Store. It’s a highly optimized, read-oriented, disk storage structure. ROS data is partitioned into sections called storage containers. A container is just a set of rows created by moveout or COPY DIRECT statements and stored in a particular group of files. The bulk of the data in your database lives in ROS and is both sorted and compressed. Because of this storage structure, data in ROS is indeed optimized for reads.

Tuple Mover

Vertica moves data from WOS to ROS using the Tuple Mover.

The Tuple Mover performs two operations:

  • Moveout
  • Mergeout

Moveout

During moveout operations, the Tuple Mover compresses and sorts the data in WOS and then moves it to ROS, creating new ROS containers for the new data. The ROS container setup allows for faster movement of data from WOS to ROS because newer data doesn’t need to be merged with existing ROS data immediately.

Mergeout

During mergeout, the Tuple Mover combines, or merges, the small ROS containers that were created by moveout operations or COPY DIRECT statements into larger ones. It also purges data that is marked for deletion.

The Tuple Mover runs in the background, performing some tasks automatically at time intervals determined by its configuration parameters.

Data movement

You can get data into your database using any of these three methods:

  • Load data into WOS and let the Tuple Mover move it to ROS
  • Load data directly into ROS using the DIRECT option
  • Trickle load data only into WOS

 

When you use the COPY, INSERT, or UPDATE statement, data is, by default, loaded into WOS first. At a time interval that you specify, Tuple Mover performs a moveout operation and moves the data from WOS to ROS. Later, during a mergeout operation, the Tuple Mover merges ROS containers and purges data marked for deletion. You can manually load data directly into ROS if you want using the COPY DIRECT, INSERT DIRECT, or UPDATE DIRECT options.

How can I use these elements to my advantage?

So how do you tell which load method is right for you? Read through these scenarios to learn how to optimize loading for your data loading use case.

Scenario: You’re loading a large amount of data

When loading large batches of data, you’ll want to load that data directly into ROS. Direct loading avoids the WOS entirely and provides the highest load rate. To load data directly into ROS, use the DIRECT keyword with COPY, INSERT, or UPDATE. For example:

=> COPY a FROM stdin DELIMITER ',' DIRECT;

Scenario: You’re loading lots of small data and have outside buffering

Take advantage of the WOS and Tuple Mover if you are loading data incrementally. If you have a solid outside buffering system in place, your best bet is to use the COPY TRICKLE statement to load the data incrementally. For example,

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
FROM stdin DELIMITER ',' TRICKLE;

Trickle loading loads data into the WOS offers lower latency and resource consumption than loading to ROS. However, if the WOS becomes full, an error occurs and the entire data load is rolled back. Loading Vertica too fast will result in the outside buffering solution holding the data until Vertica is ready to receive it. The buffered data is not query-able, but neither is it forcing Vertica to consume resources managing it. Use this option only when you have a finely-tuned load and moveout process. We’ll go over configuration options in the next section.

Scenario: You’re loading lots of small data with no outside buffering

You use the COPY AUTO option if you’re loading a lot of tiny chucks of data but perhaps don”t have use good outside buffering. For example,

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
FROM stdin DELIMITER ',' AUTO;

Vertica uses AUTO mode by default when you use the COPY statement so the AUTO keyword is optional. When data is loaded using this method, Vertica starts loading into the WOS, spilling to ROS if the WOS becomes full.

Configuration options

In addition to choosing a load method, you can make further configurations to optimize loading using the Vertica built-in pools.

Option Description Example
Adjust WOSDATA PLANNEDCONCURRENCY The WOSDATA pool’s PLANNEDCONCURRENCY tells Vertica how to share available WOS between concurrent loads. => ALTER RESOURCE POOL WOSDATA PLANNEDCONCURRENCY 3;

 

Once a load has occupied one quarter (1/(3+1)) of the WOS, it spills to the ROS.

Customize the Tuple Mover (TM) resource pool If you notice spikes in the number of ROS containers, you can increase the MAXCONCURRENCY parameter above.

 

ALTER RESOURCE POOL TM MAXCONCURRENCY 3;

 

The TM can run 2 mergeout threads, so if a large mergeout is in progress, smaller ROS containers can be merged.

Alter the MoveOutInterval This parameter tells the Tupler Mover when to check for new data in the WOS to move to ROS. ALTER DATABASE mydb SET MoveOutInterval = 60;

 

The TM will check for new data every 60 seconds (1 minutes).

 

Monitoring ROS and WOS

You can monitor information about WOS and ROS by viewing the STORAGE_CONTAINERS system table:hybrid2

The Vertica hybrid data model allows for optimization of both data reads and writes. You can use different loading methods or configurations to tune the model for your business needs. For more information about ROS and WOS, check out our documentation.