Joining Variable Length String Data

When you join tables on VARCHAR columns, Vertica calculates how much storage space it requires to buffer join column data. It does so by formatting the column data in one of two ways:

  • Uses the join column metadata to size column data to a fixed length and buffer accordingly. For example, given a column that is defined as VARCHAR(1000), Vertica always buffers 1000 characters.
  • Uses the actual length of join column data, so buffer size varies for each join. For example, given a join on strings Xi, John, and Amrita, Vertica buffers only as much storage as it needs for each join—in this case, 2, 4, and 6 bytes, respectively.

The second approach can improve join query performance. It can also reduce memory consumption, which helps prevent join spills and minimize how often memory is borrowed from the resource manager. In general, these benefits are especially marked in cases where the defined size of a join column significantly exceeds the average length of its data.

Setting and Verifying Variable Length Formatting

You can control how Vertica implements joins at the session or database levels, through configuration parameter JoinDefaultTupleFormat, or for individual queries, through the JFMT hint. Vertica supports variable length formatting for all joins except merge and event series joins.

Use EXPLAIN VERBOSE to verify whether a given query uses variable character formatting, by checking for these flags:

  • JF_EE_VARIABLE_FORMAT
  • JF_EE_FIXED_FORMAT