Implement Locales for International Data Sets

Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY and ORDER BY clauses, joins, and the analytic ORDER BY clause.

The default locale for a Vertica database is en_US@collation=binary (English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is applied at query time.

If you set the locale to null, Vertica sets the locale to en_US_POSIX. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale. For example:

=> set locale to '';
INFO 2567:  Canonical locale: 'en_US_POSIX'
Standard collation: 'LEN'
English (United States, Computer)
SET
=> \locale en_US@collation=binary;
INFO 2567:  Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)
=>  \locale
en_US@collation-binary;

You can set locale through ODBC, JDBC, and ADO.net.

ICU Locale Support

Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax. The locale used by the database session is not derived from the operating system (through the LANG variable), so Vertica recommends that you set the LANG for each node running vsql, as described in the next section.

While ICU library services can specify collation, currency, and calendar preferences, Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation regardless of the session collation. Any locale-specific collation is applied at query time.

The SET DATESTYLE TO ... command provides some aspects of the calendar, but Vertica supports only dollars as currency.

Changing DB Locale for a Session

This examples sets the session locale to Thai.

  1. At the operating-system level for each node running vsql, set the LANG variable to the locale language as follows:

    export LANG=th_TH.UTF-8

    If setting the LANG= as shown does not work, the operating system support for locales may not be installed.

  2. For each Vertica session (from ODBC/JDBC or vsql) set the language locale.

    From vsql:

    \locale th_TH 
  3. From ODBC/JDBC:

    "SET LOCALE TO th_TH;" 
  4. In PUTTY (or ssh terminal), change the settings as follows:

    settings > window > translation > UTF-8
  5. Click Apply and then click Save.

All data loaded must be in UTF-8 format, not an ISO format, as described in Delimited Data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not supported, so functions like SUBSTRING do not work correctly for multibyte characters. Thus, settings for locale should not work correctly. If the translation setting ISO-8859-11:2001 (Latin/Thai) works, the data is loaded incorrectly. To convert data correctly, use a utility program such as Linux iconv.

The maximum length parameter for VARCHAR and CHAR data type refers to the number of octets (bytes) that can be stored in that field, not the number of characters. When using multi-byte UTF-8 characters, make sure to size fields to accommodate from 1 to 4 bytes per character, depending on the data.