Implement Locales for International Data Sets
Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY and ORDER BY clauses, joins, and the analytic ORDER BY clause.
The default locale for a Vertica database is en_US@collation=binary (English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is applied at query time.
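To make the effect of a binary collation concrete, the following Python sketch orders strings by their UTF-8 byte values. It is an illustration only, not Vertica code: the function name binary_collation_sort is hypothetical, and the sketch assumes that binary collation amounts to a byte-by-byte comparison of UTF-8 data.

```python
def binary_collation_sort(values):
    """Sort strings by their UTF-8 byte sequences (assumed model of binary collation)."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


# "\u00e9clair" is "eclair" spelled with a precomposed e-acute (U+00E9).
words = ["apple", "Zebra", "\u00e9clair", "zebra"]

# Uppercase ASCII sorts before lowercase ASCII, and the accented word sorts
# last because its first UTF-8 byte (0xC3) is larger than any ASCII byte.
print(binary_collation_sort(words))
```

A locale-aware collation would typically place these words in dictionary order instead, which is why the ordering a query returns can differ from the byte ordering shown here.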
If you set the locale to null, Vertica sets the locale to en_US_POSIX. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale. For example:
=> set locale to '';
INFO 2567: Canonical locale: 'en_US_POSIX'
Standard collation: 'LEN'
English (United States, Computer)
SET
=> \locale en_US@collation=binary;
INFO 2567: Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)
=> \locale en_US@collation=binary;
ICU Locale Support
Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax. The locale used by the database session is not derived from the operating system (through the LANG variable), so Vertica recommends that you set the LANG variable for each node running vsql, as described in the next section.
While ICU library services can specify collation, currency, and calendar preferences, Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation regardless of the session collation. Any locale-specific collation is applied at query time.
The SET DATESTYLE TO ... command provides some aspects of the calendar, but Vertica supports only dollars as currency.
Changing DB Locale for a Session
This example sets the session locale to Thai.
At the operating-system level for each node running vsql, set the LANG variable to the locale language.
If setting the LANG variable does not work, the operating system support for locales may not be installed.
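One way to check whether the operating system has support for a given locale installed is to try activating it. The Python sketch below does that with the standard locale module. The helper name os_locale_available and the locale name th_TH.UTF-8 are assumptions for illustration; the exact locale name depends on the operating system.

```python
import locale


def os_locale_available(name):
    """Return True if the operating system can activate the named locale."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the C library does not know the requested locale.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)  # always restore


# "th_TH.UTF-8" is an assumed name for a Thai UTF-8 locale.
print(os_locale_available("th_TH.UTF-8"))
```

If the check fails, installing the operating system's language packages for that locale is the usual next step. Some C libraries accept unknown locale names without error, so a True result is a hint rather than proof.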
For each Vertica session (from ODBC/JDBC or vsql), set the language locale:
SET LOCALE TO th_TH;
In PuTTY (or an ssh terminal), change the settings as follows:
settings > window > translation > UTF-8
Click Apply and then click Save.
All data loaded must be in UTF-8 format, not an ISO format, as described in Loading Delimited Data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not supported, so functions like SUBSTRING do not work correctly for multibyte characters. As a result, locale settings do not work correctly with such data. If the translation setting ISO-8859-11:2001 (Latin/Thai) works, the data was loaded incorrectly. To convert data correctly, use a Linux utility program.
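As an illustration of the conversion step, the following Python sketch re-encodes bytes from ISO-8859-11 (Latin/Thai) to UTF-8 before loading. It is a minimal example rather than the utility the text refers to; the function name latin_thai_to_utf8 is hypothetical, and the sketch assumes the source data really is ISO-8859-11.

```python
def latin_thai_to_utf8(raw: bytes) -> bytes:
    """Decode ISO-8859-11 bytes and re-encode them as UTF-8."""
    text = raw.decode("iso8859_11")  # Python's codec name for Latin/Thai
    return text.encode("utf-8")


# Two Thai letters (U+0E01 and U+0E32) as single bytes in ISO-8859-11.
legacy = bytes([0xA1, 0xD2])
converted = latin_thai_to_utf8(legacy)

# Each Thai letter occupies one byte in ISO-8859-11 but three bytes in UTF-8.
print(len(legacy), len(converted))
```

For whole files, the same decode and encode steps can be applied while reading and writing, so that only UTF-8 data reaches the database.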
The maximum length parameter for the VARCHAR and CHAR data types refers to the number of octets (bytes) that can be stored in the field, not the number of characters. When using multi-byte UTF-8 characters, make sure to size fields to accommodate from 1 to 4 bytes per character, depending on the data.
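The difference between characters and octets can be measured before a table is defined. The Python sketch below reports the largest UTF-8 byte length in a sample of values, which is the number a VARCHAR or CHAR length has to cover. The function name max_octets and the sample values are illustrative assumptions.

```python
def max_octets(values):
    """Return the largest UTF-8 encoded length, in bytes, among the values."""
    return max((len(v.encode("utf-8")) for v in values), default=0)


# Seven Thai code points, written with escapes so the source stays ASCII.
thai_name = "\u0e01\u0e23\u0e38\u0e07\u0e40\u0e17\u0e1e"
samples = ["Bangkok", thai_name]

for value in samples:
    # Character count versus octet count for each sample.
    print(len(value), len(value.encode("utf-8")))

print(max_octets(samples))
```

Both samples have 7 characters, but the Thai value needs 21 bytes, so a field sized for 7 would be too short to hold it.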