Vertica Analytics Platform Version 10.1.x Documentation

ISUTF8

Tests whether a string is a valid UTF-8 string. Returns true if the string conforms to UTF-8 standards, and false otherwise. This function is useful to test strings for UTF-8 compliance before passing them to one of the regular expression functions, such as REGEXP_LIKE, which expect UTF-8 characters by default.

ISUTF8 checks for invalid UTF8 byte sequences, according to UTF-8 rules:

  • invalid bytes
  • an unexpected continuation byte
  • a start byte not followed by enough continuation bytes
  • an Overload Encoding

The presence of an invalid UTF-8 byte sequence results in a return value of false.

To coerce a string to UTF-8, use MAKEUTF8.

Syntax

ISUTF8( string );

Parameters

string

The string to test for UTF-8 compliance.

Examples

=> SELECT ISUTF8(E'\xC2\xBF'); -- UTF-8 INVERTED QUESTION MARK ISUTF8 
--------
 t
(1 row)
=> SELECT ISUTF8(E'\xC2\xC0'); -- UNDEFINED UTF-8 CHARACTER
 ISUTF8 
--------
 f
(1 row)