MAKEUTF8

Coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.

MAKEUTF8 flags invalid UTF-8 characters byte by byte. For example, the byte sequence 0xE0 0x7F 0x80 is an invalid three-byte UTF-8 sequence, but the middle byte, 0x7F, is a valid one-byte UTF-8 character. In this example, 0x7F is preserved and the other two bytes are removed or replaced.
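The byte-by-byte behavior described above can be modeled outside the database. The following Python sketch is only an illustration, not the actual implementation of MAKEUTF8: the function name make_utf8_sketch and its replacement argument are hypothetical, and it assumes that each invalid byte is removed or replaced individually, as in the 0xE0 0x7F 0x80 example.

```python
def make_utf8_sketch(data: bytes, replacement: str = "") -> bytes:
    """Hypothetical model of byte-by-byte UTF-8 cleanup (not Vertica code)."""
    out = bytearray()
    i = 0
    while i < len(data):
        # Look for the shortest valid UTF-8 character that starts at byte i.
        for width in (1, 2, 3, 4):
            chunk = data[i:i + width]
            try:
                if len(chunk) == width and len(chunk.decode("utf-8")) == 1:
                    out += chunk      # valid character: keep its bytes
                    i += width
                    break
            except UnicodeDecodeError:
                continue              # not valid at this width, try a longer one
        else:
            # No valid character starts here: drop or replace this single byte.
            out += replacement.encode("utf-8")
            i += 1
    return bytes(out)


print(make_utf8_sketch(b"\xe0\x7f\x80"))       # b'\x7f'
print(make_utf8_sketch(b"\xe0\x7f\x80", "^"))  # b'^\x7f^'
```

In both calls the valid one-byte character 0x7F survives. The bytes 0xE0 and 0x80 are dropped in the first call and each is replaced with ^ in the second.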

Syntax

MAKEUTF8( string‑expression [USING PARAMETERS param=value] );

Arguments

string‑expression

The string expression to evaluate for non-UTF-8 characters.

Parameters

replacement_string

Specifies the VARCHAR(16) string that MAKEUTF8 uses to replace each non-UTF-8 character that it finds in string‑expression. If this parameter is omitted, non-UTF-8 characters are removed. For example, the following SQL statement replaces all non-UTF-8 characters in the name column with the string ^:

=> SELECT MAKEUTF8(name USING PARAMETERS replacement_string='^') FROM people;