MAKEUTF8
Coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.
Flagging of invalid UTF-8 characters is byte by byte. For example, the byte sequence 0xE0 0x7F 0x80
is an invalid three-byte UTF-8 sequence, but the middle byte, 0x7F
, is a valid one-byte UTF-8 character. In this example, 0x7F
is preserved and the other two bytes are removed or replaced.
Syntax
MAKEUTF8( string [USING PARAMETERS param=value] );
Arguments
string
|
The input string, which might contain non-UTF-8 characters |
Parameters
replacement_string
|
The string (VARCHAR(16)) that replaces each non-UTF-8 character in the input string. If this parameter is omitted, non-UTF-8 characters are removed. |
Examples
These examples use a table with the following contents:
=> SELECT * FROM people; name -------- Dáithí Fíona Móirín Róisín Séamus Séan Tiarnán Áine (8 rows)
ISUTF8 returns false for all of these. Calling MAKEUTF8 without a replacement removes the special characters:
=> SELECT MAKEUTF8(name) FROM people; MAKEUTF8 ---------- Dith Fona Mirn Risn Samus San Tiarnn ine (8 rows)
You can instead specify a replacement string:
=> SELECT MAKEUTF8(name USING PARAMETERS replacement_string='^') FROM people; MAKEUTF8 ---------- D^ith^ F^ona M^ir^n R^is^n S^amus S^an Tiarn^n ^ine (8 rows)