MAKEUTF8

Coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.

Flagging of invalid UTF-8 characters is byte by byte. For example, the byte sequence 0xE0 0x7F 0x80 is an invalid three-byte UTF-8 sequence, but the middle byte, 0x7F, is a valid one-byte UTF-8 character. In this example, 0x7F is preserved and the other two bytes are removed or replaced.

Syntax

MAKEUTF8( string [USING PARAMETERS param=value] );

Arguments

string

The input string, which might contain non-UTF-8 characters

Parameters

replacement_string

The string (VARCHAR(16)) that replaces each non-UTF-8 character in the input string. If this parameter is omitted, non-UTF-8 characters are removed.

Examples

These examples use a table with the following contents:

=> SELECT * FROM people;
  name  
--------
 Dáithí
 Fíona
 Móirín
 Róisín
 Séamus
 Séan
 Tiarnán
 Áine
(8 rows)

ISUTF8 returns false for all of these. Calling MAKEUTF8 without a replacement removes the special characters:

=> SELECT MAKEUTF8(name) FROM people;
 MAKEUTF8 
----------
 Dith
 Fona
 Mirn
 Risn
 Samus
 San
 Tiarnn
 ine
(8 rows)

You can instead specify a replacement string:

=> SELECT MAKEUTF8(name USING PARAMETERS replacement_string='^') FROM people;
 MAKEUTF8 
----------
 D^ith^
 F^ona
 M^ir^n
 R^is^n
 S^amus
 S^an
 Tiarn^n
 ^ine
(8 rows)