How to recognize strings containing nbsp

Posted August 29, 2018 by Phil Molea, Sr. Information Developer, Vertica

Three 3D arrows, different colors pointing in different directions
Maurizio Felici authored this post. Word Processors and Web forms often use the non-breaking space character (nbsp) to prevent line breaks. This character looks the same as a normal space, but uses a different Unicode character with its own UTF-8 encoding. If you inadvertently cut and paste an nbsp character in your database strings, any search criteria using normal spaces will not locate strings containing an nbsp character. This quick tip demonstrates how to recognize strings containing nbsp, as well as how to replace this character with a “normal” space. In the following sample table the row with ID=3 contains an nbsp space (Unicode 00A0): DROP TABLE IF EXISTS public.mytable; CREATE TABLE public.mytable(id, txt) AS SELECT 1, 'My String' UNION ALL SELECT 2, 'My Other String' UNION ALL SELECT 3, U&'My nbsp\00A0String' UNION ALL SELECT 4, 'My Last String' ; At a first sight, the row with ID 3 looks similar to the other rows: SELECT * FROM public.mytable; id | txt ----+----------------- 1 | My String 2 | My Other String 3 | My nbsp String 4 | My Last String (4 rows) But a search criteria containing normal spaces won’t locate the row with ID 3: SELECT * FROM public.mytable WHERE txt LIKE '% String'; id | txt ----+----------------- 1 | My String 2 | My Other String 4 | My Last String (3 rows) Run the following command to determine if a column contains an nbsp character: SELECT id, txt, ISUTF8(txt), CASE WHEN INSTR(txt, U&'\00A0') > 0 THEN 'Yes' ELSE 'No' END AS Contains_nbsp FROM public.mytable ; id | txt | ISUTF8 | Contains_nbsp ----+-----------------+--------+--------------- 1 | My String | t | No 2 | My Other String | t | No 3 | My nbsp String | t | Yes 4 | My Last String | t | No (4 rows) To replace the nbsp character with a normal space you can either UPDATE the original table or create a new one. The following example shows how to create a new table and replace the nbsp character: DROP TABLE IF EXISTS public.mynewtable; CREATE TABLE public.mynewtable AS /* +direct */ SELECT id, REPLACE(txt, U&'\00A0', ' ') AS txt FROM public.mytable ;