Ignore NULL Values in Non-Correlated Subqueries: Quick Tip

Posted January 10, 2019 by James Knicely, Vertica Field Chief Technologist

A subquery is a SELECT statement embedded within another SELECT statement. The embedded subquery is often referenced as the query’s inner statement, while the containing query is typically referenced as the query’s statement, or outer query block. A subquery returns data that the outer query uses as a condition to determine what data to retrieve.

A non-correlated (simple) subquery obtains its results independently of its containing (outer) statement.

Example: dbadmin=> SELECT * FROM tab1; c1 | c2 ---+---- 1 | A 2 | B (2 rows) dbadmin=> SELECT * FROM tab2; c2 ---- B (1 row) dbadmin=> SELECT * FROM tab1 WHERE c2 IN (SELECT c2 FROM tab2); c1 | c2 ---+---- 2 | B (1 row) dbadmin=> SELECT * FROM tab1 WHERE c2 NOT IN (SELECT c2 FROM tab2); c1 | c2 ---+---- 1 | A (1 row) Those are the expected results. But what happens if there is a NULL value in the table TAB2? dbadmin=> INSERT INTO tab2 SELECT NULL; OUTPUT -------- 1 (1 row) dbadmin=> SELECT * FROM tab2; c2 ---- B (2 rows) dbadmin=> SELECT * FROM tab1 WHERE c2 IN (SELECT c2 FROM tab2); c1 | c2 ---+---- 2 | B (1 row) dbadmin=> SELECT * FROM tab1 WHERE c2 NOT IN (SELECT c2 FROM tab2); c1 | c2 ---+---- (0 rows) We got an empty result set for the SQL statement with the NOT IN clause!

If you would prefer to ignore NULLS in this situation, shown below are several methods to do so. dbadmin=> SELECT * FROM tab1 WHERE c2 NOT IN (SELECT NVL(c2, '') FROM tab2); c1 | c2 ---+---- 1 | A (1 row) dbadmin=> SELECT * FROM tab1 WHERE c2 <> ANY(SELECT c2 FROM tab2); c1 | c2 ---+---- 1 | A (1 row) dbadmin=> SELECT * FROM tab1 WHERE NOT EXISTS (SELECT NULL FROM tab2 WHERE tab2.c2 = tab1.c2); c1 | c2 ---+---- 1 | A (1 row) Helpful Link:

https://www.vertica.com/docs/latest/HTML/Content/Authoring/AnalyzingData/Queries/Subqueries/Subqueries.htm

Have fun!