LIMIT Queries with ROW_NUMBER Predicates

Queries that use the LIMIT clause with ORDER BY or analytic function ROW_NUMBER() return a specific subset of rows in the query result. Vertica processes these queries efficiently using Top-K Optimization, which is a database query ranking process. Top-K optimization avoids sorting (and potentially writing to disk) an entire data set to find a small number of rows. This can significantly improve query performance.

For example, in the following query, Vertica extracts only the three smallest rows from column x:

=> SELECT * FROM t1 ORDER BY x LIMIT 3;

If table t1 contains millions of rows, it is time consuming to sort all the x values. Instead, Vertica keeps track of the smallest three values in x.

If you omit the ORDER BY clause, when using the LIMIT clause, the results can be nondeterministic.

Sort operations that precede a SQL analytics computation benefit from Top-K optimization if the query contains an OVER(ORDER BY) clause and a predicate on the ROW_NUMBER function, as in the following example:

=> SELECT x FROM
     (SELECT *, ROW_NUMBER() OVER (ORDER BY x) AS row 
     FROM t1) t2 WHERE row <= 3;

The preceding query has the same behavior as the following query, which uses a LIMIT clause:

=> SELECT ROW_NUMBER() OVER (ORDER BY x) AS RANK FROM t1 LIMIT 3;

You can use ROW_NUMBER() with the analytic window partition clause, something you cannot do if you use LIMIT:

=> SELECT x, y FROM
     (SELECT *, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
     AS row FROM t1) t2 WHERE row <= 3;

When the OVER() clause includes the window-partition-clause, Top-K optimization occurs only when the analytic sort matches the input's sort, for example, if the projection is sorted on columns x and y in table t1.

If you still want to improve the performance of your query, consider using the optimization techniques described in ORDER BY Queries.