LIMIT Queries with ROW_NUMBER Predicates

Queries that use the LIMIT clause with ORDER BY or analytic function ROW_NUMBER() return a specific subset of rows in the query result. Vertica processes these queries efficiently using Top-K Optimization, which is a database query ranking process. Top-K optimization avoids sorting (and potentially writing to disk) an entire data set to find a small number of rows. This can significantly improve query performance.

For example, in the following query, Vertica extracts only the three smallest rows from column x:

=> SELECT * FROM t1 ORDER BY x LIMIT 3;

If table t1 contains millions of rows, it is time consuming to sort all the x values. Instead, Vertica keeps track of the smallest three values in x.

Note: If you omit the ORDER BY clause, when using the LIMIT clause, the results can be nondeterministic.

Sort operations that precede a SQL analytics computation benefit from Top-K optimization if the query contains an OVER(ORDER BY) clause and a predicate on the ROW_NUMBER function, as in the following example:

=> SELECT x FROM
     (SELECT *, ROW_NUMBER() OVER (ORDER BY x) AS row 
     FROM t1) t2 WHERE row <= 3;

The preceding query has the same behavior as the following query, which uses a LIMIT clause:

=> SELECT ROW_NUMBER() OVER (ORDER BY x) AS RANK FROM t1 LIMIT 3;

You can use ROW_NUMBER() with the analytic window partition clause, something you cannot do if you use LIMIT:

=> SELECT x, y FROM
     (SELECT *, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
     AS row FROM t1) t2 WHERE row <= 3;

Note: When the OVER() clause includes the window-partition-clause, Top-K optimization occurs only when the analytic sort matches the input's sort, for example, if the projection is sorted on columns x and y in table t1.

If you still want to improve the performance of your query, consider using the optimization techniques described in ORDER BY Queries.