LIMIT Queries with ROW_NUMBER Predicates
Queries that use the LIMIT clause with ORDER BY
or analytic function ROW_NUMBER()
return a specific subset of rows in the query result. Vertica processes these queries efficiently using Top-K Optimization, which is a database query ranking process. Top-K optimization avoids sorting (and potentially writing to disk) an entire data set to find a small number of rows. This can significantly improve query performance.
For example, in the following query, Vertica extracts only the three smallest rows from column x
:
=> SELECT * FROM t1 ORDER BY x LIMIT 3;
If table t1
contains millions of rows, it is time consuming to sort all the x
values. Instead, Vertica keeps track of the smallest three values in x
.
If you omit the ORDER BY
clause, when using the LIMIT clause, the results can be nondeterministic.
Sort operations that precede a SQL analytics computation benefit from Top-K optimization if the query contains an OVER(ORDER BY)
clause and a predicate on the ROW_NUMBER
function, as in the following example:
=> SELECT x FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY x) AS row FROM t1) t2 WHERE row <= 3;
The preceding query has the same behavior as the following query, which uses a LIMIT
clause:
=> SELECT ROW_NUMBER() OVER (ORDER BY x) AS RANK FROM t1 LIMIT 3;
You can use ROW_NUMBER()
with the analytic window partition clause, something you cannot do if you use LIMIT
:
=> SELECT x, y FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) AS row FROM t1) t2 WHERE row <= 3;
When the OVER()
clause includes the window-partition-clause, Top-K optimization occurs only when the analytic sort matches the input's sort, for example, if the projection is sorted on columns x
and y
in table t1
.
If you still want to improve the performance of your query, consider using the optimization techniques described in ORDER BY Queries.