Table Row Counts by Schema

Posted June 13, 2019 by Jim Knicely, Big Data Solutions Architect, Vertica

Counting
I had a colleague ask today how she could generate a report showing the total table row counts by schema. This should be easy, right? After all, the PROJECTION_STORAGE system table has a ROW_COUNT column. But it turns out it’s a little complex considering there are segmented and unsegmented projections and delete vectors in Vertica.

So we’ll need to join the TABLES, PROJECTION_STORAGE, and DELETE_VECTORS system tables together and handle the row counts a little differently by projection type. (Note that we need the TABLES system table because you can have a table with no projections.)

Example: dbadmin=> SELECT table_schema, dbadmin-> SUM(row_count) AS row_count dbadmin-> FROM (SELECT table_schema, dbadmin(> NVL(CASE dbadmin(> WHEN NOT is_segmented THEN (row_count / node_cnt)::INT dbadmin(> ELSE row_count dbadmin(> END, 0) row_count dbadmin(> FROM (SELECT t.table_schema, dbadmin(> t.table_name, dbadmin(> ps.projection_name, dbadmin(> (SELECT MIN(is_segmented) dbadmin(> FROM projections p dbadmin(> WHERE p.projection_name = ps.projection_name dbadmin(> AND p.projection_schema = ps.anchor_table_schema) is_segmented, dbadmin(> node_cnt, dbadmin(> SUM(ps.row_count - NVL(dv.deleted_row_count, 0)) row_count dbadmin(> FROM tables t dbadmin(> LEFT JOIN projection_storage ps dbadmin(> ON t.table_id = ps.anchor_table_id dbadmin(> LEFT JOIN delete_vectors dv dbadmin(> ON dv.node_name = ps.node_name dbadmin(> AND dv.schema_name = ps.anchor_table_schema dbadmin(> AND dv.projection_name = ps.projection_name dbadmin(> CROSS JOIN (SELECT COUNT(1) node_cnt FROM nodes WHERE node_type = 'PERMANENT') n dbadmin(> GROUP BY 1, 2, 3, 4, 5 dbadmin(> LIMIT 1 OVER(PARTITION BY t.table_schema, t.table_name ORDER BY 1)) foo) foo2 dbadmin-> GROUP BY table_schema dbadmin-> ORDER BY table_schema; table_schema | row_count ---------------+----------- ctg_analytics | 0 dds | 39759 idea_dwh | 12288 jim1 | 16 public | 71007066 stlouis | 77396606 test | 8 (7 rows) But I’m not comfortable with this solution. What happens when Vertica inevitably introduces improved projection types? We may have to modify the query to handle those.

I prefer to run actual row counts from the tables. To do that, we can generate a SQL statement that literally runs a SELECT COUNT(1) from each table in the database, summing them by schema: dbadmin=> \! vsql -Atc "SELECT 'SELECT table_schema, SUM(table_row_count) total_row_count FROM ('UNION ALL SELECT 'SELECT ''' || table_schema || ''' AS table_schema, (SELECT COUNT(1) FROM ' || table_schema || '.' || table_name || ') AS table_row_count' || DECODE(LEAD(table_name) OVER (ORDER BY 1), NULL, ') foo GROUP BY table_schema ORDER BY table_schema;', ' UNION ALL') FROM tables WHERE table_definition = '';" | vsql table_schema | total_row_count ---------------+----------------- ctg_analytics | 0 dds | 39759 idea_dwh | 12288 jim1 | 16 public | 71007066 stlouis | 77396606 test | 8 (7 rows) This works because Vertica is pretty darn quick at performing table row counts!

Helpful Links:

https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/SystemTables/CATALOG/TABLES.htm https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/PROJECTION_STORAGE.htm https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/DELETE_VECTORS.htm https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/SystemTables/CATALOG/NODES.htm

Have fun!