Plotting data with vsql and gnuplot

Posted December 6, 2018 by Phil Molea, Sr. Information Developer, Vertica

Maurizio Felici authored this post. We all know how plotting data is a very useful – and widely adopted – technique. Linux (and MacOS) offer a wonderful command line tool to plot data called gnuplot. You can install and run gnuplot on one of your Vertica nodes so you don’t have to move data out of the cluster; just copy the resulting graph. Here are a few examples of what you can do with vsql and gnuplot. Example 1: Analyzing Data Distribution Here we use a little known gem, WIDTH_BUCKET, to analyze the distribution of the variable f in table public.norm and produce the data distribution histogram with gnuplot: vsql -AXtq -F ' ' -f - <<-EOF | SELECT WIDTH_BUCKET(f, 0, 1, 100), COUNT(*) FROM public.norm GROUP BY 1 ORDER BY 1 ; EOF gnuplot -e "set term png ; plot '-' using 1:2 with boxes" > distr.png And here is the result: Example 2: Analyzing Vertica Catalog Size by node and time First we extract the data needed for post-processing to separate data into different blocks (one per node): vsql -AXtq -U dbadmin -o /tmp/catsize.dat -f - <<-EOF SELECT CASE WHEN LAG(node,1,'') OVER(ORDER BY node, Hour) = '' THEN node || CHR(10) WHEN LAG(node,1,'') OVER(ORDER BY node, Hour) <> node THEN CHR(10) || CHR(10) || node || CHR(10) ELSE '' END ||Hour::VARCHAR(32) , catalog_size_mb FROM ( SELECT REGEXP_SUBSTR(node_name,'[^_]*$') AS node, ROUND(time, 'HH') AS Hour, SUM((total_memory_max_value - free_memory_min_value) )/1024/1024 AS Catalog_Size_MB FROM dc_allocation_pool_statistics_by_hour GROUP BY 1, 2 ) input; EOF And then we use gnuplot to produce the graph: gnuplot <<-EOF set term png set datafile separator "|" set xdata time set timefmt "%Y-%m-%d %H:%M:%S" set format x "%m-%d\n%H:%M" set output "catsize_by_hour.png" nodes = 3 plot for [i=0:(nodes-1)] '/tmp/catsize.dat' index i u 1:2 w lines title columnhead(1) EOF The result is similar to the following: