Ntile, and why you should love it

white cloud in vault type room representing cloud computing

This blog post was authored by Curtis Bennett. According to a quick internet search, the hottest recorded temperature in Los Angeles was 109 degrees Fahrenheit, recorded on July 8th, 2017. If you had access to all kinds of weather data (spoiler alert: you do!) you could query the data and figure out the highs and lows for any given weather reporting station, and you could even trend the weather over time, in case that was interesting to you. The lowest temperature, incidentally, was 28 degrees, recorded on January 4th, 1949. If you were to run a min() and a max() on the weather in Los Angeles, you would see something like this:

min  |  max
-----+-----
28 | 109
(1 row)

If you were going to take a trip to Los Angeles, you wouldn’t have any idea what to pack, based on those readings. According to that data, Los Angeles has extreme weather! There’s a better way – ntile. Ntile allows you to break down the data set into n groups, with each group showing the number of occurrences for that particular group. Basically, ntile explodes the result set so you can more easily find patterns in your data. For example, in Vertica, if you’re loading small data files, it’s best to use WOS. If you’re loading large files, putting the data directly into ROS with the DIRECT keyword is preferred. How well do you do in your environment following this practice? Loading large files into WOS (without the DIRECT keyword) will almost certainly result in a WOS spill, which is inefficient, and creates many ROS containers. Likewise, loading many very small files with the DIRECT keyword is slower, and can also create many ROS containers. The following query can show you the load behavior:

=>select avg(processed_row_count), case when query ilike '%direct%' then 'direct' else 'WOS' end as type
from query_profiles
where table_name in (select table_name
from query_profiles
where query_type = 'LOAD')
and query ilike 'copy %'
group by 2 ;

avg | type ——————+——– 613460.058823529 | WOS 1637375008.9375 | direct (2 rows) These results look OK, but they are inconclusive. These are just the average load sizes for all loads of each type. It’s hard to make any determination based on this data. Fortunately, there’s an easier way – ntile.

=>select ntile, type, to_char(avg(processed_row_count),'999,999,999') as avg_row_count
from (select processed_row_count
, ntile(10) over (order by processed_row_count) as ntile
, case when query ilike '%direct%' then 'direct'
else 'WOS' end as type
from query_profiles
where table_name in (select table_name
from query_profiles
where query_type = 'LOAD')
and query ilike 'copy %') sq
group by ntile, type order by ntile;

ntile | type | avg_row_count ——-+——–+—————— 1 | WOS | 1 1 | direct | 0 2 | WOS | 1 3 | WOS | 1 4 | WOS | 1 5 | WOS | 1 6 | WOS | 1 7 | WOS | 1 8 | WOS | 1 9 | WOS | 15 10 | WOS | 13,557,415 10 | direct | 2,015,230,780 (12 rows) From this output, we have a much better picture of loads in this environment. It appears as though the developer occasionally forgets to put the DIRECT keyword on large loads. (Don’t feel bad – it’s very common!), but generally does a very good job loading only very small data into WOS. If you like ntile, be sure to also check out ntile’s more pedantic evil twin, width_bucket ,which allows for many more configuration options.

About the Author

Soniya Shah
Information Developer

Currently, a first year law student with a background in science and technology. Experienced technical writer, with specializations in software documentation, big data, blog development, and website development. I build user-centered content to communicate complex and technical information more easily.

I used to work for Vertica full time for about 3 years. I still work at Vertica part time while going to law school.

Update: Soniya is now doing her law internship, and no longer working at Vertica. Good luck, Soniya!

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

Ntile, and why you should love it

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

Ntile, and why you should love it

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

See More Under the Hood Posts