Vertica differs from standard row-oriented RDBMSs in the way it stores data. By grouping data on disk by column rather than by row, Vertica reads only the columns a query references, instead of reading every column of each row it scans, as a row-oriented database must. This reduces disk I/O and can speed up query processing dramatically.
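To make the I/O argument concrete, here is a toy model that counts the bytes each layout must read for a query touching two of five columns. It is a sketch under stated assumptions, not Vertica's storage format: the `orders` columns and their fixed byte widths in `COLUMN_WIDTHS` are hypothetical, and real systems add compression, metadata, and caching on top of this.

```python
# Toy model of row vs. column layout; NOT Vertica's on-disk format.
# Hypothetical fixed-width columns (bytes per value) for an "orders" table.
COLUMN_WIDTHS = {"order_id": 8, "customer_id": 8, "region": 16, "amount": 8, "notes": 200}


def bytes_scanned_row_store(num_rows: int) -> int:
    """A row store reads every column of every row it scans."""
    return num_rows * sum(COLUMN_WIDTHS.values())


def bytes_scanned_column_store(num_rows: int, referenced: list[str]) -> int:
    """A column store reads only the columns the query references."""
    return num_rows * sum(COLUMN_WIDTHS[col] for col in referenced)


if __name__ == "__main__":
    n = 1_000_000
    # Columns referenced by e.g. SELECT region, SUM(amount) FROM orders GROUP BY region
    query_cols = ["region", "amount"]
    row_bytes = bytes_scanned_row_store(n)
    col_bytes = bytes_scanned_column_store(n, query_cols)
    print(f"row store:    {row_bytes:,} bytes")
    print(f"column store: {col_bytes:,} bytes")
    print(f"reduction:    {1 - col_bytes / row_bytes:.0%}")
```

With these made-up widths, the column layout reads 24 of the 240 bytes per row. The wide `notes` column, which the query never touches, is what a row store pays for and a column store skips.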
Grouping data on disk by column also creates an ideal scenario for compression, because each column tends to hold many similar or repeated values that can be compressed very aggressively. Vertica includes a library of compression algorithms and applies them automatically based on data type. Data stored in Vertica can occupy up to 90% less disk space than the raw data loaded into it. This not only lowers storage costs but also speeds up queries by further reducing disk I/O.
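The sketch below shows why repetitive column values compress so well, using run-length encoding (RLE), one standard technique for this kind of data. It is an illustration only: it does not reproduce Vertica's compression library or the logic Vertica uses to choose an algorithm for a given data type, and the `region` column is a made-up example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of identical consecutive values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, run_length in pairs for _ in range(run_length)]


if __name__ == "__main__":
    # Hypothetical sorted, low-cardinality column: 12 stored values.
    region = ["EAST"] * 4 + ["NORTH"] * 3 + ["WEST"] * 5
    encoded = rle_encode(region)
    print(encoded)  # [('EAST', 4), ('NORTH', 3), ('WEST', 5)]
    assert rle_decode(encoded) == region  # lossless round trip
```

Twelve values shrink to three pairs. RLE only pays off when equal values sit next to each other, which is far more likely within a single sorted column than across the mixed fields of a row.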
Vertica is a “shared-nothing,” distributed database designed to run on almost any platform, including clusters of inexpensive, off-the-shelf servers, Amazon and Azure cloud servers, and Hadoop. Its performance can be tuned with features such as resource pools and projections, and it can be scaled simply by adding servers to the cluster. Clustering increases throughput by parallelizing queries and data loads across the nodes in the cluster.
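A shared-nothing cluster can be sketched as follows: rows are hashed to nodes so each node owns a disjoint slice of the data, every node aggregates its own slice in parallel, and the partial results are merged. This is a minimal sketch under stated assumptions, not Vertica's implementation: the CRC32-based hashing, the three-node cluster (`NUM_NODES`), the `(order_id, region, amount)` row shape, and all function names are hypothetical, and threads stand in for separate servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 3  # hypothetical cluster size


def node_for(order_id: int, num_nodes: int) -> int:
    """Choose the owning node by hashing the segmentation key (order_id)."""
    return zlib.crc32(str(order_id).encode("utf-8")) % num_nodes


def segment(rows, num_nodes):
    """Split rows so each node stores only its own, non-overlapping segment."""
    segments = [[] for _ in range(num_nodes)]
    for row in rows:
        segments[node_for(row[0], num_nodes)].append(row)
    return segments


def local_totals(segment_rows):
    """Runs independently on one node: aggregate only the rows it holds."""
    totals = Counter()
    for _order_id, region, amount in segment_rows:
        totals[region] += amount
    return totals


def distributed_totals(rows, num_nodes=NUM_NODES):
    """SUM(amount) GROUP BY region: fan out to all nodes in parallel, then merge."""
    segments = segment(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_totals, segments))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(merged)


if __name__ == "__main__":
    orders = [(1, "EAST", 10), (2, "WEST", 5), (3, "EAST", 7), (4, "NORTH", 2), (5, "WEST", 1)]
    print(distributed_totals(orders))
```

Because rows are segmented by `order_id` but grouped by `region`, the same region can appear on several nodes, which is why the final merge step is needed. Adding a node in this model only changes `num_nodes`; each node then holds, and scans, a smaller share of the rows.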
These architectural differences (column storage, compression, an MPP scale-out architecture, and the ability to distribute a query across nodes) are what fundamentally enable analytic applications built on Vertica to scale seamlessly and give many more users access to much more data.