This blog post was authored by Steve Sarsfield.
For decades, it’s been widely accepted that snowflake and star schemas are the way to get optimal performance from your data warehouse. You normalize data by identifying the kinds of records you typically ingest and creating a schema optimized for the types of queries you want to run. The schema sets up logical tables that group, and sometimes duplicate, the data so queries run fast.
Taking this approach can help avoid performance problems in a standard RDBMS, which must scan through many rows to find the data required for a given query. Breaking up data into separate tables helps address the scalability limits of a row-store database. Thus, schemas for normalized data typically comprise one or more large fact tables and many smaller dimension tables. Analytical queries often involve joins between a large fact table and multiple dimension tables.
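A typical analytical query against such a schema joins the fact table to several dimension tables. The sketch below uses hypothetical table and column names (`fact_sales`, `dim_date`, `dim_store`) purely for illustration:

```sql
-- Illustrative star-schema query: one fact table joined to two dimensions.
-- Table and column names are hypothetical, not from any particular warehouse.
SELECT d.year,
       s.region,
       SUM(f.sale_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_date   d ON f.date_key  = d.date_key
JOIN   dim_store  s ON f.store_key = s.store_key
GROUP BY d.year, s.region;
```

Every dimension added to the analysis adds another join, which is exactly the cost that denormalization tries to avoid.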
Schemas are designed both for the data you ingest and for the analysis you run on that data. Changes in either might require you to refactor your schemas. Refactoring is tiresome and can be expensive. Much has been written about this by Ralph Kimball, Bill Inmon, and many others. There are many ways to refactor, all with potential benefit and peril. When you refactor, you run the risk of slowing down other queries that rely on the current schema, or of breaking them altogether. Thus, analytics teams tend to avoid refactoring, even at the expense of query performance.
In contrast, a column store database such as Vertica has few limitations on the number of columns each table can have, as table width has much less impact on performance. A column store does not need to scan across millions of rows to find the data it wants; it can access each column directly. A column store database lets you create and use schemas that are optimized for a particular set of data and a particular type of question. You can also create projections on the fly to use the data for another purpose.
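For example, a projection can store a sorted, query-specific copy of just the columns a workload touches. The names below (`fact_sales`, `store_key`) are hypothetical; this is a minimal sketch of Vertica's `CREATE PROJECTION` syntax, not a tuned design:

```sql
-- Illustrative projection: only the columns one workload needs,
-- sorted and segmented to favor per-store aggregation.
CREATE PROJECTION fact_sales_by_store AS
SELECT store_key, date_key, sale_amount
FROM   fact_sales
ORDER BY store_key, date_key
SEGMENTED BY HASH(store_key) ALL NODES;
```

In practice, Database Designer can propose projections for you based on sample queries, so hand-writing them is optional.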
Given this, some Vertica users create wide tables that combine the fact and dimension table columns that their queries require. These tables can dramatically speed up query execution. However, maintaining redundant sets of normalized and denormalized data carries its own administrative costs.
With release 8.1, Vertica introduced flattened tables, which minimize these problems. Flattened tables can include columns that get their values by querying other tables. Operations on the source tables and the flattened table are decoupled; changes in one are not automatically propagated to the other. This minimizes the overhead that is otherwise typical of denormalized tables.
Flattened tables offer the following benefits:
• Performance: Queries run faster on a flattened table than on a standard or optimized join.
• Simplicity: The absence of complex joins simplifies queries for analysts.
• Ease: Creating flattened tables requires no refactoring.
• Storage and price: Vertica flattened tables require only minimal additional disk space, so they have no significant impact on license costs.
If your schemas show signs of needing to be refactored, check out flattened tables. For more information, see the Vertica online documentation and other resources.