Vertica

Data Analytics Glossary

    • Big Data: Data sets whose volume and complexity are beyond the capability of typical database tools to collect, store, manage, and analyze.

 

    • Structured Data: Data that is identifiable because it is organized according to a defined structure. The most common form of structured data, or structured data records (SDR), is a database in which specific information is stored in columns and rows. Structured data is also searchable by data type within content. It is readily processed by computers and efficiently organized for human readers.

 

    • Unstructured or Semi-Structured Data: Data that has no identifiable structure shared across the data set. For example, images, videos, email, documents, and text are all considered unstructured data. Each individual document may have its own structure or formatting, determined by the software used to create it. For that reason such data may also be considered “semi-structured”: the individual sources do have a structure, but not all data within the data set shares the same one.

 

    • Analytics: Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight. The most common application of analytics is to use business data to describe, predict, and improve business performance.

 

    • EDW: An Enterprise Data Warehouse (EDW), also called a data warehouse (DW or DWH), is a database used for reporting and data analysis. It is a central repository created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used to create trend reports for senior management, such as annual and quarterly comparisons. The data stored in the warehouse is loaded from operational systems.

 

    • Business Intelligence: Business intelligence (BI) refers to computer-based techniques for spotting, digging out, and analyzing business data, such as sales revenue by product or department, or associated costs and income. Business intelligence relies on data analytics tools and data mining solutions.

 

    • MPP: Massively Parallel Processing is the “shared nothing” approach to parallel computing: each node works on its own portion of the data using its own processors, memory, and storage. This approach is designed to scale linearly as nodes are added and does not rely upon a specialized “leader” or “header” node, which can create bottlenecks or single points of failure. The HP Vertica Analytics Platform’s MPP architecture aims for scalability and high performance at low cost by leveraging commodity x86 hardware.

 

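    • Column Store: Most legacy relational databases are row based: they store all the fields of a record together. Native column store databases like the HP Vertica Analytics Platform store the values of each column together, so a query reads only the columns it references. This reduces disk I/O bottlenecks and allows loading and querying at the same time. The result is substantially faster analytic queries, which is a key technology advantage.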

 

    • Hadoop: Apache Hadoop is an open source framework whose distributed file system can serve as an effective storage ground for large amounts of data. Not a database, Hadoop is very efficient at loading any type of data, structured or unstructured, but less efficient at pulling that data back out. Hadoop is effective for batch processing where real-time or interactive analytics are not required. Specialized programming skills, such as writing MapReduce jobs, are necessary to leverage Hadoop.

 

    • R: R is an open source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and data analysis.

 

    • SQL: SQL (Structured Query Language, often pronounced “sequel”) is a special-purpose programming language designed for managing data in relational database management systems (RDBMS). Analysts use SQL extensively in their daily work and want to ensure that any data analytics platform they rely upon supports SQL.

 

    • ODS: An operational data store (ODS) is a type of database that’s often used as an interim logical area for a data warehouse. While in the ODS, data can be scrubbed, resolved for redundancy, and checked for compliance with the corresponding business rules. An ODS can be used for integrating disparate data from multiple sources so that business operations, analysis, and reporting can be carried out while business operations are occurring. An ODS is designed for relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical in a data warehouse.
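
The distinction between structured and semi-structured data in the entries above can be made concrete with a small sketch. The records, field names, and values below are invented for illustration and do not come from any Vertica product. Each JSON line has its own structure, but the lines do not all share the same fields, so fitting them into rows and columns leaves gaps.

```python
import json

# Hypothetical semi-structured input: each line is valid JSON,
# but the set of fields differs from record to record.
raw_lines = [
    '{"id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"id": 2, "name": "Lin", "tags": ["vip"]}',
    '{"id": 3, "name": "Sam"}',
]
records = [json.loads(line) for line in raw_lines]

# The union of every field seen anywhere in the data set.
all_fields = sorted({field for record in records for field in record})

# Forcing the records into a fixed column layout: absent fields become None.
table = [tuple(record.get(field) for field in all_fields) for record in records]

print(all_fields)
for row in table:
    print(row)
```

The `None` entries are where a structured (row and column) layout has to invent a placeholder for a field the source record never had.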
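
The “shared nothing” idea in the MPP entry above can be sketched in a few lines. This is a single-process simulation under stated assumptions, not how any particular MPP database is implemented: the node count, the `node_for` hashing rule, and the sample sales data are all hypothetical. Rows are assigned to nodes by hashing a key, each node totals only its own rows, and the per-node results are then combined.

```python
from collections import defaultdict
from zlib import crc32

NODE_COUNT = 3  # hypothetical cluster size


def node_for(key: str) -> int:
    """Pick a node by hashing the key (crc32 is stable across runs)."""
    return crc32(key.encode("utf-8")) % NODE_COUNT


def distribute(records):
    """Assign each (customer, amount) row to exactly one node."""
    nodes = [[] for _ in range(NODE_COUNT)]
    for customer, amount in records:
        nodes[node_for(customer)].append((customer, amount))
    return nodes


def local_totals(node_records):
    """Work a single node does using only its own rows."""
    totals = defaultdict(float)
    for customer, amount in node_records:
        totals[customer] += amount
    return dict(totals)


def cluster_totals(records):
    """Combine per-node results. All rows for a customer hash to the same
    node, so the per-node dictionaries never share a key."""
    merged = {}
    for node_records in distribute(records):  # real nodes would run in parallel
        merged.update(local_totals(node_records))
    return merged


sales = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
totals = cluster_totals(sales)
print(totals)
```

The loop over nodes runs sequentially here. In a real cluster each node would compute its local totals at the same time, which is where the scaling benefit comes from.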
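
As a rough illustration of the Column Store entry above, the following sketch holds the same records in a row layout and a column layout and counts how many stored values each layout must read to total one column. The records and field names are made up, and in-memory Python lists only stand in for on-disk storage.

```python
# Hypothetical sales records; field names are illustrative only.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Both layouts give the same answer for SUM(amount).
row_total = sum(record["amount"] for record in rows)
column_total = sum(columns["amount"])

# A row store keeps whole records together, so scanning for one field
# still reads every field of every record.
values_read_row_layout = sum(len(record) for record in rows)

# A column store reads only the values of the column the query references.
values_read_column_layout = len(columns["amount"])

print(row_total, column_total)
print(values_read_row_layout, values_read_column_layout)
```

With three fields per record the row layout reads three times as many values. The gap widens as tables gain columns that a given query does not use.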
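
The Hadoop entry above mentions MapReduce as a required skill. The sketch below shows the shape of that programming model in plain Python, using the customary word-count example. It does not use Hadoop or any Hadoop API, and the function names are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big insight", "data warehouse"])
print(counts)
```

In a real Hadoop job the map and reduce steps run as batch tasks spread across many machines. That fits the entry's point that Hadoop suits batch processing more than interactive queries.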
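
The SQL, ODS, and EDW entries above contrast simple lookups with analytic queries. The sketch below runs one query of each kind through Python's built-in `sqlite3` module. SQLite is only a convenient stand-in here, not Vertica, and the `orders` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, quarter TEXT, "
    "status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice", "2023-Q1", "shipped", 100.0),
        (2, "bob", "2023-Q1", "pending", 40.0),
        (3, "alice", "2023-Q2", "shipped", 60.0),
        (4, "carol", "2023-Q2", "shipped", 25.0),
    ],
)

# ODS-style query: a simple lookup that touches a single row.
status = conn.execute(
    "SELECT status FROM orders WHERE order_id = ?", (2,)
).fetchone()[0]

# Warehouse-style query: an aggregate over all rows, compared by quarter.
by_quarter = conn.execute(
    "SELECT quarter, SUM(amount) FROM orders GROUP BY quarter ORDER BY quarter"
).fetchall()

print(status)
print(by_quarter)
conn.close()
```

The first query is the kind an ODS is designed for (the status of one customer order). The second resembles the quarterly comparisons described in the EDW entry.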