Detecting Constraint Violations

ANALYZE_CONSTRAINTS analyzes and reports on table constraint violations within a given schema. You can use ANALYZE_CONSTRAINTS to analyze an individual table, specific columns within a table, or all tables within a schema. You typically use this function on tables where primary key, unique, or check constraints are not enforced. You can also use ANALYZE_CONSTRAINTS to check the referential integrity of foreign keys.

In the simplest use case, ANALYZE_CONSTRAINTS is a two-step process:

  1. Run ANALYZE_CONSTRAINTS on the desired table. ANALYZE_CONSTRAINTS reports all rows that violate constraints.
  2. Use the report to fix violations.

You can also use ANALYZE_CONSTRAINTS in the following cases:

  • Analyze tables with enforced constraints.
  • Detect constraint violations introduced by a COPY operation and address them before the copy transaction is committed.

Analyzing Tables with Enforced Constraints

If constraints are enforced on a table and a DML operation returns constraint violations, Vertica reports on a limited number of constraint violations before it rolls back the operation. This can be problematic when you try to load a large amount of data that includes many constraint violations—for example, duplicate key values. In this case, use ANALYZE_CONSTRAINTS as follows:

  1. Temporarily disable enforcement of all constraints on the target table.
  2. Run the DML operation.
  3. After the operation returns, run ANALYZE_CONSTRAINTS on the table. ANALYZE_CONSTRAINTS reports all rows that violate constraints.
  4. Use the report to fix the violations.
  5. Re-enable constraint enforcement on the table.

Using ANALYZE_CONSTRAINTS in a COPY Transaction

Use ANALYZE_CONSTRAINTS to detect and address constraint violations introduced by a COPY operation as follows:

  1. Copy the source data into the target table with COPY…NO COMMIT.
  2. Call ANALYZE_CONSTRAINTS to check the target table with its uncommitted updates.
  3. If ANALYZE_CONSTRAINTS reports constraint violations, roll back the copy transaction.
  4. Use the report to fix the violations, and then re-execute the copy operation.

For details about using COPY…NO COMMIT, see Overriding COPY Auto Commit.

Distributing Constraint Analysis

ANALYZE_CONSTRAINTS runs as an atomic operation—that is, it does not return until it evaluates all constraints within the specified scope. For example, if you run ANALYZE_CONSTRAINTS against a table, the function returns only after it evaluates all column constraints against column data. If the table has a large number of columns with constraints, and contains a very large data set, ANALYZE_CONSTRAINTS is liable to exhaust all available memory and return with an out-of-memory error. This risk is increased by running ANALYZE_CONSTRAINTS against multiple tables simultaneously, or against the entire database.

You can minimize the risk of out-of-memory errors by setting configuration parameter MaxConstraintChecksPerQuery (by default set to ‑1) to a positive integer. For example, if this parameter is set to 20, and you run ANALYZE_CONSTRAINTS on a table that contains 38 column constraints, the function divides its work into two separate queries. ANALYZE_CONSTRAINTS creates a temporary table for loading and compiling results from the two queries, and then returns the composite result set.

MaxConstraintChecksPerQuery can only be set at the database level, and can incur a certain amount of overhead. When set, commits to the temporary table created by ANALYZE_CONSTRAINTS cause all pending database transactions to auto-commit. Setting this parameter to a reasonable number such as 20 should minimize its performance impact.