SYNC_WITH_HCATALOG_SCHEMA
Copies the structure of a Hive database schema available through the HCatalog Connector to a Vertica schema.
Syntax
SYNC_WITH_HCATALOG_SCHEMA( local_schema, hcatalog_schema, [drop_tables] )
Parameters
local_schema |
The existing Vertica schema to store the copied HCatalog schema's metadata. This can be the same schema as hcatalog_schema, or it can be a separate one created using CREATE SCHEMA. |
hcatalog_schema |
The HCatalog schema to copy. This is the schema created using CREATE HCATALOG SCHEMA. |
[drop_tables] | If true , drop any tables in local_schema that do not correspond to a table in hcatalog_schema |
Notes
The SYNC_WITH_HCATALOG_SCHEMA function overwrites tables in the Vertica schema whose names match a table in the HCatalog schema. Do not use the Vertica schema to store other data.
Hive STRING and BINARY data types are matched, in Vertica, to VARCHAR(65000) and VARBINARY(65000) types. You can use ALTER TABLE to adjust these after creating the schema. The maximum size of a VARCHAR or VARBINARY in Vertica is 65000, but you can use LONG VARCHAR and LONG VARBINARY to specify larger values.
Hive and Vertica define string length in different ways. In Hive the length is the number of characters; in Vertica it is the number of bytes. Thus, a character encoding that uses more than one byte, such as Unicode, can result in mismatches between the two. Set values in Vertica based on bytes, not characters, to avoid data truncation.
If the data size exceeds the declared size for the column, Vertica logs an event at read time in the QUERY_EVENTS system table.
This function can synchronize the HCatalog schema directly, in which case you call it with the same schema name for both parameters. It can also synchronize a different schema to the HCatalog schema.
If you change the settings of any HCatalog Connector configuration parameters (Apache Hadoop Parameters), you must call this function again.
Privileges
The user must have CREATE privileges on local_schema.
The user also requires access to Hive data in one of the following ways:
- Have USAGE permissions on hcatalog_schema, if Hive does not use an authorization service to manage access.
- Have permission through an authorization service (Sentry or Ranger), and have access to the underlying files in HDFS. (Sentry can provide that access through ACL synchronization.)
- Be the dbadmin user, with or without an authorization service.
Examples
The following example shows using SYNC_WITH_HCATALOG_SCHEMA to synchronize an HCatalog schema named hcat:
=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='default' HCATALOG_USER='hcatuser'; CREATE SCHEMA => SELECT sync_with_hcatalog_schema('hcat', 'hcat'); sync_with_hcatalog_schema ---------------------------------------- Schema hcat synchronized with hcat tables in hcat = 56 tables altered in hcat = 0 tables created in hcat = 56 stale tables in hcat = 0 table changes erred in hcat = 0 (1 row) => -- Use vsql's \d command to describe a table in the synced schema => \d hcat.messages List of Fields by Tables Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key -----------+----------+---------+----------------+-------+---------+----------+-------------+------------- hcat | messages | id | int | 8 | | f | f | hcat | messages | userid | varchar(65000) | 65000 | | f | f | hcat | messages | "time" | varchar(65000) | 65000 | | f | f | hcat | messages | message | varchar(65000) | 65000 | | f | f | (4 rows)
The following example shows using SYNC_WITH_HCATALOG_SCHEMA followed by ALTER TABLE to adjust a column value:
=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='default' -> HCATALOG_USER='hcatuser'; CREATE SCHEMA => SELECT sync_with_hcatalog_schema('hcat', 'hcat'); ... => ALTER TABLE hcat.t ALTER COLUMN a1 SET DATA TYPE long varchar(1000000); => ALTER TABLE hcat.t ALTER COLUMN a2 SET DATA TYPE long varbinary(1000000);
The following example shows using SYNC_WITH_HCATALOG_SCHEMA with a local (non-HCatalog) schema:
=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='default' -> HCATALOG_USER='hcatuser'; CREATE SCHEMA => CREATE SCHEMA hcat_local; CREATE SCHEMA => SELECT sync_with_hcatalog_schema('hcat_local', 'hcat');