CREATE HCATALOG SCHEMA

Define a schema for data stored in a Hive data warehouse using the HCatalog Connector. For more information, see Using the HCatalog Connector in Integrating with Apache Hadoop.

Most of the optional parameters are read out of Hadoop configuration files if available. See the Notes section for further information.

Syntax

CREATE HCATALOG SCHEMA [IF NOT EXISTS] schemaName
    [AUTHORIZATION user-id] 
    [WITH
	[HOSTNAME='metastore-host']
	[PORT='metastore-port']
	[HIVESERVER2_HOSTNAME='hivserver-host']
	[WEBSERVICE_HOSTNAME='webHCat-host']
	[WEBSERVICE_PORT='webHCat-port']
	[WEBHDFS_ADDRESS='webhdfs-address']
	[HCATALOG_SCHEMA='hive-schema-name']
	[HCATALOG_USER='hcat-username']
	[HCATALOG_CONNECTION_TIMEOUT='timeout']
	[HCATALOG_SLOW_TRANSFER_LIMIT='xfer-limit']
	[HCATALOG_SLOW_TRANSFER_TIME='xfer-time'] ]

Parameters

Parameter Description Default Value
[IF NOT EXISTS]

If given, the statement exits without an error when the schema named in schemaName already exists.

N/A
schemaName

The name of the schema to create in the Vertica catalog. The tables in the Hive database will be available through this schema.

none
AUTHORIZATION 'user-id' The name of a Vertica account to own the schema being created. This parameter is ignored if Kerberos authentication is being used; in that case the current vsql user is used. current username
HOSTNAME=
'metastore-host'
The hostname or IP address of the database server that stores the Hive data warehouse's metastore information. If this value is not specified, hive-site.xml must be available. none
PORT=
'metastore-port'
The port number on which the metastore database is running. See Notes
HIVESERVER2_HOSTNAME=
'hiveserver2-host'

The hostname or IP address of the HiveServer2 service. This parameter is optional if in hive-site.xml you set one of the following properties:

  • hive.server2.thrift.bind.host to a valid host
  • hive.server2.support.dynamic.service.discovery to true

This parameter is ignored if you are using WebHCat.

none
WEBSERVICE_HOSTNAME=
'webHCat-host'
The hostname or IP address of the WebHCat service, if using WebHCat instead of HiveServer2. If this value is not specified, webhcat-site.xml must be available. none
WEBSERVICE_PORT=
'webHCat-port'
The port number on which the WebHCat service is running, if using WebHCat instead of HiveServer2. If this value is not specified, webhcat-site.xml must be available. See Notes
WEBHDFS_ADDRESS=
'webhdfs-address'
The host and port ("host:port") for the WebHDFS service. This parameter is used only for reading ORC and Parquet files. If this value is not set, hdfs-site.xml must be available to read these file types through the HCatalog Connector. none
HCATALOG_SCHEMA=
'hive-schema-name'
The name of the Hive schema or database that the Vertica schema is being mapped to. schemaName
HCATALOG_USER=
'hcat-username'
The username of the HCatalog user to use when making calls to the HiveServer2 or WebHCat server. current username
HCATALOG_CONNECTION_TIMEOUT=
'timeout'
The number of seconds the HCatalog Connector waits for a successful connection to the HiveServer or WebHCat server. A value of 0 means wait indefinitely. See Notes
HCATALOG_SLOW_TRANSFER_LIMIT=
'xfer-limit'
The lowest data transfer rate (in bytes per second) from the HiveServer2 or WebHCat server that the HCatalog Connector accepts. See xfer-time for details. See Notes
HCATALOG_SLOW_TRANSFER_TIME=
'xfer-time'
The number of seconds the HCatalog Connector waits before enforcing the data transfer rate lower limit. After this time has passed, the HCatalog Connector tests whether the data transfer rate is at least as fast as the value set in xfer-limit. If it is not, then the HCatalog Connector breaks the connection and terminates the query. See Notes

Notes

The default values for timeout, xfer-limit, and xfer-time are set by the configuration parameters HCatConnectionTimeout, HCatSlowTransferLimit, and HCatSlowTransferTime. See Hadoop Parameters in the Administrator's Guide for more information.

If you are using Kerberos authentication, the current user name is always used and the AUTHORIZATION parameter is ignored.

When using HiveServer2 (the default), use HIVESERVER2_HOSTNAME to specify the server host. When using WebHCat, use WEBSERVICE_HOSTNAME to specify the server host.

If you copied the Hadoop configuration files as described in Configuring Vertica for HCatalog, you can omit most parameters. By default this statement uses the values specified in those configuration files, though you can still override them with this statement. If the configuration files are complete, the following is a valid statement:

=> CREATE HCATALOG SCHEMA hcat;

If a value is not specified in the configuration files and a default is shown in the parameter list, then that default value is used.

Privileges

The user must be a superuser or be granted all permissions on the database to use CREATE HCATALOG SCHEMA.

Example

The following example shows how to use CREATE HCATALOG SCHEMA to define a new schema for tables stored in a Hive database and then query the system tables that contain information about those tables:

=> CREATE HCATALOG SCHEMA hcat WITH HOSTNAME='hcathost' 
   HCATALOG_SCHEMA='default' HIVESERVER2_HOSTNAME='hs.example.com' 
   HCATALOG_USER='admin';
WARNING 0:  HOSTNAME will be ignored. hive-site.xml must be contain property [hive.metastore.uris], or both HOSTNAME and PORT must be specified
CREATE SCHEMA
=> \x
Expanded display is on.

=> SELECT * FROM v_catalog.hcatalog_schemata;
-[ RECORD 1 ]----------------+-------------------------------------------
schema_id                    | 45035996273748224
schema_name                  | hcat
schema_owner_id              | 45035996273704962
schema_owner                 | admin
create_time                  | 2017-05-22 12:04:59.692155-04
hostname                     | hcathost
port                         | -1
hiveserver2_hostname         | hs.example.com
webservice_hostname          |
webservice_port              | 50111
webhdfs_address              | hs.example.com:50070
hcatalog_schema_name         | default
hcatalog_user_name           | admin
hcatalog_connection_timeout  | -1
hcatalog_slow_transfer_limit | -1
hcatalog_slow_transfer_time  | -1

=> SELECT * FROM v_catalog.hcatalog_table_list;
-[ RECORD 1 ]------+------------------
table_schema_id    | 45035996273748224
table_schema       | hcat
hcatalog_schema    | default
table_name         | nation
hcatalog_user_name | admin
-[ RECORD 2 ]------+------------------
table_schema_id    | 45035996273748224
table_schema       | hcat
hcatalog_schema    | default
table_name         | raw
hcatalog_user_name | admin
-[ RECORD 3 ]------+------------------
table_schema_id    | 45035996273748224
table_schema       | hcat
hcatalog_schema    | default
table_name         | raw_rcfile
hcatalog_user_name | admin
-[ RECORD 4 ]------+------------------
table_schema_id    | 45035996273748224
table_schema       | hcat
hcatalog_schema    | default
table_name         | raw_sequence
hcatalog_user_name | admin