Using Kerberos with Vertica
If you use Kerberos for your Vertica cluster and your principals have access to HDFS, then you can configure Vertica to use the same credentials for HDFS.
Vertica authenticates with Hadoop in two ways that require different configurations:
- User Authentication—On behalf of the user, by passing along the user's existing Kerberos credentials. This method is also called user impersonation. Actions performed on behalf of particular users, like executing queries, generally use user authentication.
- Vertica Authentication—On behalf of system processes that access ROS data or the catalog, by using a special Kerberos credential stored in a keytab file.
Vertica and Hadoop must use the same Kerberos server or servers (KDCs).
Vertica can interact with more than one Kerberos realm. To configure multiple realms, see Multi-realm Support.
Vertica attempts to automatically refresh Hadoop tokens before they expire. See Token Expiration.
User Authentication
To use Vertica with Kerberos and Hadoop, the client user first authenticates with one of the Kerberos servers (Key Distribution Centers, or KDCs) used by the Hadoop cluster. A user might run kinit or sign in to Active Directory, for example.
A user who authenticates to a Kerberos server receives a Kerberos ticket. At the beginning of a client session, Vertica automatically retrieves this ticket. Vertica then uses this ticket to get a Hadoop token, which Hadoop uses to grant access. Vertica uses this token to access HDFS, such as when executing a query on behalf of the user. When the token expires, Vertica automatically renews it, also renewing the Kerberos ticket if necessary.
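The ticket-to-token flow described above can be modeled with a small sketch. This is an illustrative model only, not Vertica's implementation: the class names, the lifetimes, and the 60-second refresh margin are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    """A credential with an absolute expiry time, in seconds."""
    name: str
    expires_at: float

    def is_valid(self, now, margin=0.0):
        # Valid only if it will not expire within the next `margin` seconds.
        return now + margin < self.expires_at


class SessionCredentials:
    """Hypothetical model of one client session's ticket and token."""

    def __init__(self, ticket_lifetime, token_lifetime, now=0.0):
        self.ticket_lifetime = ticket_lifetime
        self.token_lifetime = token_lifetime
        self.log = []
        # The user has already authenticated (for example with kinit),
        # so a ticket exists when the session starts.
        self.ticket = Credential("kerberos-ticket", now + ticket_lifetime)
        self.token = self._get_token(now)

    def _renew_ticket(self, now):
        self.ticket = Credential("kerberos-ticket", now + self.ticket_lifetime)
        self.log.append("renew ticket")

    def _get_token(self, now):
        # A Hadoop token is obtained by presenting a valid Kerberos ticket,
        # so the ticket is renewed first if it has expired.
        if not self.ticket.is_valid(now):
            self._renew_ticket(now)
        self.log.append("get token")
        return Credential("hadoop-token", now + self.token_lifetime)

    def token_for_hdfs(self, now, margin=60.0):
        """Return a usable token, refreshing it shortly before it expires."""
        if not self.token.is_valid(now, margin):
            self.token = self._get_token(now)
        return self.token
```

Calling `token_for_hdfs` well before expiry returns the existing token. Calling it near or after expiry obtains a new token and, if the ticket has also lapsed, renews the ticket first.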
The user must have been granted permission to access the relevant files in HDFS. This permission is checked the first time Vertica reads HDFS data.
Vertica can use multiple KDCs serving multiple Kerberos realms, if proper cross-realm trust has been set up between realms.
Vertica Authentication
Automatic processes, such as the Tuple Mover or the processes that access Eon Mode communal storage, do not log in the way users do. Instead, Vertica uses a special identity (principal) stored in a keytab file on every database node. (This approach is also used for Vertica clusters that use Kerberos but do not use Hadoop.) After you configure the keytab file, Vertica uses the principal residing there to automatically obtain and maintain a Kerberos ticket, much as in the client scenario. In this case, the client does not interact with Kerberos.
Each Vertica node uses its own principal; it is common to incorporate the name of the node into the principal name. You can either create one keytab per node, containing only that node's principal, or you can create a single keytab containing all the principals and distribute the file to all nodes. Either way, the node uses its principal to get a Kerberos ticket and then uses that ticket to get a Hadoop token.
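As a rough sketch of the two keytab layouts, the following helper builds one principal per node and then decides which principals go into each node's keytab. The `service/host@REALM` naming pattern, the `vertica` service name, the `EXAMPLE.COM` realm, and the host names are assumptions for illustration; use whatever naming scheme your KDC administrator has chosen.

```python
def node_principals(hosts, service="vertica", realm="EXAMPLE.COM"):
    """Map each host name to a principal of the form service/host@REALM.

    The naming pattern is an assumption; it simply embeds the node name
    in the principal, as is commonly done.
    """
    return {host: f"{service}/{host}@{realm}" for host in hosts}


def keytab_layout(principals, shared=False):
    """Return {host: [principals stored in that host's keytab]}.

    shared=False: one keytab per node, holding only that node's principal.
    shared=True:  a single keytab with every principal, copied to all nodes.
    """
    if shared:
        everything = sorted(principals.values())
        return {host: list(everything) for host in principals}
    return {host: [principal] for host, principal in principals.items()}
```

With either layout, each node still authenticates as its own principal; the shared layout only changes which entries are present in the file on disk.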
When creating HDFS storage locations, Vertica uses the principal in the keytab file, not the principal of the user issuing the CREATE LOCATION statement. The HCatalog Connector sometimes uses the principal in the keytab file, depending on how Hive authenticates users.
Configuring Users and the Keytab File
If you have not already configured Kerberos authentication for Vertica, follow the instructions in Configure Vertica for Kerberos Authentication. Of particular importance for Hadoop integration:
- Create one Kerberos principal per node.
- Place the keytab files in the same location on each database node and set the configuration parameter KerberosKeytabFile to that location.
- Set KerberosServiceName to the name of the principal. (See Inform Vertica About the Kerberos Principal.)
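The two parameter settings in the list above could be scripted along these lines. This is a minimal sketch: the helper name, the keytab path, and the service name are hypothetical, and the `ALTER DATABASE DEFAULT SET PARAMETER` form is assumed here and should be checked against the syntax documented for your Vertica version.

```python
def kerberos_parameter_statements(keytab_path, service_name):
    """Build SQL statements that set the two Kerberos parameters.

    The statement form is an assumption; verify it for your Vertica version.
    """
    def quote(value):
        # Escape embedded single quotes for a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    params = {
        "KerberosKeytabFile": keytab_path,
        "KerberosServiceName": service_name,
    }
    return [
        f"ALTER DATABASE DEFAULT SET PARAMETER {name} = {quote(value)};"
        for name, value in params.items()
    ]


# Example with hypothetical values; print the statements for review
# before running them in vsql.
for statement in kerberos_parameter_statements("/etc/vertica.keytab", "vertica"):
    print(statement)
```

Generating the statements instead of executing them directly keeps the sketch free of any database connection and lets an administrator review the values first.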
If you are using the HCatalog Connector, follow the additional steps in Configuring Security in the HCatalog Connector documentation.
If you are using HDFS storage locations, give all node principals read and write permission to the HDFS directory you will use as a storage location.
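One way to grant those permissions is with HDFS ACLs. The sketch below only formats the `hdfs dfs` commands so they can be reviewed before a Hadoop administrator runs them. It makes several assumptions: ACLs are enabled on the NameNode, the directory path is hypothetical, and the caller supplies the HDFS short user names that the cluster maps each node principal to (the mapping itself depends on the cluster's configuration and is not derived here). The `rwx` entry adds execute to read and write because HDFS requires execute permission to access the contents of a directory.

```python
import shlex


def storage_acl_commands(hdfs_dir, hdfs_users):
    """Format commands that create hdfs_dir and grant each user rwx on it."""
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    # dict.fromkeys drops duplicate user names while preserving order.
    for user in dict.fromkeys(hdfs_users):
        commands.append(
            ["hdfs", "dfs", "-setfacl", "-m", f"user:{user}:rwx", hdfs_dir]
        )
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return [shlex.join(command) for command in commands]


# Example with a hypothetical directory and user name.
for line in storage_acl_commands("/vertica/storage", ["vertica"]):
    print(line)
```

If several node principals map to the same HDFS user, a single ACL entry covers all of them, which is why duplicates are removed.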