How Vertica Uses Kerberos with Hadoop
Vertica authenticates with Hadoop in two ways that require different configurations:
- User Authentication—On behalf of the user, by passing along the user's existing Kerberos credentials, as occurs with the HDFS Connector and the HCatalog Connector.
- Vertica Authentication—On behalf of system processes (such as the Tuple Mover), by using a special Kerberos credential stored in a keytab file.
Note: Vertica and Hadoop must use the same Kerberos server (KDC).
User Authentication
To use Vertica with Kerberos and Hadoop, the client user first authenticates with the Kerberos server (Key Distribution Center, or KDC) being used by the Hadoop cluster. A user might run kinit or sign in to Active Directory, for example.
A user who authenticates to a Kerberos server receives a Kerberos ticket. At the beginning of a client session, Vertica automatically retrieves this ticket. The database then uses this ticket to get a Hadoop token, which Hadoop uses to grant access. Vertica uses this token to access HDFS, such as when executing a query on behalf of the user. When the token expires, the database automatically renews it, also renewing the Kerberos ticket if necessary.
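The ticket and token lifecycle described above can be sketched as a small toy model. This is an illustration only, not Vertica's implementation: the class names, the lifetimes, and the explicit `now` clock are all assumptions made for the example, and no real Kerberos or Hadoop calls are made.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    """A ticket or token with an expiry time on an arbitrary clock."""
    owner: str
    expires_at: float

    def valid(self, now: float) -> bool:
        return now < self.expires_at


class Session:
    """Toy model of one client session (hypothetical, for illustration)."""

    TICKET_LIFETIME = 100.0  # assumed value, not a Kerberos default
    TOKEN_LIFETIME = 30.0    # assumed value, not a Hadoop default

    def __init__(self, user: str, now: float):
        # The user has already authenticated to the KDC, so the session
        # starts with a valid ticket and no Hadoop token yet.
        self.user = user
        self.ticket = Credential(user, now + self.TICKET_LIFETIME)
        self.token = None
        self.log = []

    def _renew_ticket(self, now: float) -> None:
        self.ticket = Credential(self.user, now + self.TICKET_LIFETIME)
        self.log.append("renew ticket")

    def _get_token(self, now: float) -> None:
        # A token is obtained with a valid ticket, so renew the ticket first
        # if it has expired.
        if not self.ticket.valid(now):
            self._renew_ticket(now)
        self.token = Credential(self.user, now + self.TOKEN_LIFETIME)
        self.log.append("get token")

    def read_hdfs(self, path: str, now: float) -> str:
        # Obtain or renew the token only when it is missing or expired.
        if self.token is None or not self.token.valid(now):
            self._get_token(now)
        self.log.append(f"read {path}")
        return f"{self.user} read {path}"
```

In this sketch, a first read obtains a token, a later read within the token lifetime reuses it, and a read after both credentials have expired renews the ticket before getting a new token.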
The user must have been granted permission to access the relevant files in HDFS. This permission is checked the first time Vertica reads HDFS data.
The following figure shows how the user, Vertica, Hadoop, and Kerberos interact in user authentication:
When using the HDFS Connector or the HCatalog Connector, or when reading an ORC or Parquet file stored in HDFS, Vertica uses the client identity as the preceding figure shows.
Vertica Authentication
Automatic processes, such as the Tuple Mover, do not log in the way users do. Instead, Vertica uses a special identity (principal) stored in a keytab file on every database node. (This approach is also used for Vertica clusters that use Kerberos but do not use Hadoop.) After you configure the keytab file, Vertica uses the principal residing there to automatically obtain and maintain a Kerberos ticket, much as in the client scenario. In this case, the client does not interact with Kerberos.
The following figure shows the interactions required for Vertica authentication:
Each Vertica node uses its own principal; it is common to incorporate the name of the node into the principal name. You can either create one keytab per node, containing only that node's principal, or you can create a single keytab containing all the principals and distribute the file to all nodes. Either way, the node uses its principal to get a Kerberos ticket and then uses that ticket to get a Hadoop token. For simplicity, the preceding figure shows the full set of interactions for only one database node.
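The two keytab layouts described above can be compared with a short sketch. The service name `vertica`, the host names, and the realm are hypothetical values chosen for the example. The principal strings follow the general Kerberos `primary/instance@REALM` form. The functions only compute which principals each node's keytab would contain. They do not create keytab files.

```python
def node_principals(service, hosts, realm):
    """Map each host to a principal that incorporates the node name."""
    return {host: f"{service}/{host}@{realm}" for host in hosts}


def keytab_layout(principals, single_file):
    """Return the principals each node's keytab would hold.

    single_file=False: one keytab per node, holding only that node's principal.
    single_file=True: one shared keytab with every principal, copied to all nodes.
    """
    if single_file:
        everything = sorted(principals.values())
        return {host: list(everything) for host in principals}
    return {host: [principal] for host, principal in principals.items()}


# Hypothetical two-node cluster.
principals = node_principals(
    "vertica",
    ["node01.example.com", "node02.example.com"],
    "EXAMPLE.COM",
)
per_node = keytab_layout(principals, single_file=False)
shared = keytab_layout(principals, single_file=True)
```

Either way, each node still authenticates with its own principal. The layouts differ only in which entries are present in the file on each node.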
When creating HDFS storage locations, Vertica uses the principal in the keytab file, not the principal of the user issuing the CREATE LOCATION statement.
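As a summary of the two modes, the following sketch maps an operation to the identity used to reach Hadoop. The operation labels and the function name are hypothetical and chosen only for this example. The mapping reflects the cases named in this section and is not an exhaustive list of Vertica operations.

```python
# Hypothetical labels for the operations this section mentions.
USER_OPERATIONS = {"hdfs_connector", "hcatalog_connector", "orc_parquet_read"}
SYSTEM_OPERATIONS = {"tuple_mover", "create_location"}


def acting_principal(operation, session_user, node_principal):
    """Return the principal whose credentials are used for the operation."""
    if operation in USER_OPERATIONS:
        # User authentication: the client's own Kerberos identity is passed along.
        return session_user
    if operation in SYSTEM_OPERATIONS:
        # Vertica authentication: the node's keytab principal is used,
        # regardless of which user issued the statement.
        return node_principal
    raise ValueError(f"unknown operation: {operation}")
```

For example, with a session user `alice@EXAMPLE.COM`, `acting_principal("create_location", ...)` returns the node principal rather than `alice@EXAMPLE.COM`.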
See Also
For specific configuration instructions, see Configuring Kerberos.