This blog post was authored by Mitchell Tracy and Monica Cellio.
Hadoop clusters can use authorization services to determine which users can access what data in Hive and, by extension, HDFS. In Vertica 9.0 we now support one of the most common such services, Apache Sentry.
Apache Sentry is a project in the Hadoop ecosystem that provides centralized role-based access control (RBAC) for a Hadoop cluster. After installing the Sentry service in your Hadoop cluster, you can create roles, grant and revoke privileges for those roles, and then assign the roles to groups of users. This allows for quick, central administration of privileges on your Hive data. Now Vertica can take advantage of this service to govern database users’ access to Hive data.
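To make the role, privilege, and group relationship concrete, here is a minimal Python sketch of the RBAC model. It is a toy in-memory illustration, not the Sentry API: the `RoleStore` class, its method names, and the role, group, and table names are all hypothetical.

```python
from collections import defaultdict


class RoleStore:
    """Toy in-memory RBAC model (hypothetical; not the Sentry API)."""

    def __init__(self):
        self.role_privileges = defaultdict(set)  # role -> {(table, action)}
        self.group_roles = defaultdict(set)      # group -> {role}

    def grant(self, role, table, action):
        """Give a role a privilege on a table."""
        self.role_privileges[role].add((table, action))

    def revoke(self, role, table, action):
        """Take a privilege away from a role (no error if absent)."""
        self.role_privileges[role].discard((table, action))

    def assign(self, group, role):
        """Apply a role to a group of users."""
        self.group_roles[group].add(role)

    def is_allowed(self, user_groups, table, action):
        """A user is allowed if any role on any of their groups has the privilege."""
        return any(
            (table, action) in self.role_privileges.get(role, ())
            for group in user_groups
            for role in self.group_roles.get(group, ())
        )


# Hypothetical names: an "analyst" role that can read the "sales" table,
# applied to the "analysts" group.
store = RoleStore()
store.grant("analyst", "sales", "SELECT")
store.assign("analysts", "analyst")
```

Because privileges attach to roles rather than to individual users, a single `revoke` call changes access for every member of every group that holds the role, which is what makes central administration quick.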
In addition, Sentry provides the ability to synchronize HDFS Access Control Lists and Sentry permissions. The Hadoop administrator doesn’t have to separately set ACLs for the HDFS data that Hive uses; HDFS enforces Sentry’s permissions. This means that if a user has permissions to read all of the data in a table, that user will also be able to read the files which hold that data.
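The idea behind this synchronization can be sketched as a lookup from table privileges to file access. The mapping below (SELECT to read, INSERT to write, ALL to both) is an assumption based on how the Cloudera documentation describes the feature, and the function names are hypothetical; check your distribution's documentation for the exact behavior.

```python
# Assumed mapping from a table-level privilege to access on the table's files.
PRIVILEGE_TO_FILE_ACCESS = {
    "SELECT": {"read"},
    "INSERT": {"write"},
    "ALL": {"read", "write"},
}


def file_access(privileges):
    """Return the combined file access implied by a set of table privileges."""
    access = set()
    for privilege in privileges:
        # Unknown privileges imply no file access in this sketch.
        access |= PRIVILEGE_TO_FILE_ACCESS.get(privilege, set())
    return access


def can_read_files(privileges):
    """True if the privileges imply read access to the underlying files."""
    return "read" in file_access(privileges)
```

Under this assumed mapping, a user with only INSERT on a table could write to its files but not read them, while SELECT is enough for the read access that external-table queries depend on.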
This ability to read the underlying files is important for Vertica’s integration with Sentry. If Sentry is enabled in your Hadoop cluster, and you have synchronized HDFS ACLs and Sentry permissions, then you don’t need to do anything else to set up permissions in Vertica. Users who query data as external tables in HDFS or by using the Vertica HCatalog Connector will have their permissions checked against Sentry prior to data access.
To learn more about using the HCatalog Connector with Sentry, see Configuring Kerberos for the HCatalog Connector. For more information about using Sentry, see The Sentry Service in the Cloudera documentation.