HadoopImpersonationConfig Format

The value of the HadoopImpersonationConfig session parameter is a JSON array of one or more objects. Each object describes one doAs user or delegation token for one Hadoop destination.

Syntax

[ { ("doAs" | "token"): value, 
    ("nameservice" | "authority" | "schema"): value} [,...]
]
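As a rough illustration of the syntax above, the following Python sketch checks that each object pairs exactly one credential key with exactly one destination key, then serializes the list for use in ALTER SESSION. The helper name build_impersonation_config is hypothetical and is not part of Vertica. The sketch assumes that an object carries no keys other than the two shown in the syntax.

```python
import json

CREDENTIAL_KEYS = {"doAs", "token"}
DESTINATION_KEYS = {"nameservice", "authority", "schema"}


def build_impersonation_config(entries):
    """Check each entry against the documented syntax and return JSON text.

    Hypothetical helper for illustration; Vertica does its own validation.
    """
    if not entries:
        raise ValueError("at least one object is required")
    for entry in entries:
        keys = set(entry)
        if len(keys & CREDENTIAL_KEYS) != 1:
            raise ValueError(f"need exactly one of doAs/token: {entry}")
        if len(keys & DESTINATION_KEYS) != 1:
            raise ValueError(
                f"need exactly one of nameservice/authority/schema: {entry}"
            )
        if len(keys) != 2:
            raise ValueError(f"unexpected keys in {entry}")
    return json.dumps(entries)


config = build_impersonation_config([{"nameservice": "hadoopNS", "doAs": "Bob"}])

# Double any single quotes so the JSON text is safe inside a SQL string literal.
statement = "ALTER SESSION SET HadoopImpersonationConfig = '{}';".format(
    config.replace("'", "''")
)
print(statement)
```

Building the value with a JSON library rather than by hand avoids quoting mistakes, which are easy to make when tokens are long.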

Properties

doAs

The name of a Hadoop user to impersonate.

token

A delegation token to use for HDFS access.

nameservice

A Hadoop nameservice. All access to this nameservice uses the doAs user or delegation token. You must use the hdfs URL scheme (for example, hdfs://ns-value/path).

authority

A namenode authority. All access to this authority uses the doAs user or delegation token. If the namenode fails over to another namenode, the doAs user or delegation token does not automatically apply to the failover namenode. If you use high-availability (HA) namenodes, use nameservice instead of authority, or include an object for every namenode.
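If you choose to list every namenode rather than use a nameservice, the objects differ only in the authority value. The following Python sketch generates one object per namenode. The host names and the helper name authority_entries are assumptions made for illustration only.

```python
import json


def authority_entries(namenodes, do_as):
    """Return one doAs object per namenode authority (hypothetical helper)."""
    return [{"authority": namenode, "doAs": do_as} for namenode in namenodes]


# Hypothetical active and standby namenodes of one HA pair.
namenodes = ["nn1.example.com:50070", "nn2.example.com:50070"]

config = json.dumps(authority_entries(namenodes, "Bob"))
print(config)
```

With both objects present, the same doAs user applies whichever namenode is active after a failover.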

schema

A Hive schema, for use with the HCatalog Connector. Vertica uses this object's doAs user or token to access Hive metadata only. For data access, you must also specify a nameservice or authority object, as you do for all other data access.

Examples

In the following example of doAs, Bob is a Hadoop user and vertica-etl is a Kerberized proxy user.

$ kinit vertica-etl -kt /home/dbadmin/vertica-etl.keytab
$ vsql -U vertica-etl

=> ALTER SESSION SET 
   HadoopImpersonationConfig = '[{"nameservice":"hadoopNS", "doAs":"Bob"}]';
=> COPY nation FROM 'hdfs:///user/bob/nation.dat';

In the following example, the current Vertica user (it doesn't matter who that is) uses a Hadoop delegation token. The token belongs to Alice, but you never specify her user name in the session parameter. Instead, you use the user name when you get the delegation token from Hadoop.

$ vsql -U dbuser1

=> ALTER SESSION SET 
   HadoopImpersonationConfig ='[{"nameservice":"hadoopNS","token":"JAAGZGJldGwxBmRiZXRsMQCKAWDXJgB9igFg-zKEfY4gao4BmhSJYtXiWqrhBHbbUn4VScNg58HWQxJXRUJIREZTIGRlbGVnYXRpb24RMTAuMjAuMTAwLjU0OjgwMjA"}]';
=> COPY nation FROM 'hdfs:///user/alice/nation.dat';
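The example above assumes that you already hold a token. One common way to obtain an HDFS delegation token is the WebHDFS REST operation GETDELEGATIONTOKEN, whose JSON response carries the token in Token.urlString. The following Python sketch only builds the request URL and parses a response body; it does not perform the HTTP request or the Kerberos (SPNEGO) authentication that a secured cluster requires. The function names, the renewer value, and the sample response are assumptions for illustration, and the token shown is a placeholder, not a real token.

```python
import json
from urllib.parse import urlencode


def delegation_token_url(authority, renewer=None):
    """Build a WebHDFS GETDELEGATIONTOKEN URL for a namenode authority."""
    params = {"op": "GETDELEGATIONTOKEN"}
    if renewer:
        params["renewer"] = renewer
    return f"http://{authority}/webhdfs/v1/?{urlencode(params)}"


def extract_token(response_body):
    """Pull the token string out of a GETDELEGATIONTOKEN JSON response."""
    return json.loads(response_body)["Token"]["urlString"]


url = delegation_token_url("hadoop1:50070", renewer="vertica")

# Placeholder response body; a real one comes from the authenticated request.
sample_response = '{"Token": {"urlString": "EXAMPLE-TOKEN"}}'
token = extract_token(sample_response)

config = json.dumps([{"nameservice": "hadoopNS", "token": token}])
print(url)
print(config)
```

The request must be made as the Hadoop user who should own the token (Alice in the example above), which is why the user name never appears in HadoopImpersonationConfig.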

In the following example, "authority" specifies the (single) namenode on a Hadoop cluster that does not use high availability.

$ vsql -U dbuser1

=> ALTER SESSION SET 
   HadoopImpersonationConfig ='[{"authority":"hadoop1:50070", "doAs":"Stephanie"}]';
=> COPY nation FROM 'webhdfs://hadoop1:50070/user/stephanie/nation.dat';

To access data in Hive, you must specify two delegation tokens. The first, for a nameservice or authority, is for data access as usual. The second is for the HiveServer2 metadata for the schema. HiveServer2 requires a delegation token in WebHDFS format. The schema name is the Hive schema you specify with CREATE HCATALOG SCHEMA.

$ vsql -U dbuser1 

-- set delegation token for user and HiveServer2
=> ALTER SESSION SET 
   HadoopImpersonationConfig='[
     {"nameservice":"hadoopNS","token":"JQAHcmVsZWFzZQdyZWxlYXNlAIoBYVJKrYSKAWF2VzGEjgmzj_IUCIrI9b8Dqu6awFTHk5nC-fHB8xsSV0VCSERGUyBkZWxlZ2F0aW9uETEwLjIwLjQyLjEwOTo4MDIw"}, 
     {"schema":"access","token":"UwAHcmVsZWFzZQdyZWxlYXNlL2hpdmUvZW5nLWc5LTEwMC52ZXJ0aWNhY29ycC5jb21AVkVSVElDQUNPUlAuQ09NigFhUkmyTooBYXZWNk4BjgETFKN2xPURn19Yq9tf-0nekoD51TZvFUhJVkVfREVMRUdBVElPTl9UT0tFThZoaXZlc2VydmVyMkNsaWVudFRva2Vu"}]';

-- uses HiveServer2 token to get metadata
=> CREATE HCATALOG SCHEMA access WITH hcatalog_schema 'access';
 
-- uses both tokens
=> SELECT * FROM access.t1;
 
-- uses only HiveServer2 token
=> SELECT * FROM hcatalog_tables;
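Because a schema object covers Hive metadata only, it is easy to forget the companion object for data access. The following Python sketch flags a configuration that has a schema object but no nameservice or authority object. The function name and the placeholder token strings are assumptions for illustration; this is not a check that Vertica exposes.

```python
import json


def missing_data_access(entries):
    """Return True if a schema object is present without a data-access object."""
    has_schema = any("schema" in entry for entry in entries)
    has_data = any(
        "nameservice" in entry or "authority" in entry for entry in entries
    )
    return has_schema and not has_data


# Placeholder tokens; real values are the strings issued by Hadoop and HiveServer2.
hive_entries = [
    {"nameservice": "hadoopNS", "token": "<HDFS delegation token>"},
    {"schema": "access", "token": "<HiveServer2 delegation token>"},
]

metadata_only = [{"schema": "access", "token": "<HiveServer2 delegation token>"}]

print(missing_data_access(hive_entries))   # both objects present
print(missing_data_access(metadata_only))  # data-access object is missing
config = json.dumps(hive_entries)
```

A configuration like metadata_only would still let metadata queries use the HiveServer2 token, but queries that read table data would have no impersonation entry for the data itself.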

Each object in the HadoopImpersonationConfig collection specifies one connection to one Hadoop cluster. You can add as many connections as you like, including to more than one Hadoop cluster. The following example shows delegation tokens for two different Hadoop clusters. Vertica uses the correct token for each cluster when connecting.

$ vsql -U dbuser1

=> ALTER SESSION SET 
   HadoopImpersonationConfig ='[
	{"nameservice":"productionNS","token":"JAAGZGJldGwxBmRiZXRsMQCKAWDXJgB9igFg-zKEfY4gao4BmhSJYtXiWqrhBHbbUn4VScNg58HWQxJXRUJIREZTIGRlbGVnYXRpb24RMTAuMjAuMTAwLjU0OjgwMjA"},
	{"nameservice":"testNS", "token":"HQAHcmVsZWFzZQdyZWxlYXNlAIoBYVJKrYSKAWF2VzGEjgmzj_IUCIrI9b8Dqu6awFTHk5nC-fHB8xsSV0VCSERGUyBkZWxlZ2F0aW9uETEwLjIwLjQyLjEwOTo4MDIw"}]';

=> COPY clicks FROM 'hdfs://productionNS/data/clickstream.dat';
=> COPY testclicks FROM 'hdfs://testNS/data/clickstream.dat';
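To make the pairing of URLs and tokens concrete, the following Python sketch builds the multi-cluster value from a mapping and looks up which object corresponds to a given hdfs URL by comparing the URL's authority part with each nameservice. This is an illustration of the documented behavior only, not a description of how Vertica implements the lookup. The function name and the placeholder tokens are assumptions.

```python
import json
from urllib.parse import urlparse

# Placeholder tokens, one per cluster nameservice.
tokens_by_nameservice = {
    "productionNS": "<production token>",
    "testNS": "<test token>",
}

entries = [
    {"nameservice": nameservice, "token": token}
    for nameservice, token in tokens_by_nameservice.items()
]
config = json.dumps(entries)


def entry_for_url(entries, url):
    """Return the object whose nameservice matches the URL, or None."""
    nameservice = urlparse(url).netloc
    for entry in entries:
        if entry.get("nameservice") == nameservice:
            return entry
    return None


match = entry_for_url(entries, "hdfs://productionNS/data/clickstream.dat")
print(match["token"])
print(entry_for_url(entries, "hdfs://otherNS/data/clickstream.dat"))
```

A URL whose nameservice has no matching object, such as otherNS here, has no entry in the configuration.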