Testing Your Hadoop WebHDFS Configuration
To ensure that your Hadoop installation's WebHDFS system is configured and running, follow these steps:
- Log into your Hadoop cluster and locate a small text file on the Hadoop filesystem. If you do not have a suitable file, you can create a file named test.txt in the /tmp directory using the following command:

    echo -e "A|1|2|3\nB|4|5|6" | hadoop fs -put - /tmp/test.txt
- Log into a host in your Vertica database using the database administrator account.
- If you are using Kerberos authentication, authenticate with the Kerberos server using the keytab file for a user who is authorized to access the file. For example, to authenticate as a user named exampleuser@MYCOMPANY.COM, use the command:

    $ kinit exampleuser@MYCOMPANY.COM -k -t /path/exampleuser.keytab

  where path is the path to the keytab file you copied over to the node. You do not receive any message if you authenticate successfully. You can verify that you are authenticated by using the klist command:

    $ klist
    Ticket cache: FILE:/tmp/krb5cc_500
    Default principal: exampleuser@MYCOMPANY.COM

    Valid starting     Expires            Service principal
    07/24/13 14:30:19  07/25/13 14:30:19  krbtgt/MYCOMPANY.COM@MYCOMPANY.COM
            renew until 07/24/13 14:30:19

- Test retrieving the file:

  - If you are not using Kerberos authentication, run the following command from the Linux command line:

      curl -i -L "http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN&user.name=hadoopUserName"

    Replace hadoopNameNode with the hostname or IP address of the name node in your Hadoop cluster, /tmp/test.txt with the path to the file in the Hadoop filesystem you located in step 1, and hadoopUserName with the user name of a Hadoop user that has read access to the file.

    If successful, the command produces output similar to the following:

      HTTP/1.1 200 OK
      Server: Apache-Coyote/1.1
      Set-Cookie: hadoop.auth="u=hadoopUser&p=password&t=simple&e=1344383263490&s=n8YB/CHFg56qNmRQRTqO0IdRMvE="; Version=1; Path=/
      Content-Type: application/octet-stream
      Content-Length: 16
      Date: Tue, 07 Aug 2012 13:47:44 GMT

      A|1|2|3
      B|4|5|6
  - If you are using Kerberos authentication, run the following command from the Linux command line:

      curl --negotiate -i -L -u:anyUser http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN

    Replace hadoopNameNode with the hostname or IP address of the name node in your Hadoop cluster, and /tmp/test.txt with the path to the file in the Hadoop filesystem you located in step 1.

    If successful, the command produces output similar to the following:

      HTTP/1.1 401 Unauthorized
      Content-Type: text/html; charset=utf-8
      WWW-Authenticate: Negotiate
      Content-Length: 0
      Server: Jetty(6.1.26)

      HTTP/1.1 307 TEMPORARY_REDIRECT
      Content-Type: application/octet-stream
      Expires: Thu, 01-Jan-1970 00:00:00 GMT
      Set-Cookie: hadoop.auth="u=exampleuser&p=exampleuser@MYCOMPANY.COM&t=kerberos&e=1375144834763&s=iY52iRvjuuoZ5iYG8G5g12O2Vwo=";Path=/
      Location: http://hadoopnamenode.mycompany.com:1006/webhdfs/v1/user/release/docexample/test.txt?op=OPEN&delegation=JAAHcmVsZWFzZQdyZWxlYXNlAIoBQCrfpdGKAUBO7CnRju3TbBSlID_osB658jfGfRpEt8-u9WHymRJXRUJIREZTIGRlbGVnYXRpb24SMTAuMjAuMTAwLjkxOjUwMDcw&offset=0
      Content-Length: 0
      Server: Jetty(6.1.26)

      HTTP/1.1 200 OK
      Content-Type: application/octet-stream
      Content-Length: 16
      Server: Jetty(6.1.26)

      A|1|2|3
      B|4|5|6
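The non-Kerberos curl command above can also be reproduced from a script. The sketch below builds the same WebHDFS OPEN URL using only Python's standard library; hadoopNameNode, hadoopUserName, and port 50070 are placeholders for your cluster, and the actual fetch is commented out because it needs a reachable name node:

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # follows redirects automatically, like curl -L

def webhdfs_open_url(namenode, hdfs_path, user=None, port=50070):
    """Build the WebHDFS OPEN URL used by the curl commands above.

    Pass user for simple (non-Kerberos) authentication; omit it on
    Kerberos clusters, where curl --negotiate handles SPNEGO instead.
    """
    params = {"op": "OPEN"}
    if user is not None:
        params["user.name"] = user
    return "http://%s:%d/webhdfs/v1%s?%s" % (namenode, port, hdfs_path,
                                             urlencode(params))

url = webhdfs_open_url("hadoopNameNode", "/tmp/test.txt", user="hadoopUserName")
print(url)  # http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN&user.name=hadoopUserName

# To actually fetch the file (requires a reachable name node):
# print(urlopen(url).read().decode())
```

Note that the HDFS path is appended to the /webhdfs/v1 prefix rather than passed as a query parameter, which is why /tmp/test.txt appears in the middle of the URL.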
- If the curl command fails, you must review the error messages and resolve any issues before using the Vertica Connector for HDFS with your Hadoop cluster. Some debugging steps include:
- Verify the HDFS service's port number.
- Verify that the Hadoop user you specified exists and has read access to the file you are attempting to retrieve.
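When verifying the port number, a plain TCP connection test can rule out a wrong hostname or port before you look at authentication or permissions. A minimal sketch in Python (the host and port values in the example are placeholders for your cluster):

```python
import socket

def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    Distinguishes "nothing is listening there" from higher-level WebHDFS
    errors such as 401 Unauthorized or 403 Forbidden, which only occur
    after a connection is established.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (replace with your name node's hostname and WebHDFS port):
# print(port_is_open("hadoopNameNode", 50070))
```

If this check fails, confirm the WebHDFS port in your Hadoop configuration and any firewall rules before retrying the curl commands.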