HDFS URL Format

You specify the location of a file in HDFS using a URL. In most cases, you use the hdfs:/// URL prefix (three slashes), and then specify the file path. The hdfs scheme uses the Libhdfs++ library to read files and is more efficient than WebHDFS.

The following example loads data stored in HDFS.

=> COPY t FROM 'hdfs:///opt/data/file1.dat';

With the hdfs:/// form, Vertica uses the fs.defaultFS Hadoop configuration parameter to find the Name Node, which it then uses to access the data. You can instead specify a host and port explicitly, using the format hdfs://host:port/. The host must be the Name Node, not an individual data node. If you use High Availability (HA) Name Nodes, do not specify an explicit host, because high availability is provided through nameservices instead.
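
As an illustration of the explicit host and port form, the following sketch loads the same file through a named Name Node. The host name namenode.example.com and port 8020 are assumptions chosen for the example; substitute the values for your own cluster.

-- Hypothetical Name Node host and port; replace with your own values.
=> COPY t FROM 'hdfs://namenode.example.com:8020/opt/data/file1.dat';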

If your HDFS cluster uses High Availability Name Nodes or defines nameservices, use the nameservice instead of the host and port, in the format hdfs://nameservice/. The nameservice you specify must be defined in hdfs-site.xml.

The following example shows how you can use a nameservice, hadoopNS, with the hdfs scheme.

=> CREATE EXTERNAL TABLE tt (a1 INT, a2 VARCHAR(20))
	AS COPY FROM 'hdfs://hadoopNS/data/file.csv';

If you are using Vertica to access data from more than one HDFS cluster, always use explicit nameservices or hosts in the URL. Using hdfs:/// could produce unexpected results because Vertica uses the first value of fs.defaultFS that it finds. To access multiple HDFS clusters, you must use host and service names that are globally unique. See Configuring the hdfs Scheme for more information.
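
A minimal sketch of loading from two clusters, assuming each cluster defines its own nameservice. The second nameservice (hadoopNS2) and the table names t1 and t2 are hypothetical; both nameservices would need to be defined in the Hadoop configuration that Vertica reads.

-- hadoopNS2, t1, and t2 are hypothetical names used only for illustration.
=> COPY t1 FROM 'hdfs://hadoopNS/data/file.csv';
=> COPY t2 FROM 'hdfs://hadoopNS2/data/file.csv';

Because each URL names its nameservice explicitly, neither statement depends on the value of fs.defaultFS.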

Note: All characters in URLs that are not a–z, A–Z, 0–9, '-', '.', '_' or '~' must be converted to URL encoding (%NN where NN is a two-digit hexadecimal number). For example, use %20 for space.
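
For instance, assuming a hypothetical file named "my file.dat" in /opt/data, the space in the file name is written as %20:

-- "my file.dat" is a hypothetical file name; the space is encoded as %20.
=> COPY t FROM 'hdfs:///opt/data/my%20file.dat';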

 

