Integrating Vertica with the MapR Distribution of Hadoop

MapR is a distribution of Apache Hadoop produced by MapR Technologies that extends the standard Hadoop components with its own features. Vertica can integrate with MapR in the following ways:

Other Vertica integrations for Hadoop are not available for MapR.

For information on mounting the MapR file system as an NFS mount point, see Accessing Data with NFS and Configuring Vertica Analytics Platform with MapR on the MapR website. In particular, you must configure MapR to add Vertica as a MapR service.

Examples

In the following examples, the MapR file system has been mounted as /mapr.

The following statement creates an external table from ORC data:

=> CREATE EXTERNAL TABLE t (a1 INT, a2 VARCHAR(20))
   AS COPY FROM '/mapr/data/file.orc' ORC;

The following statement creates an external table from Parquet data and takes advantage of partition pruning (see Using Partition Columns):

=> CREATE EXTERNAL TABLE t2 (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM '/mapr/*/*/*' PARQUET(hive_partition_cols='created,region');

The following statement loads ORC data from MapR into Vertica:

=> COPY t FROM '/mapr/data/*.orc' ON ANY NODE ORC;

The following statement creates a storage location to hold ROS data in the MapR file system:

=> CREATE LOCATION '/mapr/my.cluster.com/data' SHARED USAGE 'DATA' LABEL 'maprfs';