Loading Data From Amazon S3 Using MC

You can use the Data Load Activity page in Management Console (MC) to import data from Amazon S3 storage to an existing table in Vertica.

When you use Amazon Web Services (AWS), MC uses the Vertica library for AWS to import data directly from Amazon S3 storage to Vertica. You do not need to use any third-party scripts or programs. When you run a loading job, Vertica appends rows to the target table you provide. If the job fails or you cancel it, Vertica commits no rows to the target table.

When you view your load history on the Instance tab, loading jobs initiated in MC using Amazon S3 have the name MC_S3_Load in the Stream Name column.
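Outside MC, the same jobs can be located by their stream name. The following is a minimal sketch that builds such a query; it assumes the LOAD_STREAMS system table in the v_monitor schema, and the selected column names are assumptions to check against your Vertica version. The helper name build_load_history_query is hypothetical.

```python
# Stream name that MC assigns to S3 loading jobs (taken from the text above).
MC_S3_STREAM_NAME = "MC_S3_Load"


def build_load_history_query(stream_name: str = MC_S3_STREAM_NAME) -> str:
    """Return a SQL query that lists load streams with the given name.

    Assumption: the v_monitor.load_streams system table and the columns
    selected here exist in your Vertica version.
    """
    # Double any single quotes so the name is safe inside a SQL string literal.
    escaped = stream_name.replace("'", "''")
    return (
        "SELECT table_name, is_executing, accepted_row_count, rejected_row_count "
        "FROM v_monitor.load_streams "
        f"WHERE stream_name = '{escaped}';"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into a SQL client such as vsql.
    print(build_load_history_query())
```

Run the printed statement in any SQL client connected to the database to list the S3 loads that MC started.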

For more information about loading data from Amazon S3 storage to Vertica, see Export Data From Amazon S3 Using the AWS Library.

Prerequisites

To use the Load feature in Management Console, you must first have:

  • Access to an Amazon S3 storage account.
  • An existing table in your Vertica database to which you can copy your data. You must be the owner of the table.
  • (For non-CloudFormation Template installs) An S3 gateway endpoint.

If you aren't using a CloudFormation Template (CFT) to install Vertica, you must create an S3 gateway endpoint in your VPC. For more information, see the AWS documentation.

For example, the Vertica CFT has the following VPC endpoint:

"S3Enpoint" : {
    "Type" : "AWS::EC2::VPCEndpoint",
    "Properties" : {
        "PolicyDocument" : {
            "Version" : "2012-10-17",
            "Statement" : [{
                "Effect" : "Allow",
                "Principal" : "*",
                "Action" : ["*"],
                "Resource" : ["*"]
            }]
        },
        "RouteTableIds" : [ {"Ref" : "RouteTable"} ],
        "ServiceName" : { "Fn::Join": [ "", [ "com.amazonaws.", { "Ref": "AWS::Region" }, ".s3" ] ] },
        "VpcId" : {"Ref" : "VPC"}
    }
}
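If you create the endpoint yourself rather than through the CFT, the same settings can be expressed as request parameters. The sketch below only assembles those parameters; the function name s3_gateway_endpoint_params and the example VPC and route table IDs are hypothetical placeholders. The resulting dictionary is shaped for the create_vpc_endpoint call of the boto3 EC2 client, which is not invoked here.

```python
import json


def s3_gateway_endpoint_params(region, vpc_id, route_table_ids):
    """Build request parameters for an S3 gateway endpoint in a VPC.

    Mirrors the CFT fragment above: an allow-all endpoint policy, the
    region-specific S3 service name, and the route tables to associate.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Principal": "*", "Action": ["*"], "Resource": ["*"]}
        ],
    }
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        # The API expects the policy as a JSON string, not a nested object.
        "PolicyDocument": json.dumps(policy),
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only; substitute your own.
    params = s3_gateway_endpoint_params("us-east-1", "vpc-0example", ["rtb-0example"])
    print(json.dumps(params, indent=2))
    # With boto3 installed and credentials configured, the call would be:
    #   boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)
```

The allow-all policy is copied from the CFT fragment; you may want a narrower policy in your own environment.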

Create a Loading Job

To load data from an Amazon S3 bucket to an existing table in your target database:

  1. On your target database's Management Console (MC) dashboard, click the Load tab at the bottom of the page to view the Data Load Activity page.
  2. Click the Instance tab.
  3. Click New S3 Data Load at the top-right of the tab. The Create New Amazon S3 Loading Job dialog box opens.
  4. Enter your AWS account credentials and your target location information in the required fields, which are indicated by asterisks (*). Use the format S3:// for the bucket name.
  5. (Optional) Specify additional options by completing the following fields:
    • Direct
    • COPY Parameters
    • Capture rejected data in a table
    • Reject max

    For more about using these fields, see About Configuring a Data Load from S3.
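Before you paste a location into the dialog box, you can check that it follows the S3:// format described in step 4. This is a hypothetical pre-check, not the validation MC itself performs; the function name is_s3_url and the example bucket name are assumptions for illustration.

```python
from urllib.parse import urlparse


def is_s3_url(value: str) -> bool:
    """Return True if value has the S3 scheme and a non-empty bucket name.

    urlparse lowercases the scheme, so "S3://" and "s3://" both pass.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme == "s3" and bool(parsed.netloc)


if __name__ == "__main__":
    # Example values only; the bucket name is a placeholder.
    for candidate in ("S3://example-bucket/data.csv", "example-bucket/data.csv"):
        print(candidate, is_s3_url(candidate))
```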

Cancel an Initiated Loading Job

While a loading job is in progress, you can cancel it by clicking Cancel in the Cancel column of the Load History tab. When you cancel a job, Vertica rolls back all rows and does not commit any data to the target table.
