Copying and Exporting Data on AWS: What You Need to Know

Some issues commonly occur when you copy or export data between AWS clusters; they are described below. Apart from these AWS-specific issues, copying and exporting data works as documented in Copying Data Between Vertica Databases.

To copy or export data on AWS:

  1. Verify that all nodes in source and destination clusters have their own elastic IPs (or public IPs) assigned.

    Each node in one cluster must be able to communicate with each node in the other cluster, so each source and destination node needs an elastic IP (or public IP) assigned. If your destination cluster is located within the same VPC as your source cluster, proceed to step 3.

  2. (For non-CloudFormation Template installs) Create an S3 gateway endpoint.

    If you aren't using a CloudFormation Template (CFT) to install Vertica, you must create an S3 gateway endpoint in your VPC. For more information, see the AWS documentation.

    For example, the Vertica CFT has the following VPC endpoint:

    "S3Enpoint" : {
        "Type" : "AWS::EC2::VPCEndpoint",
        "Properties" : {
            "PolicyDocument" : {
                "Version" : "2012-10-17",
                "Statement" : [{
                    "Effect" : "Allow",
                    "Principal" : "*",
                    "Action" : ["*"],
                    "Resource" : ["*"]
                }]
            },
            "RouteTableIds" : [ {"Ref" : "RouteTable"} ],
            "ServiceName" : { "Fn::Join": [ "", [ "com.amazonaws.", { "Ref": "AWS::Region" }, ".s3" ] ] },
            "VpcId" : {"Ref" : "VPC"}
        }
    }

  3. Verify that your security group allows the AWS clusters to communicate.

    Check your security groups for both your source and destination AWS clusters. Verify that ports 5433 and 5434 are open. If one of your AWS clusters is on a separate VPC, verify that your network access control list (ACL) allows communication on port 5434.

    Note:

    This communication method exports and copies (imports) data across the Internet. Alternatively, you can use non-public IPs and gateways, or a VPN, to connect the source and destination clusters.

  4. If there are one or more elastic load balancers (ELBs) between the clusters, verify that port 5433 is open between the ELBs and clusters.

  5. If you use the Vertica client to connect to one or more ELBs, note that the ELBs only distribute incoming connections. The data itself is transmitted between the clusters.
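
The network prerequisites in the steps above (an elastic IP per node, ports 5433 and 5434 open in the security group, and port 5434 allowed by the network ACL) could be expressed in a CloudFormation template roughly as follows. This is a minimal sketch, not part of the Vertica CFT: every parameter and resource name in it (VerticaNodeInstanceId, VerticaSecurityGroupId, VerticaNetworkAclId, PeerClusterCidr, NodeElasticIp, InterClusterIngress, DataPortAclEntry) is a hypothetical placeholder, and the rule number is an arbitrary choice.

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Illustrative sketch only. All parameter and resource names are hypothetical and are not taken from the Vertica CFT.",
  "Parameters": {
    "VerticaNodeInstanceId": {
      "Type": "AWS::EC2::Instance::Id",
      "Description": "Assumed: ID of one existing Vertica node."
    },
    "VerticaSecurityGroupId": {
      "Type": "AWS::EC2::SecurityGroup::Id",
      "Description": "Assumed: security group used by this cluster's nodes."
    },
    "VerticaNetworkAclId": {
      "Type": "String",
      "Description": "Assumed: network ACL of this cluster's subnet."
    },
    "PeerClusterCidr": {
      "Type": "String",
      "Description": "Assumed: CIDR range covering the other cluster's node addresses."
    }
  },
  "Resources": {
    "NodeElasticIp": {
      "Type": "AWS::EC2::EIP",
      "Properties": {
        "Domain": "vpc",
        "InstanceId": { "Ref": "VerticaNodeInstanceId" }
      }
    },
    "InterClusterIngress": {
      "Type": "AWS::EC2::SecurityGroupIngress",
      "Properties": {
        "GroupId": { "Ref": "VerticaSecurityGroupId" },
        "IpProtocol": "tcp",
        "FromPort": 5433,
        "ToPort": 5434,
        "CidrIp": { "Ref": "PeerClusterCidr" }
      }
    },
    "DataPortAclEntry": {
      "Type": "AWS::EC2::NetworkAclEntry",
      "Properties": {
        "NetworkAclId": { "Ref": "VerticaNetworkAclId" },
        "RuleNumber": 100,
        "Protocol": 6,
        "RuleAction": "allow",
        "Egress": false,
        "CidrBlock": { "Ref": "PeerClusterCidr" },
        "PortRange": { "From": 5434, "To": 5434 }
      }
    }
  }
}
```

The sketch covers a single node; in practice each node in both clusters would need its own elastic IP resource. Because network ACLs are stateless, a corresponding rule for the return traffic is also needed on the ACL, and equivalent rules would have to exist on the other cluster's side.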