Loading Binary (Native) Data

You can load binary data using the NATIVE parser option, except with COPY LOCAL, which does not support this option. Since binary-format data does not require the use and processing of delimiters, it precludes the need to convert integers, dates, and timestamps from text to their native storage format, and improves load performance over delimited data. All binary-format files must adhere to the formatting specifications described in Appendix: Creating Native Binary Format Files.

Native binary format data files are typically larger than their delimited text format counterparts, so compress the data before loading it. The NATIVE parser does not support concatenated compressed binary files. You can load native (binary) format files when developing plug-ins to ETL applications.

There is no copy format to load binary data byte-for-byte because the column and record separators in the data would have to be escaped. Binary data type values are padded and translated on input, and also in the functions, operators, and casts supported.

Loading Hexadecimal, Octal, and Bitstring Data

You can use the formats hexadecimal, octal, and bitstring only to load binary columns. To specify these column formats, use the COPY statement's FORMAT options:

  • Hexadecimal
  • Octal
  • Bitstring

The following examples illustrate how to use the FORMAT option.

  1. Create a table:

    => CREATE TABLE t(oct VARBINARY(5), 
         hex VARBINARY(5), 
         bitstring VARBINARY(5) );
    
  2. Create the projection:

    => CREATE PROJECTION t_p(oct, hex, bitstring) AS SELECT * FROM t;
  3. Use a COPY statement with the STDIN clause, specifying each of the formats:

    => COPY t (oct FORMAT 'octal', hex FORMAT 'hex', 
               bitstring FORMAT 'bitstring')
       FROM STDIN DELIMITER ',';
    
  4. Enter the data to load, ending the statement with a backslash (\) and a period (.) on a separate line:

    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 141142143144145,0x6162636465,0110000101100010011000110110010001100101
    >> \. 
    
  5. Use a select query on table t to view the input values results:

    => SELECT * FROM t;
     oct   | hex   | bitstring
    -------+-------+-----------
    abcde  | abcde | abcde
    (1 row)
    

COPY uses the same default format to load binary data, as used to input binary data. Since the backslash character ('\') is the default escape character, you must escape octal input values. For example, enter the byte '\141' as '\\141'.

If you enter an escape character followed by an invalid octal digit or an escape character being escaped, COPY returns an error.

On input, COPY translates string data as follows:

  • Uses the HEX_TO_BINARY function to translate from hexadecimal representation to binary.
  • Uses the BITSTRING_TO_BINARY function to translate from bitstring representation to binary.

Both functions take a VARCHAR argument and return a VARBINARY value.

You can also use the escape character to represent the (decimal) byte 92 by escaping it twice; for example, '\\\\'. Note that vsql inputs the escaped backslash as four backslashes. Equivalent inputs are hex value '0x5c' and octal value '\134' (134 = 1 x 8^2 + 3 x 8^1 + 4 x 8^0 = 92).

You can load a delimiter value if you escape it with a backslash. For example, given delimiter '|', '\\001\|\\002' is loaded as {1,124,2}, which can also be represented in octal format as '\\001\\174\\002'.

If you insert a value with more bytes than fit into the target column, COPY returns an error. For example, if column c1 is VARBINARY(1):

=> INSERT INTO t (c1) values ('ab');   ERROR: 2-byte value too long for type Varbinary(1)

If you implicitly or explicitly cast a value with more bytes than fit the target data type, COPY silently truncates the data. For example:

=> SELECT 'abcd'::binary(2);
 binary
--------
 ab
(1 row)

Hexadecimal Data

The optional '0x' prefix indicates that a value is hexadecimal, not decimal, although not all hexadecimal values use A-F; for example, 5396. COPY ignores the 0x prefix when loading the input data.

If there are an odd number of characters in the hexadecimal value, the first character is treated as the low nibble of the first (furthest to the left) byte.

Octal Data

Loading octal format data requires that each byte be represented by a three-digit octal code. The first digit must be in the range [0,3] and the second and third digits must both be in the range [0,7].

If the length of an octal value is not a multiple of three, or if one of the three digits is not in the proper range, the value is invalid and COPY rejects the row in which the value appears. If you supply an invalid octal value, COPY returns an error. For example:

SELECT '\\000\\387'::binary(8);
ERROR: invalid input syntax for type binary

Rows that contain binary values with invalid octal representations are also rejected. For example, COPY rejects '\\008' because '\\ 008' is not a valid octal number.

BitString Data

Loading bitstring data requires that each character must be zero (0) or one (1), in multiples of eight characters. If the bitstring value is not a multiple of eight characters, COPY treats the first n characters as the low bits of the first byte (furthest to the left), where n is the remainder of the value's length, divided by eight.

Examples

The following example shows VARBINARY HEX_TO_BINARY(VARCHAR) and VARCHAR TO_HEX(VARBINARY) usage.

  1. Create table t and and its projection with binary columns:

    => CREATE TABLE t (c BINARY(1));
    => CREATE PROJECTION t_p (c) AS SELECT c FROM t;
    
  2. Insert minimum and maximum byte values, including an IP address represented as a character string:

    => INSERT INTO t values(HEX_TO_BINARY('0x00'));
    => INSERT INTO t values(HEX_TO_BINARY('0xFF'));
    => INSERT INTO t values (V6_ATON('2001:DB8::8:800:200C:417A'));
    

    Use the TO_HEX function to format binary values in hexadecimal on output:

    => SELECT TO_HEX(c) FROM t;
     to_hex
    --------
     00
     ff
     20
    (3 rows)