Column Definitions

Following the file signature, the file must define the width of each column in the file, as follows.

Offset 11, 4 bytes: Header area length

32-bit integer in little-endian format containing the number of bytes remaining in the header, not counting these 4 bytes. This is the number of bytes from the end of this value to the start of the row data.

Offset 15, 2 bytes: NATIVE file version

16-bit integer in little-endian format containing the version number of the NATIVE file format. Currently, the only valid value is 1. Future changes to the format could be assigned different version numbers to maintain backward compatibility.

Offset 17, 1 byte: Filler

Always 0.

Offset 18, 2 bytes: Number of columns

16-bit integer in little-endian format containing the number of columns in each row of the file.

Offset 20 onward, 4 bytes per column: Column widths

Array of 32-bit integers in little-endian format defining the width, in bytes, of each column in the row. Variable-width columns have a value of -1 (0xFF 0xFF 0xFF 0xFF).

All integers in NATIVE files are in little-endian format (least significant byte first).
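The following is a minimal Python sketch of the header fields above, assuming the caller writes the 11-byte file signature (defined before this section and not reproduced here) separately. The function name `build_column_definitions` is hypothetical. It also shows how the header area length follows from the layout: 5 bytes of fixed fields (version, filler, column count) plus 4 bytes per column.

```python
import struct

VARIABLE_WIDTH = -1  # width marker for variable-width columns (VARCHAR, VARBINARY)

def build_column_definitions(widths: list[int]) -> bytes:
    """Build the header fields that start at byte offset 11, right after the signature."""
    # Fixed fields after the length: version (2 bytes), filler (1 byte), column count (2 bytes)
    body = struct.pack("<HBH", 1, 0, len(widths))
    # One signed 32-bit little-endian width per column; -1 marks variable width
    body += b"".join(struct.pack("<i", w) for w in widths)
    # The header area length counts every byte after itself, up to the row data
    return struct.pack("<I", len(body)) + body

header = build_column_definitions([4, VARIABLE_WIDTH])
print(header.hex(" "))  # 0d 00 00 00 01 00 00 02 00 04 00 00 00 ff ff ff ff
```

For two columns the header area length is 5 + 2 × 4 = 13 (0x0D), and the whole block occupies offsets 11 through 27.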

The width of each column is determined by the data type it contains. The following table explains the column width needed for each data type, along with the data encoding.

Each entry below lists the data type and its length in bytes, followed by the column content.

INTEGER: 1, 2, 4, or 8 bytes

8-, 16-, 32-, and 64-bit integers are supported. All multi-byte values are stored in little-endian format.

Note: All values in a column must have the width specified for that column in the header. If you set the width of an INTEGER column to 4 bytes, every value you supply for that column must be a 32-bit integer.
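A minimal Python sketch of INTEGER encoding, assuming values are signed two's complement integers (the text does not state signedness explicitly). `encode_integer` is a hypothetical helper. Because `struct.pack` raises `struct.error` when a value does not fit the requested format, it also enforces the rule that every value must match the column's declared width.

```python
import struct

# struct format codes for signed little-endian integers of each supported width
_INT_FORMATS = {1: "<b", 2: "<h", 4: "<i", 8: "<q"}

def encode_integer(value: int, width: int) -> bytes:
    """Encode value using exactly the column's declared width (1, 2, 4, or 8 bytes)."""
    if width not in _INT_FORMATS:
        raise ValueError(f"unsupported INTEGER width: {width}")
    # struct.error is raised if value does not fit in the declared width
    return struct.pack(_INT_FORMATS[width], value)

print(encode_integer(1, 4).hex(" "))   # 01 00 00 00
print(encode_integer(-2, 2).hex(" "))  # fe ff
```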

BOOLEAN: 1 byte

0 for false, 1 for true.

FLOAT: 8 bytes

Encoded in IEEE-754 double-precision (64-bit) format.
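A short Python sketch of the two fixed-size scalar types, BOOLEAN and FLOAT. The helper names are hypothetical, and writing the double in little-endian byte order is an assumption made to match the rest of the format; the text itself only specifies IEEE-754.

```python
import struct

def encode_boolean(value: bool) -> bytes:
    # One byte: 0 for false, 1 for true
    return b"\x01" if value else b"\x00"

def encode_float(value: float) -> bytes:
    # IEEE-754 double precision; little-endian byte order is assumed here
    return struct.pack("<d", value)

print(encode_float(1.5).hex(" "))  # 00 00 00 00 00 00 f8 3f
```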

CHAR: user-specified length

  • Strings shorter than the specified length must be right-padded with spaces (E'\040').
  • Strings are not null-terminated.
  • Character encoding is UTF-8.
  • UTF-8 strings can contain multi-byte characters, so the number of characters in a string may not equal the number of bytes.
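A minimal Python sketch of CHAR encoding. `encode_char` is hypothetical, and raising an error for values that are too long (instead of truncating) is an assumption: truncating a UTF-8 byte string at an arbitrary position could split a multi-byte character.

```python
def encode_char(text: str, width: int) -> bytes:
    """Encode a CHAR value as UTF-8, right-padded with spaces to exactly `width` bytes."""
    data = text.encode("utf-8")  # no null terminator is added
    if len(data) > width:
        raise ValueError(f"{len(data)} bytes do not fit in CHAR({width})")
    # Padding is measured in bytes, not characters
    return data + b" " * (width - len(data))

print(encode_char("café", 6))  # b'caf\xc3\xa9 '
```

Here "café" has 4 characters but occupies 5 bytes, so only one space of padding is added to reach a width of 6.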

VARCHAR: 4-byte integer (length) + data

The column width for a VARCHAR column is always -1, signaling that it contains variable-length data.

  • Each VARCHAR value starts with a 32-bit integer containing the number of bytes in the string.
  • The string must not be null-terminated.
  • Character encoding must be UTF-8.
  • Remember that UTF-8 strings can contain multi-byte characters, so the number of characters in a string may not equal the number of bytes.
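A minimal Python sketch of a VARCHAR value, assuming the 32-bit length prefix is little-endian like every other integer in the file. `encode_varchar` is a hypothetical helper.

```python
import struct

def encode_varchar(text: str) -> bytes:
    """Encode a VARCHAR value: 32-bit little-endian byte count, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    # The prefix counts bytes, not characters, and no null terminator follows
    return struct.pack("<I", len(data)) + data

print(encode_varchar("naïve").hex(" "))  # 06 00 00 00 6e 61 c3 af 76 65
```

The string "naïve" has 5 characters, but the prefix is 6 because "ï" takes two bytes in UTF-8.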

DATE: 8 bytes

64-bit integer in little-endian format containing the number of days since Jan 01 2000 (Julian day 2451545).
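A minimal Python sketch of DATE encoding, assuming the day count is signed so that dates before Jan 01 2000 encode as negative numbers. `encode_date` and `DATE_EPOCH` are hypothetical names.

```python
import struct
from datetime import date

# Jan 01 2000 is Julian day 2451545; date.toordinal() + 1721425 gives the Julian day number
DATE_EPOCH = date(2000, 1, 1)

def encode_date(d: date) -> bytes:
    # Signed day count relative to 2000-01-01; earlier dates are negative
    days = (d - DATE_EPOCH).days
    return struct.pack("<q", days)

print(struct.unpack("<q", encode_date(date(2000, 3, 1)))[0])  # 60
```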

TIME: 8 bytes

64-bit integer in little-endian format containing the number of microseconds since midnight in the UTC time zone.
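A minimal Python sketch of TIME encoding. `encode_time` is hypothetical, and it assumes the caller has already converted the time of day to UTC; the function does no time zone conversion itself.

```python
import struct
from datetime import time

def encode_time(t: time) -> bytes:
    # Microseconds elapsed since midnight; t is assumed to already be in UTC
    micros = ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond
    return struct.pack("<q", micros)

print(struct.unpack("<q", encode_time(time(1, 0, 0, 5)))[0])  # 3600000005
```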

TIMETZ: 8 bytes

64-bit value where:

  • The upper 40 bits contain the number of microseconds since midnight.
  • The lower 24 bits contain the time zone as an offset from UTC in seconds. Logically, the offset ranges from -24 hours to +24 hours, but it is stored as a value between 0 and 48 hours, so add 24 hours to the actual offset before encoding it. (A 48-hour range is 172,800 seconds, which fits in 24 bits; the same range in microseconds would not.)

Each portion is stored in little-endian format (5 bytes followed by 3 bytes).
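A Python sketch of how the two TIMETZ fields combine into one 64-bit value. It assumes the zone offset is in seconds (24 bits cannot hold a ±24-hour offset in microseconds, since 2^24 microseconds is under 17 seconds) and that positive offsets are east of UTC; the sign convention is not stated in the text. The sketch stops at the 64-bit value and leaves byte serialization to the rule above. `timetz_value` and `split_timetz` are hypothetical names.

```python
SECONDS_PER_DAY = 86_400

def timetz_value(micros_since_midnight: int, utc_offset_seconds: int) -> int:
    """Pack TIMETZ fields into one 64-bit value: upper 40 bits time, lower 24 bits zone."""
    if not 0 <= micros_since_midnight < SECONDS_PER_DAY * 1_000_000:
        raise ValueError("time of day out of range")
    if not -SECONDS_PER_DAY <= utc_offset_seconds <= SECONDS_PER_DAY:
        raise ValueError("offset must be within +/-24 hours")
    # Shift the offset from [-24h, +24h] into [0h, 48h]
    zone = utc_offset_seconds + SECONDS_PER_DAY
    return (micros_since_midnight << 24) | zone

def split_timetz(value: int) -> tuple[int, int]:
    """Inverse of timetz_value: return (microseconds since midnight, UTC offset in seconds)."""
    return value >> 24, (value & 0xFFFFFF) - SECONDS_PER_DAY

v = timetz_value(3_600_000_000, -5 * 3600)
print(split_timetz(v))  # (3600000000, -18000)
```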

TIMESTAMP: 8 bytes

64-bit integer in little-endian format containing the number of microseconds since Jan 01 2000 00:00:00 (Julian day 2451545).

TIMESTAMPTZ: 8 bytes

64-bit integer in little-endian format containing the number of microseconds since Jan 01 2000 00:00:00 UTC.
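A minimal Python sketch covering both TIMESTAMP and TIMESTAMPTZ. It assumes naive `datetime` values represent TIMESTAMP data and time-zone-aware values represent TIMESTAMPTZ data, which are converted to UTC before counting. The helper names are hypothetical.

```python
import struct
from datetime import datetime, timezone

TS_EPOCH = datetime(2000, 1, 1)  # Jan 01 2000 00:00:00

def micros_since_2000(ts: datetime) -> int:
    """Microseconds since 2000-01-01 00:00:00; aware datetimes are converted to UTC first."""
    if ts.tzinfo is not None:
        # TIMESTAMPTZ: normalize to UTC, then drop tzinfo to compare with the naive epoch
        ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
    delta = ts - TS_EPOCH
    # Integer arithmetic avoids floating-point rounding from total_seconds()
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

def encode_timestamp(ts: datetime) -> bytes:
    return struct.pack("<q", micros_since_2000(ts))

print(micros_since_2000(datetime(2000, 1, 2)))  # 86400000000
```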

INTERVAL: 8 bytes

64-bit integer in little-endian format containing the number of microseconds in the interval.
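A minimal Python sketch of INTERVAL encoding, assuming the interval can be represented as a Python `timedelta` (days, seconds, and microseconds) and that negative intervals are stored as negative counts. `encode_interval` is a hypothetical helper.

```python
import struct
from datetime import timedelta

def encode_interval(iv: timedelta) -> bytes:
    # Floor-dividing two timedeltas yields an exact integer count of microseconds
    micros = iv // timedelta(microseconds=1)
    return struct.pack("<q", micros)

print(struct.unpack("<q", encode_interval(timedelta(minutes=1, microseconds=7)))[0])  # 60000007
```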

BINARY: user-specified length

Similar to CHAR. Specify the length in the file header, in the field's entry in the Column widths array. Each value must occupy exactly that many bytes; if a value is shorter than the specified length, fill the remainder with null bytes (E'\000').
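A minimal Python sketch of BINARY padding. `encode_binary` is hypothetical, and rejecting oversized values rather than truncating them is an assumption.

```python
def encode_binary(data: bytes, width: int) -> bytes:
    """Encode a BINARY value of exactly `width` bytes, padding with null bytes."""
    if len(data) > width:
        raise ValueError(f"{len(data)} bytes do not fit in BINARY({width})")
    # bytes.ljust pads on the right with the given fill byte
    return data.ljust(width, b"\x00")

print(encode_binary(b"\xab\xcd", 4).hex(" "))  # ab cd 00 00
```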

VARBINARY: 4-byte integer (length) + data

Stored like VARCHAR, but the data is interpreted as bytes rather than UTF-8 characters.
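A minimal Python sketch of a VARBINARY value, assuming the same little-endian 32-bit length prefix that VARCHAR uses. `encode_varbinary` is a hypothetical helper; no text encoding is applied to the payload.

```python
import struct

def encode_varbinary(data: bytes) -> bytes:
    # 32-bit little-endian byte count, then the raw bytes unchanged
    return struct.pack("<I", len(data)) + data

print(encode_varbinary(b"\x00\xff").hex(" "))  # 02 00 00 00 00 ff
```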

NUMERIC (precision, scale): (precision ÷ 19 + 1) × 8 bytes, rounded up

A constant-length data type. The length is determined by the precision, assuming that a 64-bit unsigned integer can store roughly 19 decimal digits. The data consists of a sequence of 64-bit integers, each stored in little-endian format, with the most significant integer first. Together, the integers hold the value in base 2^64. Negative numbers use two's complement.

If there is a scale, the value is stored as numeric × 10^scale; that is, all real numbers are stored as integers, ignoring the decimal point. For example, 1.23 with a scale of 2 is stored as the integer 123. The scale must match that of the target column in the dataanchor table. Alternatively, you can use FILLER columns to coerce the numeric to the scale of the target column.
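A Python sketch of NUMERIC encoding under stated assumptions: two's complement is applied to the whole multi-word value (not to each word separately), and the byte length is passed in as `nbytes` (the column width from the header) rather than recomputed from the precision. `encode_numeric` is a hypothetical helper that takes a `Decimal` and uses `Fraction` so that scaling is exact.

```python
from decimal import Decimal
from fractions import Fraction

def encode_numeric(value: Decimal, scale: int, nbytes: int) -> bytes:
    """Encode a NUMERIC value into `nbytes` bytes (the column width from the header)."""
    # Store value x 10^scale as an exact integer, ignoring the decimal point
    scaled = Fraction(value) * 10**scale
    if scaled.denominator != 1:
        raise ValueError("value has more fractional digits than the scale allows")
    n = scaled.numerator
    bits = nbytes * 8
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError("value does not fit in the column width")
    n &= (1 << bits) - 1  # two's complement over the whole multi-word value
    words = []
    for i in reversed(range(nbytes // 8)):  # most significant 64-bit word first
        word = (n >> (64 * i)) & 0xFFFF_FFFF_FFFF_FFFF
        words.append(word.to_bytes(8, "little"))  # each word little-endian
    return b"".join(words)

print(encode_numeric(Decimal("1.23"), scale=2, nbytes=16).hex(" "))
# 00 00 00 00 00 00 00 00 7b 00 00 00 00 00 00 00
```

With a scale of 2, 1.23 becomes the integer 123 (0x7B), which lands in the second, least significant word.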