Data Ingestion#

VerticaPy supports the ingestion of data in the following formats into Vertica: - CSV - JSON - Parquet - SHP - ORC - Avro - pandas DataFrame

You can use the `read_file() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.read_file.html>`__ function to ingest all the above file types except pandas DataFrames and SHP, which instead use file-specific ingestion functions—`read_pandas() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.read_pandas.html>`__ and `read_shp() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.read_shp.html>`__. There are also file-specifc ingestion functions for JSON, Avro, and CSV files that use flex tables to ingest the data.

Unless you specify the columns’ data types with the dtype parameter, the ingestion functions automatically predict the data type of each column. If you provide the column data types, the function does not need to parse the file and predict data types, likely increasing ingestion speed and precision. For the read_file() function, if the file to ingest is located in the Vertica database, you must provide the column data types with the dtype parameter.

NOTE: As performance optimizations made in the Vertica database are carried over to VerticaPy, try to optimize the structure of your projections for new tables in Vertica.

In the following sections, we will explore a few of the ingestion functions and some of the options they support.

Ingest files with read_file#

The read_file() function inspects and ingests files in any of the following formats: - CSV - Parquet - ORC - JSON - Avro

Some of the supported function options include: - dtype: provide a dictionary of data types for the columns, where the keys are the column names and the values are the column data types. The data types in the dictionary replace the automatically predicted data types. - insert: if set to True, the data is ingested into the relation specified by the table_name and, optionally, the schema paraemters. - table_name: specifies the name of the table to create in the Vertica database, or the name of the table to which the data is inserted. - temporary_table: if set to True, creates a temporary table. - ingest_local: if set to True, the file is ingested from the local machine. In this case, the dtypes parameter is optional; if no value is provided, the function predicts the data type of each column. - genSQL: if set to True, the function generates the SQL code the creates the table but does not execute it. This is a convenient way to check the final relation types.

For a full list of supported options, see the documentation or use the help() function.

NOTE: All data files used in this tutorial are availble in the VerticaPy datasets directory. For demo purposes, the following examples ingest the data files using read_file() and other file-specific read functions. However, VerticaPy includes a set of dataset loading functions that allow you to easily ingest the data files in the datasets directory.

In the following examples, we will demonstrate how to use the read_file() function to ingest data into Vertica. Both file location options, in-database and local, will be explored.

Let’s begin with the case where the file is located in the database. We’ll ingest the iris.csv file, a popular classification dataset. First, before we ingest the file, run the function with the genSQL parameter set to True to view the SQL that will be used to create the table. Because the file is located in the database, we must specify the data types for each column with the dtypes parameter:

NOTE: For the examples in this tutorial, replace path-to-file in the path parameter with the path to the file in your Vertica database or local machine.

[2]:
import verticapy as vp

vp.read_file(
    path = "path-to-file/iris.csv",
    dtype = {
        "Id": "Integer",
        "SepalLengthCm": "Numeric",
        "SepalWidthCm": "Numeric",
        "PetalLengthCm": "Numeric",
        "PetalWidthCm": "Numeric",
        "Species": "Varchar(20)",
    },
    schema = "public",
    genSQL = True,
)
[2]:
['CREATE TABLE "public"."iris"("Id" Integer, "SepalLengthCm" Numeric, "SepalWidthCm" Numeric, "PetalLengthCm" Numeric, "PetalWidthCm" Numeric, "Species" Varchar(20))',
 'COPY "public"."iris"("Id", "SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species") FROM \'/home/dbadmin/verticapy/VerticaPy/verticapy/datasets/data/iris.csv\' UNCOMPRESSED DELIMITER \',\' NULL \'\' ENCLOSED BY \'"\' ESCAPE AS \'\x17\' SKIP 1']

To ingest the file into Vertica, remove the genSQL parameter from the above command and rerun the function:

NOTE: If no table name is specified, the name of the file is used for the table name:

[3]:
vp.read_file(
    path = "path-to-file/iris.csv",
    dtype = {
        "Id": "Integer",
        "SepalLengthCm": "Numeric",
        "SepalWidthCm": "Numeric",
        "PetalLengthCm": "Numeric",
        "PetalWidthCm": "Numeric",
        "Species": "Varchar(20)",
    },
    schema = "public",
)
The table "public"."iris" has been successfully created.
[3]:
123
Id
Integer
123
SepalLengthCm
Numeric(39)
123
SepalWidthCm
Numeric(39)
123
PetalLengthCm
Numeric(39)
123
PetalWidthCm
Numeric(39)
Abc
Species
Varchar(20)
1455.13.81.90.4Iris-setosa
2576.33.34.71.6Iris-versicolor
3596.62.94.61.3Iris-versicolor
4715.93.24.81.8Iris-versicolor
5905.52.54.01.3Iris-versicolor
61025.82.75.11.9Iris-virginica
71037.13.05.92.1Iris-virginica
81155.82.85.12.4Iris-virginica
91197.72.66.92.3Iris-virginica
101225.62.84.92.0Iris-virginica
111327.93.86.42.0Iris-virginica
121416.73.15.62.4Iris-virginica
131505.93.05.11.8Iris-virginica
1415.13.51.40.2Iris-setosa
1524.93.01.40.2Iris-setosa
1655.03.61.40.2Iris-setosa
1774.63.41.40.3Iris-setosa
1885.03.41.50.2Iris-setosa
19104.93.11.50.1Iris-setosa
20155.84.01.20.2Iris-setosa
21165.74.41.50.4Iris-setosa
22275.03.41.60.4Iris-setosa
23325.43.41.50.4Iris-setosa
24495.33.71.50.2Iris-setosa
25636.02.24.01.0Iris-versicolor
26655.62.93.61.3Iris-versicolor
27666.73.14.41.4Iris-versicolor
28685.82.74.11.0Iris-versicolor
29855.43.04.51.5Iris-versicolor
30886.32.34.41.3Iris-versicolor
31975.72.94.21.3Iris-versicolor
321267.23.26.01.8Iris-virginica
331336.42.85.62.2Iris-virginica
34175.43.91.30.4Iris-setosa
35424.52.31.30.3Iris-setosa
36526.43.24.51.5Iris-versicolor
37646.12.94.71.4Iris-versicolor
38696.22.24.51.5Iris-versicolor
39766.63.04.41.4Iris-versicolor
40965.73.04.21.2Iris-versicolor
411046.32.95.61.8Iris-virginica
421096.72.55.81.8Iris-virginica
431126.42.75.31.9Iris-virginica
441145.72.55.02.0Iris-virginica
451237.72.86.72.0Iris-virginica
461256.73.35.72.1Iris-virginica
471456.73.35.72.5Iris-virginica
481476.32.55.01.9Iris-virginica
49144.33.01.10.1Iris-setosa
50234.63.61.00.2Iris-setosa
51245.13.31.70.5Iris-setosa
52304.73.21.60.2Iris-setosa
53475.13.81.60.2Iris-setosa
54484.63.21.40.2Iris-setosa
55675.63.04.51.5Iris-versicolor
56786.73.05.01.7Iris-versicolor
57805.72.63.51.0Iris-versicolor
58825.52.43.71.0Iris-versicolor
591435.82.75.11.9Iris-virginica
6065.43.91.70.4Iris-setosa
61124.83.41.60.2Iris-setosa
62205.13.81.50.3Iris-setosa
63384.93.11.50.1Iris-setosa
64405.13.41.50.2Iris-setosa
65445.03.51.60.6Iris-setosa
66536.93.14.91.5Iris-versicolor
67705.62.53.91.1Iris-versicolor
68796.02.94.51.5Iris-versicolor
69846.02.75.11.6Iris-versicolor
70866.03.44.51.6Iris-versicolor
711005.72.84.11.3Iris-versicolor
721016.33.36.02.5Iris-virginica
731067.63.06.62.1Iris-virginica
741074.92.54.51.7Iris-virginica
751087.32.96.31.8Iris-virginica
761136.83.05.52.1Iris-virginica
771246.32.74.91.8Iris-virginica
781466.73.05.22.3Iris-virginica
7944.63.11.50.2Iris-setosa
80134.83.01.40.1Iris-setosa
81254.83.41.90.2Iris-setosa
82265.03.01.60.2Iris-setosa
83545.52.34.01.3Iris-versicolor
84556.52.84.61.5Iris-versicolor
85565.72.84.51.3Iris-versicolor
86625.93.04.21.5Iris-versicolor
87876.73.14.71.5Iris-versicolor
88915.52.64.41.2Iris-versicolor
89945.02.33.31.0Iris-versicolor
901176.53.05.51.8Iris-virginica
911276.22.84.81.8Iris-virginica
921307.23.05.81.6Iris-virginica
931406.93.15.42.1Iris-virginica
941426.93.15.12.3Iris-virginica
9594.42.91.40.2Iris-setosa
96185.13.51.40.3Iris-setosa
97195.73.81.70.3Iris-setosa
98335.24.11.50.1Iris-setosa
99354.93.11.50.1Iris-setosa
100464.83.01.40.3Iris-setosa
Rows: 1-100 | Columns: 6

When the file to ingest is located on your local machine, you must set the ingest_local parameter to True:

NOTE: In some cases where the CSV file has a very complex structure, local ingestion might fail. If this occurs, you will have to move the file into the database and then ingest the file from that location.

[5]:
vp.read_file(
    path = "path-to-file/iris.csv",
    schema = "public",
    table_name = "iris_local",
    ingest_local = True,
)
The table "public"."iris_local" has been successfully created.
[5]:
123
Id
Integer
123
SepalLengthCm
Numeric(7)
123
SepalWidthCm
Numeric(7)
123
PetalLengthCm
Numeric(7)
123
PetalWidthCm
Numeric(7)
Abc
Species
Varchar(30)
1225.13.71.50.4Iris-setosa
2335.24.11.50.1Iris-setosa
3375.53.51.30.2Iris-setosa
4424.52.31.30.3Iris-setosa
5576.33.34.71.6Iris-versicolor
6685.82.74.11.0Iris-versicolor
7726.12.84.01.3Iris-versicolor
8756.42.94.31.3Iris-versicolor
91074.92.54.51.7Iris-virginica
101087.32.96.31.8Iris-virginica
111505.93.05.11.8Iris-virginica
1294.42.91.40.2Iris-setosa
13245.13.31.70.5Iris-setosa
14285.23.51.50.2Iris-setosa
15314.83.11.60.2Iris-setosa
16345.54.21.40.2Iris-setosa
17384.93.11.50.1Iris-setosa
18696.22.24.51.5Iris-versicolor
19766.63.04.41.4Iris-versicolor
20825.52.43.71.0Iris-versicolor
21926.13.04.61.4Iris-versicolor
22935.82.64.01.2Iris-versicolor
231046.32.95.61.8Iris-virginica
241166.43.25.32.3Iris-virginica
251187.73.86.72.2Iris-virginica
261216.93.25.72.3Iris-virginica
271286.13.04.91.8Iris-virginica
281466.73.05.22.3Iris-virginica
29124.83.41.60.2Iris-setosa
30445.03.51.60.6Iris-setosa
31455.13.81.90.4Iris-setosa
32565.72.84.51.3Iris-versicolor
33605.22.73.91.4Iris-versicolor
34675.63.04.51.5Iris-versicolor
35786.73.05.01.7Iris-versicolor
36866.03.44.51.6Iris-versicolor
37955.62.74.21.3Iris-versicolor
38965.73.04.21.2Iris-versicolor
39986.22.94.31.3Iris-versicolor
401056.53.05.82.2Iris-virginica
411197.72.66.92.3Iris-virginica
421416.73.15.62.4Iris-virginica
4365.43.91.70.4Iris-setosa
44144.33.01.10.1Iris-setosa
45195.73.81.70.3Iris-setosa
46325.43.41.50.4Iris-setosa
47365.03.21.20.2Iris-setosa
48636.02.24.01.0Iris-versicolor
49835.82.73.91.2Iris-versicolor
50876.73.14.71.5Iris-versicolor
511176.53.05.51.8Iris-virginica
521267.23.26.01.8Iris-virginica
531327.93.86.42.0Iris-virginica
541356.12.65.61.4Iris-virginica
551367.73.06.12.3Iris-virginica
561386.43.15.51.8Iris-virginica
5715.13.51.40.2Iris-setosa
5824.93.01.40.2Iris-setosa
5944.63.11.50.2Iris-setosa
60134.83.01.40.1Iris-setosa
61165.74.41.50.4Iris-setosa
62175.43.91.30.4Iris-setosa
63304.73.21.60.2Iris-setosa
64475.13.81.60.2Iris-setosa
65746.12.84.71.2Iris-versicolor
661025.82.75.11.9Iris-virginica
671136.83.05.52.1Iris-virginica
681145.72.55.02.0Iris-virginica
691155.82.85.12.4Iris-virginica
701276.22.84.81.8Iris-virginica
711296.42.85.62.1Iris-virginica
721317.42.86.11.9Iris-virginica
731406.93.15.42.1Iris-virginica
741426.93.15.12.3Iris-virginica
75155.84.01.20.2Iris-setosa
76185.13.51.40.3Iris-setosa
77215.43.41.70.2Iris-setosa
78354.93.11.50.1Iris-setosa
79405.13.41.50.2Iris-setosa
80584.92.43.31.0Iris-versicolor
81596.62.94.61.3Iris-versicolor
82615.02.03.51.0Iris-versicolor
83646.12.94.71.4Iris-versicolor
84776.82.84.81.4Iris-versicolor
85805.72.63.51.0Iris-versicolor
86886.32.34.41.3Iris-versicolor
87915.52.64.41.2Iris-versicolor
881225.62.84.92.0Iris-virginica
891237.72.86.72.0Iris-virginica
901346.32.85.11.5Iris-virginica
911486.53.05.22.0Iris-virginica
9285.03.41.50.2Iris-setosa
93115.43.71.50.2Iris-setosa
94234.63.61.00.2Iris-setosa
95265.03.01.60.2Iris-setosa
96295.23.41.40.2Iris-setosa
97505.03.31.40.2Iris-setosa
98536.93.14.91.5Iris-versicolor
99545.52.34.01.3Iris-versicolor
100625.93.04.21.5Iris-versicolor
Rows: 1-100 | Columns: 6

To ingest multiple files of the same type, use the following syntax in the path parameter (in this case for multiple CSV files): path = "``path-to-files``/*.csv"

Ingest CSV files#

In addition to read_file(), you can also ingest CSV files with the read_csv() function, which ingests the file using flex tables. This function provides options not available in read_file(), such as: - sep: specify the column separator. - parse_nrows: the function creates a file of nrows from the data file to identify the data types. This file is then dropped and the entire data file is ingested. If your data file is large, this data type inference process should speed up the file ingestion speed. - materialize: if set to True, the flex table used to ingest the data file is materialized into a table; otherwise, the data remains in a flex table.

For a full list of supported options, see the read_csv() reference page or use the help() function.

In the following example, we will use read_csv() to ingest a subset of the Titanic dataset. To begin, load the entire Titanic dataset using the load_titanic() function:

[6]:
from verticapy.datasets import load_titanic

titanic = load_titanic()

To convert a subset of the dataset to a CSV file, select the desired rows in the dataset and use the `to_csv() <https://www.vertica.com/python/documentation/1.0.x/html/verticapy.vDataFrame.to_csv.html>`__ vDataFrame method:

[7]:
titanic[0:50].to_csv(
    path = "titanic_subset.csv"
)

Before ingesting the above CSV file, we can check its columns and their data types with the `pcsv() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.pcsv.html>`__ function:

[8]:
vp.pcsv(path = "titanic_subset.csv",
    sep = ",",
    na_rep = "",
)
[8]:
{'sibsp': 'Integer',
 'sex': 'Varchar(20)',
 'pclass': 'Integer',
 'parch': 'Integer',
 'name': 'Varchar(112)',
 'fare': 'Numeric(11,6)',
 'embarked': 'Varchar(20)',
 'body': 'Integer',
 'age': 'Numeric(8,4)',
 'boat': 'Varchar(100)',
 'ticket': 'Varchar(24)',
 'survived': 'Integer',
 'home.dest': 'Varchar(92)',
 'cabin': 'Varchar(22)'}

Now, setting the ingest_local parameter to True, ingest the CSV file into the Vertica database:

[9]:
vp.read_csv("titanic_subset.csv",
    schema = "public",
    table_name = "titanic_subset",
    sep = ",",
    ingest_local = True
)
The table "public"."titanic_subset" has been successfully created.
[9]:
123
pclass
Integer
123
survived
Integer
Abc
Varchar(112)
Abc
sex
Varchar(20)
123
age
Numeric(10)
123
sibsp
Integer
123
parch
Integer
Abc
ticket
Varchar(24)
123
fare
Numeric(13)
Abc
cabin
Varchar(22)
Abc
embarked
Varchar(20)
Abc
boat
Varchar(100)
123
body
Integer
Abc
Varchar(92)
110male58.00235273113.275D48C[null]122
210male51.001PC 1759761.3792[null]C[null][null]
311female14.012113760120.0B96 B98S4[null]
411female[null]0017421110.8833[null]C4[null]
511male48.0101999652.0C126S5 7[null]
610male17.00011305947.1[null]S[null][null]
710male31.010F.C. 1275052.0B71S[null][null]
810male41.0101746451.8625D21S[null][null]
910male57.010PC 17569146.5208B78C[null][null]
1010male64.01011081375.25D37C[null][null]
1111female35.000113503211.5C130C4[null]
1211female33.00011015286.5B77S8[null]
1320male36.00022923613.0[null]S[null]236
1420male31.000C.A. 1872310.5[null]S[null]165
1510male24.000PC 1759379.2B86C[null][null]
1610male47.00011379642.4[null]S[null][null]
1711female43.00124160211.3375B3S2[null]
1811male35.000PC 1747526.2875E24S5[null]
1920male44.00024874613.0[null]S[null]35
2020male22.020C.A. 3102931.5[null]S[null][null]
2120male30.00025065313.0[null]S[null]75
2210male[null]00PC 1760030.6958[null]C14[null]
2311female53.0201176951.4792C101SD[null]
2411male36.001PC 17755512.3292B51 B53 B55C3[null]
2511female39.00024160211.3375[null]S2[null]
2611female60.01011081375.25D37C5[null]
2720male21.000S.O.C. 1487973.5[null]S[null][null]
2820male47.00023756515.0[null]S[null][null]
2910male27.01013508136.7792C89C[null][null]
3010male37.01011380353.1C123S[null][null]
3111female35.0103697383.475C83SD[null]
3211male30.0101323657.75C78C11[null]
3320female44.01024425226.0[null]S[null][null]
3410male[null]0011379126.55[null]S[null][null]
3510male[null]0011377826.55D34S[null][null]
3610male45.0103697383.475C83S[null][null]
3710male42.00011303842.5B11S[null][null]
3810male[null]0011302826.55C124S[null][null]
3911female36.002WE/P 573571.0B22S7[null]
4011female25.0101176555.4417E50C5[null]
4111female22.01011377666.6C2S8[null]
4220male29.010200326.0[null]S[null][null]
4310male71.000PC 1760949.5042[null]C[null]22
4410male29.00011350130.0D6S[null]126
4510male50.0101350755.9E44S[null][null]
4611female[null]0111350555.0E33S6[null]
4711female22.00111237859.4[null]C7[null]
4810male37.011PC 1775683.1583E52C[null][null]
4911female37.0101992890.0C78Q14[null]
5011female31.01035273113.275D36C6[null]
Rows: 1-50 | Columns: 14

If we want to insert additional data from the original Titanic dataset into the public.titanic_subset table, we can do so by setting the insert parameter of the read_csv() function to True:

TIP: You can also insert data into an existing Vertica table with the `insert_into() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.insert_into.html>`__ function.

[10]:
titanic[50:100].to_csv(
    path = "titanic_more_data.csv"
)

vp.read_csv("titanic_more_data.csv",
    schema = "public",
    table_name = "titanic_subset",
    sep = ",",
    insert = True
)
[10]:
123
pclass
Integer
123
survived
Integer
Abc
Varchar(112)
Abc
sex
Varchar(20)
123
age
Numeric(10)
123
sibsp
Integer
123
parch
Integer
Abc
ticket
Varchar(24)
123
fare
Numeric(13)
Abc
cabin
Varchar(22)
Abc
embarked
Varchar(20)
Abc
boat
Varchar(100)
123
body
Integer
Abc
Varchar(92)
110male[null]0011379126.55[null]S[null][null]
210male[null]0011377826.55D34S[null][null]
310male45.0103697383.475C83S[null][null]
410male42.00011303842.5B11S[null][null]
510male[null]0011302826.55C124S[null][null]
611female36.002WE/P 573571.0B22S7[null]
711female25.0101176555.4417E50C5[null]
811female22.01011377666.6C2S8[null]
920male29.010200326.0[null]S[null][null]
1010male71.000PC 1760949.5042[null]C[null]22
1110male29.00011350130.0D6S[null]126
1210male50.0101350755.9E44S[null][null]
1311female[null]0111350555.0E33S6[null]
1411female22.00111237859.4[null]C7[null]
1510male37.011PC 1775683.1583E52C[null][null]
1611female37.0101992890.0C78Q14[null]
1711female31.01035273113.275D36C6[null]
1820male27.01022841426.0[null]S[null]293
1930female18.001269114.4542[null]C[null][null]
2030male[null]0026647.225[null]C[null][null]
2130male31.0003350977.75[null]Q[null][null]
2230male[null]0026747.225[null]C[null][null]
2330male[null]0026317.225[null]C[null][null]
2430male40.500C.A. 621215.1[null]S[null]187
2530male26.0103500257.8542[null]S[null][null]
2621male3.01123008026.0F2SD[null]
2721female18.00225065213.0[null]S16[null]
2830male0.330234708014.4[null]S[null][null]
2930male30.000C 70767.25[null]S[null]72
3030female[null]003648597.75[null]Q[null][null]
3130female[null]003826497.75[null]Q[null][null]
3220male30.00025065313.0[null]S[null]75
3310male[null]00PC 1760030.6958[null]C14[null]
3411female53.0201176951.4792C101SD[null]
3511male36.001PC 17755512.3292B51 B53 B55C3[null]
3611female39.00024160211.3375[null]S2[null]
3711female60.01011081375.25D37C5[null]
3820male21.000S.O.C. 1487973.5[null]S[null][null]
3920male47.00023756515.0[null]S[null][null]
4010male27.01013508136.7792C89C[null][null]
4110male37.01011380353.1C123S[null][null]
4211female35.0103697383.475C83SD[null]
4311male30.0101323657.75C78C11[null]
4420female44.01024425226.0[null]S[null][null]
4521female34.00024388013.0[null]S12[null]
4630female40.01075469.475[null]S[null][null]
4730female28.01134708014.4[null]S[null][null]
4830male25.0003492037.8958[null]S[null][null]
4930male25.0003492507.8958[null]S[null][null]
5030male40.016CA 214446.9[null]S[null][null]
5130male51.000214408.05[null]S[null][null]
5220male36.000C.A. 1724810.5[null]S[null][null]
5321female48.01222084565.0[null]S9[null]
5430male35.0003734508.05[null]S[null][null]
5530male35.0003645128.05[null]S[null][null]
5630male34.01134708014.4[null]S[null]197
5720male62.0002402769.6875[null]Q[null][null]
5820male27.000SC/PARIS 216815.0333[null]C[null][null]
5921female24.00225064914.5[null]S4[null]
6030female22.000755210.5167[null]S[null][null]
6130male40.5003672327.75[null]Q[null]68
6210male58.00235273113.275D48C[null]122
6310male51.001PC 1759761.3792[null]C[null][null]
6411female14.012113760120.0B96 B98S4[null]
6511female[null]0017421110.8833[null]C4[null]
6611male48.0101999652.0C126S5 7[null]
6710male17.00011305947.1[null]S[null][null]
6810male31.010F.C. 1275052.0B71S[null][null]
6910male41.0101746451.8625D21S[null][null]
7010male57.010PC 17569146.5208B78C[null][null]
7110male64.01011081375.25D37C[null][null]
7211female35.000113503211.5C130C4[null]
7311female33.00011015286.5B77S8[null]
7420male36.00022923613.0[null]S[null]236
7520male31.000C.A. 1872310.5[null]S[null]165
7610male24.000PC 1759379.2B86C[null][null]
7710male47.00011379642.4[null]S[null][null]
7811female43.00124160211.3375B3S2[null]
7911male35.000PC 1747526.2875E24S5[null]
8020male44.00024874613.0[null]S[null]35
8120male22.020C.A. 3102931.5[null]S[null][null]
8220male70.000C.A. 2458010.5[null]S[null][null]
8320female27.0101166821.0[null]S[null][null]
8421female48.002C.A. 3311236.75[null]S14[null]
8521female3.012SC/Paris 212341.5792[null]C14[null]
8630female32.01136484915.5[null]Q[null][null]
8730male22.0003500457.7958[null]S[null][null]
8830male[null]10268914.4583[null]C[null][null]
8930male22.000A/5 211727.25[null]S[null][null]
9030male18.0003500367.7958[null]S[null][null]
9120male16.000S.O./P.P. 310.5[null]S[null][null]
9220male66.000C.A. 2457910.5[null]S[null][null]
9321female30.010SC/PARIS 214813.8583[null]C12[null]
9430male[null]003492387.8958[null]S[null][null]
9530male[null]003492257.8958[null]S[null][null]
9630male[null]003492547.8958[null]C[null][null]
9730male[null]003492207.8958[null]S[null][null]
9820male54.0002901114.0[null]S[null][null]
9921female2.0112636026.0[null]S11[null]
10030female[null]10268914.4583[null]C[null][null]
Rows: 1-100 | Columns: 14

Ingest JSON files#

As with CSV files, VerticaPy provides a file-specific ingestion function for JSON files, read_json(), which supports additional options, including: - usecols: provide a list of JSON parameters to ingest. Other JSON parameters are ignored. - start_point: name the key in the JSON load data at which to begin parsing - flatten_maps: set whether sub-maps within the JSON data are flattened. - materialize: if set to True, the flex table used to ingest the data is materialized into a table.

For a full list of supported options, see the read_json() reference page or use the help() function.

VerticaPy also provides a `pjson() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.pjson.html>`__ function to parse JSON files to identify columns and their respective data types.

In the following example, we load the iris dataset using the `load_iris() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.datasets.load_iris.html>`__ dataset, convert the vDataFrame to JSON format with the `to_json() <https://www.vertica.com/python/documentation/1.0.x/html/verticapy.vDataFrame.to_json.html>`__ method, then ingest the JSON file into Vetica:

[11]:
from verticapy.datasets import load_iris

iris = load_iris()
iris.to_json(
    path = "iris_local.json",
)
vp.read_json(
    path = "iris_local.json",
    table_name = "iris_ingest",
    schema = "public",
)
The table "public"."iris_ingest" has been successfully created.
[11]:
123
PetalWidthCm
Float(22)
123
SepalWidthCm
Float(22)
Abc
Species
Varchar(30)
123
PetalLengthCm
Float(22)
123
SepalLengthCm
Float(22)
123
Id
Integer
10.13.0Iris-setosa1.14.314
20.23.2Iris-setosa1.25.036
30.23.2Iris-setosa1.34.73
40.23.2Iris-setosa1.64.730
50.23.6Iris-setosa1.04.623
60.23.6Iris-setosa1.45.05
71.22.8Iris-versicolor4.76.174
81.23.0Iris-versicolor4.25.796
91.32.8Iris-versicolor4.55.756
101.42.6Iris-virginica5.66.1135
111.52.9Iris-versicolor4.56.079
121.82.8Iris-virginica4.86.2127
131.82.9Iris-virginica5.66.3104
141.83.0Iris-virginica4.86.0139
152.03.0Iris-virginica5.26.5148
162.03.2Iris-virginica5.16.5111
172.13.3Iris-virginica5.76.7125
180.22.9Iris-setosa1.44.49
190.23.4Iris-setosa1.45.229
200.23.4Iris-setosa1.94.825
210.23.7Iris-setosa1.55.349
220.23.7Iris-setosa1.55.411
230.24.2Iris-setosa1.45.534
240.43.8Iris-setosa1.95.145
251.02.7Iris-versicolor4.15.868
261.32.7Iris-versicolor4.25.695
271.32.9Iris-versicolor4.36.298
281.42.8Iris-versicolor4.86.877
291.43.0Iris-versicolor4.46.676
301.43.2Iris-versicolor4.77.051
311.52.5Iris-versicolor4.96.373
321.52.8Iris-virginica5.16.3134
331.53.0Iris-versicolor4.55.667
341.92.5Iris-virginica5.06.3147
352.13.0Iris-virginica5.56.8113
362.33.2Iris-virginica5.36.4116
370.13.1Iris-setosa1.54.938
380.23.0Iris-setosa1.65.026
390.23.4Iris-setosa1.55.08
400.23.4Iris-setosa1.75.421
410.33.0Iris-setosa1.44.846
421.02.3Iris-versicolor3.35.094
431.02.6Iris-versicolor3.55.780
441.12.4Iris-versicolor3.85.581
451.32.3Iris-versicolor4.05.554
461.43.0Iris-versicolor4.66.192
471.43.1Iris-versicolor4.46.766
481.52.8Iris-versicolor4.66.555
491.53.2Iris-versicolor4.56.452
501.63.0Iris-virginica5.87.2130
511.82.5Iris-virginica5.86.7109
522.02.8Iris-virginica6.77.7123
532.13.0Iris-virginica6.67.6106
542.33.0Iris-virginica6.17.7136
552.33.1Iris-virginica5.16.9142
562.43.1Iris-virginica5.66.7141
570.13.1Iris-setosa1.54.935
580.23.4Iris-setosa1.64.812
590.33.4Iris-setosa1.44.67
600.33.8Iris-setosa1.75.719
610.43.4Iris-setosa1.65.027
620.43.7Iris-setosa1.55.122
630.44.4Iris-setosa1.55.716
640.63.5Iris-setosa1.65.044
651.02.0Iris-versicolor3.55.061
661.02.2Iris-versicolor4.06.063
671.32.9Iris-versicolor4.36.475
681.42.7Iris-versicolor3.95.260
691.63.3Iris-versicolor4.76.357
701.83.2Iris-versicolor4.85.971
711.92.7Iris-virginica5.15.8102
722.02.5Iris-virginica5.05.7114
732.22.8Iris-virginica5.66.4133
742.53.3Iris-virginica6.06.3101
750.23.0Iris-setosa1.44.92
760.23.5Iris-setosa1.45.11
770.33.5Iris-setosa1.45.118
780.33.8Iris-setosa1.55.120
790.43.4Iris-setosa1.55.432
801.12.5Iris-versicolor3.95.670
811.32.9Iris-versicolor3.65.665
821.62.7Iris-versicolor5.16.084
831.63.4Iris-versicolor4.56.086
841.92.8Iris-virginica6.17.4131
852.12.8Iris-virginica5.66.4129
862.13.1Iris-virginica5.46.9140
872.23.8Iris-virginica6.77.7118
882.33.0Iris-virginica5.26.7146
892.33.4Iris-virginica5.46.2149
902.42.8Iris-virginica5.15.8115
912.53.6Iris-virginica6.17.2110
920.23.1Iris-setosa1.64.831
930.23.3Iris-setosa1.45.050
940.24.0Iris-setosa1.25.815
950.53.3Iris-setosa1.75.124
961.12.5Iris-versicolor3.05.199
971.32.8Iris-versicolor4.15.7100
981.32.9Iris-versicolor4.25.797
991.42.9Iris-versicolor4.76.164
1001.53.1Iris-versicolor4.76.787
Rows: 1-100 | Columns: 6

Other file types#

For more information about other file-specific ingestion functions, see the following reference pages, which include examples: - pandas DataFrames: `read_pandas() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.read_pandas.html>`__ and `pandas_to_vertica() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.pandas_to_vertica.html>`__ - Avro: `read_avro() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.read_avro.html>`__ - SHP: `read_shp() <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.read_shp.html>`__