read_file

In [ ]:
read_file(path: str,
          schema: str = "",
          table_name: str = "",
          dtype: dict = {},
          unknown: str = "varchar",
          varchar_varbinary_length: int = 80,
          insert: bool = False,
          temporary_table: bool = False,
          temporary_local_table: bool = True,
          gen_tmp_table_name: bool = True,
          ingest_local: bool = False,
          genSQL: bool = False,
          max_files: int = 100)

Inspects and ingests a file in CSV, Parquet, ORC, JSON, or Avro format. This function uses Vertica complex data types. To create a new table, the file must be located on the server.

Parameters

path: str
    Path to a file or glob. Valid paths include any path that is valid for COPY and that uses a file format supported by this function. When inferring the data type, only one file is read, even if the glob matches multiple files. In the case of JSON, however, more than one file may be read to infer the data type.
schema: str, optional
    Schema in which to create the table.
table_name: str, optional
    Name of the table to create. If empty, the file name is used.
dtype: dict, optional
    Dictionary of customized data types. The predicted data types are replaced by the input data types. The dictionary keys are the column names and the values are the new data types.
unknown: str, optional
    Type used to replace unknown data types.
varchar_varbinary_length: int, optional
    Default length of varchar and varbinary columns.
insert: bool, optional
    If set to True, the data is ingested into the input relation. When this parameter is set to True, most of the other parameters are ignored.
temporary_table: bool, optional
    If set to True, a temporary table is created.
temporary_local_table: bool, optional
    If set to True, a temporary local table is created. The parameter 'schema' must be empty, otherwise this parameter is ignored (see the sketch after this list).
gen_tmp_table_name: bool, optional
    Sets the name of the temporary table. This parameter is only used when 'temporary_local_table' is set to True and the parameters 'table_name' and 'schema' are unspecified.
ingest_local: bool, optional
    If set to True, the file is ingested from the local machine. This currently only works for data insertion.
genSQL: bool, optional
    If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion.
max_files: int, optional
    (JSON only.) If 'path' is a glob, the maximum number of files in 'path' to inspect. Use this parameter to increase the amount of data the function considers, which can be beneficial if you suspect variation among files. Files are chosen arbitrarily from the glob.
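
The temporary table options interact: 'temporary_local_table' requires an empty 'schema', and 'gen_tmp_table_name' only applies when 'table_name' and 'schema' are unspecified. A minimal sketch, assuming a JSON glob like the one used in the Example section below:

In [ ]:
from verticapy.utilities import read_file
# Sketch: create a temporary local table with an auto-generated name.
# 'schema' and 'table_name' are left empty so that 'temporary_local_table'
# and 'gen_tmp_table_name' take effect.
vdf = read_file("laliga/*.json",
                temporary_local_table = True,
                gen_tmp_table_name = True)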

Returns

vDataFrame : The vDataFrame of the relation.

Example

In [3]:
from verticapy.utilities import read_file
# Generates the SQL needed to create the table
read_file("laliga/*.json", 
          table_name = "laliga",
          schema = "public",
          genSQL = True)
Out[3]:
['CREATE TABLE "public"."laliga"("away_score" int, "away_team" Row("away_team_gender" varchar, "away_team_group" varchar, "away_team_id" int, "away_team_name" varchar, "country" Row("id" int, "name" varchar), "managers" Array[Row("country" Row("id" int, "name" varchar), "dob" date, "id" int, "name" varchar, "nickname" varchar)]), "competition" Row("competition_id" int, "competition_name" varchar, "country_name" varchar), "competition_stage" Row("id" int, "name" varchar), "home_score" int, "home_team" Row("country" Row("id" int, "name" varchar), "home_team_gender" varchar, "home_team_group" varchar, "home_team_id" int, "home_team_name" varchar, "managers" Array[Row("country" Row("id" int, "name" varchar), "dob" date, "id" int, "name" varchar, "nickname" varchar)]), "kick_off" time, "last_updated" date, "match_date" date, "match_id" int, "match_status" varchar, "match_week" int, "metadata" Row("data_version" date, "shot_fidelity_version" int, "xy_fidelity_version" int), "season" Row("season_id" int, "season_name" varchar))',
 'copy "public"."laliga" from \'/scratch_b/qa/ericsson/laliga/*.json\' PARSER FJsonParser()']
In [5]:
# Generates the SQL needed to create the table, replacing some of the inferred types
read_file("laliga/*.json", 
          table_name = "laliga",
          schema = "public",
          genSQL = True,
          dtype = {"competition_name": "varchar(30)",
                   "home_team_id": "float"},
          varchar_varbinary_length = 50)
Out[5]:
['CREATE TABLE "public"."laliga"("away_score" int, "away_team" Row("away_team_gender" varchar(50), "away_team_group" varchar(50), "away_team_id" int, "away_team_name" varchar(50), "country" Row("id" int, "name" varchar(50)), "managers" Array[Row("country" Row("id" int, "name" varchar(50)), "dob" date, "id" int, "name" varchar(50), "nickname" varchar(50))]), "competition" Row("competition_id" int, "competition_name" varchar(30), "country_name" varchar(50)), "competition_stage" Row("id" int, "name" varchar(50)), "home_score" int, "home_team" Row("country" Row("id" int, "name" varchar(50)), "home_team_gender" varchar(50), "home_team_group" varchar(50), "home_team_id" float, "home_team_name" varchar(50), "managers" Array[Row("country" Row("id" int, "name" varchar(50)), "dob" date, "id" int, "name" varchar(50), "nickname" varchar(50))]), "kick_off" time, "last_updated" date, "match_date" date, "match_id" int, "match_status" varchar(50), "match_week" int, "metadata" Row("data_version" date, "shot_fidelity_version" int, "xy_fidelity_version" int), "season" Row("season_id" int, "season_name" varchar(50)))',
 'copy "public"."laliga" from \'/scratch_b/qa/ericsson/laliga/*.json\' PARSER FJsonParser()']
In [8]:
# Ingesting the file
read_file("laliga/*.json", 
          table_name = "laliga",
          schema = "public",
          dtype = {"competition_name": "varchar(30)",
                   "home_team_id": "float"},
          varchar_varbinary_length = 50)
Out[8]:
[vDataFrame display of "public"."laliga": away_score (int), away_team (row), competition (row), competition_stage (row), home_score (int), home_team (row), kick_off (time), last_updated (date), match_date (date), match_id (int), match_status (varchar(50)), match_week (int), metadata (row), season (row)]
Rows: 1-100 | Columns: 14
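
Once the table exists, the same function can be reused for data insertion. A minimal sketch, assuming additional files matching the same glob are available on the client machine ('ingest_local' currently only applies to data insertion):

In [ ]:
# Sketch: append more records to the existing table, reading the files
# from the local machine.
read_file("laliga/*.json",
          table_name = "laliga",
          schema = "public",
          insert = True,
          ingest_local = True)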