verticapy.sql.geo.create_index#
- verticapy.sql.geo.create_index(vdf: str | vDataFrame, gid: str, g: str, index: str, overwrite: bool = False, max_mem_mb: int = 256, skip_nonindexable_polygons: bool = False) TableSample #
Creates a spatial index on a set of polygons to speed up spatial intersection with a set of points.
Parameters#
- vdf: SQLRelation
vDataFrame used to compute the spatial join.
- gid: str
Name of an integer column that uniquely identifies the polygon. The gid cannot be
NULL
.- g: str
Name of a geometry or geography (WGS84) column or expression that contains polygons and multipolygons. Only polygon and multipolygon can be indexed. Other shape types are excluded from the index.
- index: str
Name of the index.
- overwrite: bool, optional
BOOLEAN value that specifies whether to overwrite the index, if an index exists.
- max_mem_mb: int, optional
A positive integer that assigns a limit to the amount of memory in megabytes that
create_index
can allocate during index construction.- skip_nonindexable_polygons: bool, optional
In rare cases, intricate polygons (for instance, those with too high resolution or anomalous spikes) cannot be indexed. These polygons are considered non-indexable. When set to False, non-indexable polygons cause the index creation to fail. When set to
True
, index creation can succeed by excluding non-indexable polygons from the index.
Returns#
- TableSample
geospatial indexes.
Examples#
For this example, we will use the Cities and World dataset.
import verticapy.datasets as vpd cities = vpd.load_cities() world = vpd.load_world()
AbccityVarchar(82)AbcLong varchar(2411724)1 Abidjan 2 Abu Dhabi 3 Abuja 4 Accra 5 Addis Ababa 6 Algiers 7 Amman 8 Amsterdam 9 Andorra 10 Ankara 11 Antananarivo 12 Apia 13 Ashgabat 14 Asmara 15 Astana 16 Asuncion 17 Athens 18 Baghdad 19 Baku 20 Bamako 21 Bandar Seri Begawan 22 Bangkok 23 Bangui 24 Banjul 25 Basseterre 26 Beijing 27 Beirut 28 Belgrade 29 Belmopan 30 Berlin 31 Bern 32 Bishkek 33 Bissau 34 Bloemfontein 35 Bogota 36 Brasilia 37 Bratislava 38 Brazzaville 39 Bridgetown 40 Brussels 41 Bucharest 42 Budapest 43 Buenos Aires 44 Bujumbura 45 Cairo 46 Canberra 47 Cape Town 48 Caracas 49 Castries 50 Chisinau 51 Colombo 52 Conakry 53 Cotonou 54 Dakar 55 Damascus 56 Dar es Salaam 57 Dhaka 58 Dili 59 Djibouti 60 Doha 61 Dublin 62 Dushanbe 63 Freetown 64 Funafuti 65 Gaborone 66 Georgetown 67 Guatemala 68 Hanoi 69 Harare 70 Hargeysa 71 Havana 72 Helsinki 73 Honiara 74 Islamabad 75 Jakarta 76 Jerusalem 77 Johannesburg 78 Juba 79 Kabul 80 Kampala 81 Kathmandu 82 Khartoum 83 Kiev 84 Kigali 85 Kingston 86 Kingstown 87 Kinshasa 88 Kuala Lumpur 89 Kuwait 90 København 91 La Paz 92 Libreville 93 Lilongwe 94 Lima 95 Lisbon 96 Ljubljana 97 Lome 98 London 99 Luanda 100 Lusaka Rows: 1-100 | Columns: 2123pop_estIntegerAbccontinentVarchar(32)AbccountryVarchar(82)AbcLong varchar(2411724)1 140 Seven seas (open ocean) Fr. S. Antarctic Lands 2 2931 South America Falkland Is. 3 4050 Antarctica Antarctica 4 57713 North America Greenland 5 265100 Asia N. Cyprus 6 279070 Oceania New Caledonia 7 282814 Oceania Vanuatu 8 329988 North America Bahamas 9 339747 Europe Iceland 10 360346 North America Belize 11 443593 Asia Brunei 12 591919 South America Suriname 13 594130 Europe Luxembourg 14 603253 Africa W. Sahara 15 642550 Europe Montenegro 16 647581 Oceania Solomon Is. 17 737718 South America Guyana 18 758288 Asia Bhutan 19 778358 Africa Eq. Guinea 20 865267 Africa Djibouti 21 920938 Oceania Fiji 22 1218208 North America Trinidad and Tobago 23 1221549 Asia Cyprus 24 1251581 Europe Estonia 25 1291358 Asia Timor-Leste 26 1467152 Africa eSwatini 27 1772255 Africa Gabon 28 1792338 Africa Guinea-Bissau 29 1895250 Europe Kosovo 30 1944643 Europe Latvia 31 1958042 Africa Lesotho 32 1972126 Europe Slovenia 33 2051363 Africa Gambia 34 2103721 Europe Macedonia 35 2214858 Africa Botswana 36 2314307 Asia Qatar 37 2484780 Africa Namibia 38 2823859 Europe Lithuania 39 2875422 Asia Kuwait 40 2990561 North America Jamaica 41 3045191 Asia Armenia 42 3047987 Europe Albania 43 3068243 Asia Mongolia 44 3351827 North America Puerto Rico 45 3360148 South America Uruguay 46 3424386 Asia Oman 47 3474121 Europe Moldova 48 3500000 Africa Somaliland 49 3753142 North America Panama 50 3758571 Africa Mauritania 51 3856181 Europe Bosnia and Herz. 52 4292095 Europe Croatia 53 4510327 Oceania New Zealand 54 4543126 Asia Palestine 55 4689021 Africa Liberia 56 4926330 Asia Georgia 57 4930258 North America Costa Rica 58 4954674 Africa Congo 59 5011102 Europe Ireland 60 5320045 Europe Norway 61 5351277 Asia Turkmenistan 62 5445829 Europe Slovakia 63 5491218 Europe Finland 64 5605948 Europe Denmark 65 5625118 Africa Central African Rep. 66 5789122 Asia Kyrgyzstan 67 5918919 Africa Eritrea 68 6025951 North America Nicaragua 69 6072475 Asia United Arab Emirates 70 6163195 Africa Sierra Leone 71 6172011 North America El Salvador 72 6229794 Asia Lebanon 73 6653210 Africa Libya 74 6909701 Oceania Papua New Guinea 75 6943739 South America Paraguay 76 7101510 Europe Bulgaria 77 7111024 Europe Serbia 78 7126706 Asia Laos 79 7531386 Africa Somalia 80 7965055 Africa Togo 81 8236303 Europe Switzerland 82 8299706 Asia Israel 83 8468555 Asia Tajikistan 84 8754413 Europe Austria 85 9038741 North America Honduras 86 9549747 Europe Belarus 87 9850845 Europe Hungary 88 9960487 Europe Sweden 89 9961396 Asia Azerbaijan 90 10248069 Asia Jordan 91 10646714 North America Haiti 92 10674723 Europe Czechia 93 10734247 North America Dominican Rep. 94 10768477 Europe Greece 95 10839514 Europe Portugal 96 11038805 Africa Benin 97 11138234 South America Bolivia 98 11147407 North America Cuba 99 11403800 Africa Tunisia 100 11466756 Africa Burundi Rows: 1-100 | Columns: 4Note
VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets, which provides detailed information on each dataset and how to use them effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.
Let’s preprocess the datasets by extracting latitude and longitude values and creating an index.
world["id"] = "ROW_NUMBER() OVER(ORDER BY country, pop_est)" display(world) cities["id"] = "ROW_NUMBER() OVER (ORDER BY city)" cities["lat"] = "ST_X(geometry)" cities["lon"] = "ST_Y(geometry)" display(cities)
123pop_estInt100%... 🌎Geometry(1048576)100%123idInteger100%1 34124811 ... 1 2 3047987 ... 2 3 40969443 ... 3 4 29310273 ... 4 5 4050 ... 5 6 44293293 ... 6 7 3045191 ... 7 8 23232413 ... 8 9 8754413 ... 9 10 9961396 ... 10 11 329988 ... 11 12 157826578 ... 12 13 9549747 ... 13 14 11491346 ... 14 15 360346 ... 15 16 11038805 ... 16 17 758288 ... 17 18 11138234 ... 18 19 3856181 ... 19 20 2214858 ... 20 AbccityVarchar(82)100%... 🌎latFloat(22)100%🌎lonFloat(22)100%1 Abidjan ... -4.04199411850709 5.32194282609856 2 Abu Dhabi ... 54.366593382592 24.4666835723799 3 Abuja ... 7.53138214293233 9.0852790077542 4 Accra ... -0.218661598960694 5.55198046444593 5 Addis Ababa ... 38.6980585753487 9.03525622129575 6 Algiers ... 3.04860667090924 36.7650106566281 7 Amman ... 35.9313540668741 31.9519711058275 8 Amsterdam ... 4.91469431740097 52.3519145466644 9 Andorra ... 1.51648596050552 42.5000014435459 10 Ankara ... 32.8624457823566 39.9291844440755 11 Antananarivo ... 47.5146780415299 -18.9146914920322 12 Apia ... -171.738641608603 -13.8415450424484 13 Ashgabat ... 58.3832991117746 37.949994933111 14 Asmara ... 38.9333235257593 15.3333392526819 15 Astana ... 71.427774209483 51.1811253042576 16 Asuncion ... -57.6434510279013 -25.2944571170577 17 Athens ... 23.7313752256794 37.9852720905523 18 Baghdad ... 44.3919229145641 33.3405943561586 19 Baku ... 49.8602713032578 40.397217891343 20 Bamako ... -8.0019849632497 12.6519605263233 Let’s create the geo-index.
from verticapy.sql.geo import create_index create_index(world, "id", "geometry", "world_polygons", True)
AbctypeVarchar(20)... 123max_yFloat(22)AbcinfoVarchar(500)1 GEOMETRY ... 83.64513 Let’s calculate the intersection between the cities and the various countries by using the GEOMETRY data type.
from verticapy.sql.geo import intersect intersect(cities, "world_polygons", "id", "geometry")
123point_idInteger100%123polygon_gidInteger100%1 50 102 2 51 147 3 52 65 4 53 16 5 54 136 6 55 152 7 56 155 8 129 100 9 131 118 10 132 28 11 133 24 12 135 122 13 136 149 14 1 40 15 2 165 16 3 116 17 4 61 18 5 51 19 6 3 20 7 81 The same can be done using directly the longitude and latitude.
intersect(cities, "world_polygons", "id", x="lat", y="lon")
123point_idInteger100%123polygon_gidInteger100%1 41 131 2 42 70 3 43 6 4 44 25 5 45 46 6 46 8 7 47 144 8 48 171 9 122 83 10 124 107 11 125 30 12 126 72 13 127 115 14 128 108 15 201 7 16 202 36 17 1 40 18 2 165 19 3 116 20 4 61 Note
For geospatial functions, Vertica utilizes indexing to expedite computations, especially considering the potentially extensive size of polygons. This is a unique optimization approach employed by Vertica in these scenarios.
See also
describe_index()
: Describes the geo index.intersect()
: Spatially intersects a point or points with a set of polygons.rename_index()
: Renames the geo index.