# Pokemon

This example uses the 'pokemon' and 'combats' datasets to predict the winner of a 1-on-1 Pokemon battle. The Jupyter Notebook of the study and the two datasets are available for download:

pokemon

• Name: The name of the Pokemon
• Generation: Pokemon's generation
• Legendary: True if the Pokemon is legendary
• HP: Number of hit points
• Attack: Attack stat
• Sp_Atk: Special attack stat
• Defense: Defense stat
• Sp_Def: Special defense stat
• Speed: Speed stat
• Type_1: Pokemon's first type
• Type_2: Pokemon's second type

combats

• First_pokemon: Pokemon of trainer 1
• Second_pokemon: Pokemon of trainer 2
• Winner: Winner of the battle

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

## Initialization

This example uses the following version of VerticaPy:

In [15]:
```
import verticapy as vp
vp.__version__
```
Out[15]:
`'0.9.0'`

Connect to Vertica. This example uses an existing connection called "VerticaDSN." For details on how to create a connection, see the connection tutorial.

In [1]:
```
vp.connect("VerticaDSN")
```

Let's ingest the datasets.

In [2]:
```
import verticapy.stats as st
vp.drop('combats')
```
Out[2]:

| | First_pokemon (Int) | Second_pokemon (Int) | Winner (Int) |
|---|---|---|---|
| 1 | 1 | 6 | 6 |
| 2 | 1 | 26 | 26 |
| 3 | 1 | 37 | 37 |
| 4 | 1 | 43 | 43 |
| 5 | 1 | 54 | 54 |

Rows: 1-5 | Columns: 3
In [4]:
```
vp.drop('pokemon')
```
Out[4]:
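
| | ID (Int) | Name (Varchar(50)) | Type_1 (Varchar(20)) | Type_2 (Varchar(20)) | HP (Int) | Attack (Int) | Defense (Int) | Sp_Atk (Int) | Sp_Def (Int) | Speed (Int) | Generation (Int) | Legendary (Boolean) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | ❌ |
| 2 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | ❌ |
| 3 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | ❌ |
| 4 | 4 | Mega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | ❌ |
| 5 | 5 | Charmander | Fire | [null] | 39 | 52 | 43 | 60 | 50 | 65 | 1 | ❌ |
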
Rows: 1-5 | Columns: 12

## Data Exploration and Preparation

The table 'combats' will be joined to the table 'pokemon' to predict the winner.

The 'pokemon' table contains the information on each Pokemon. Let's describe this table.

In [5]:
```
pokemon.describe(method = "categorical", unique = True)
```
Out[5]:

| | dtype | count | top | top_percent | unique |
|---|---|---|---|---|---|
| "ID" | int | 800 | 1 | 0.125 | 800.0 |
| "Name" | varchar(50) | 799 | Deino | 0.125 | 799.0 |
| "Type_1" | varchar(20) | 800 | Water | 14.0 | 18.0 |
| "Type_2" | varchar(20) | 414 | [null] | 48.25 | 18.0 |
| "HP" | int | 800 | 60 | 8.375 | 94.0 |
| "Attack" | int | 800 | 100 | 5.0 | 111.0 |
| "Defense" | int | 800 | 70 | 6.75 | 103.0 |
| "Sp_Atk" | int | 800 | 60 | 6.375 | 105.0 |
| "Sp_Def" | int | 800 | 80 | 6.5 | 92.0 |
| "Speed" | int | 800 | 50 | 5.75 | 108.0 |
| "Generation" | int | 800 | 1 | 20.75 | 6.0 |
| "Legendary" | boolean | 800 | ❌ | 91.875 | 2.0 |

Rows: 1-12 | Columns: 6
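
To make the summary columns concrete, here is a minimal pure-Python sketch of how `count`, `top`, `top_percent`, and `unique` could be computed for a single column. It is not VerticaPy's implementation: the helper `categorical_summary` is hypothetical, the sample values are illustrative, and it assumes that percentages are taken over all rows, nulls included. That assumption is consistent with 'Type_2', where 800 - 414 = 386 rows are null, i.e. 48.25% of the table.

```python
from collections import Counter

def categorical_summary(values):
    """Summarize one column; None stands for a missing value (hypothetical helper)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    # Most frequent value, counting missing values as their own group.
    top, top_count = Counter(values).most_common(1)[0]
    return {
        "count": len(non_null),                  # non-missing values
        "top": top,                              # most frequent value (may be None)
        "top_percent": 100 * top_count / total,  # share of all rows
        "unique": len(set(non_null)),            # distinct non-missing values
    }

# Illustrative sample, not the real dataset.
type_2_sample = ["Poison", "Poison", None, None, None, "Flying", "Dragon", None]
summary = categorical_summary(type_2_sample)
print(summary)

# Share of null 'Type_2' values implied by the counts reported above.
null_share = 100 * (800 - 414) / 800
print(null_share)
```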

A Pokemon's 'Name', 'Generation', and 'Legendary' status do not influence the outcome of a battle, so we can drop these columns.

In [6]:
```
pokemon.drop(["Generation",
"Legendary",
"Name"])
```
Out[6]:
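
| | ID (Int) | Type_1 (Varchar(20)) | Type_2 (Varchar(20)) | HP (Int) | Attack (Int) | Defense (Int) | Sp_Atk (Int) | Sp_Def (Int) | Speed (Int) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 |
| 2 | 2 | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 |
| 3 | 3 | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 |
| 4 | 4 | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 |
| 5 | 5 | Fire | [null] | 39 | 52 | 43 | 60 | 50 | 65 |
| 6 | 6 | Fire | [null] | 58 | 64 | 58 | 80 | 65 | 80 |
| 7 | 7 | Fire | Flying | 78 | 84 | 78 | 109 | 85 | 100 |
| 8 | 8 | Fire | Dragon | 78 | 130 | 111 | 130 | 85 | 100 |
| 9 | 9 | Fire | Flying | 78 | 104 | 78 | 159 | 115 | 100 |
| 10 | 10 | Water | [null] | 44 | 48 | 65 | 50 | 64 | 43 |
| 11 | 11 | Water | [null] | 59 | 63 | 80 | 65 | 80 | 58 |
| 12 | 12 | Water | [null] | 79 | 83 | 100 | 85 | 105 | 78 |
| 13 | 13 | Water | [null] | 79 | 103 | 120 | 135 | 115 | 78 |
| 14 | 14 | Bug | [null] | 45 | 30 | 35 | 20 | 20 | 45 |
| 15 | 15 | Bug | [null] | 50 | 20 | 55 | 25 | 25 | 30 |
| 16 | 16 | Bug | Flying | 60 | 45 | 50 | 90 | 80 | 70 |
| 17 | 17 | Bug | Poison | 40 | 35 | 30 | 20 | 20 | 50 |
| 18 | 18 | Bug | Poison | 45 | 25 | 50 | 25 | 25 | 35 |
| 19 | 19 | Bug | Poison | 65 | 90 | 40 | 45 | 80 | 75 |
| 20 | 20 | Bug | Poison | 65 | 150 | 40 | 15 | 80 | 145 |
| 21 | 21 | Normal | Flying | 40 | 45 | 40 | 35 | 35 | 56 |
| 22 | 22 | Normal | Flying | 63 | 60 | 55 | 50 | 50 | 71 |
| 23 | 23 | Normal | Flying | 83 | 80 | 75 | 70 | 70 | 101 |
| 24 | 24 | Normal | Flying | 83 | 80 | 80 | 135 | 80 | 121 |
| 25 | 25 | Normal | [null] | 30 | 56 | 35 | 25 | 35 | 72 |
| 26 | 26 | Normal | [null] | 55 | 81 | 60 | 50 | 70 | 97 |
| 27 | 27 | Normal | Flying | 40 | 60 | 30 | 31 | 31 | 70 |
| 28 | 28 | Normal | Flying | 65 | 90 | 65 | 61 | 61 | 100 |
| 29 | 29 | Poison | [null] | 35 | 60 | 44 | 40 | 54 | 55 |
| 30 | 30 | Poison | [null] | 60 | 85 | 69 | 65 | 79 | 80 |
| 31 | 31 | Electric | [null] | 35 | 55 | 40 | 50 | 50 | 90 |
| 32 | 32 | Electric | [null] | 60 | 90 | 55 | 90 | 80 | 110 |
| 33 | 33 | Ground | [null] | 50 | 75 | 85 | 20 | 30 | 40 |
| 34 | 34 | Ground | [null] | 75 | 100 | 110 | 45 | 55 | 65 |
| 35 | 35 | Poison | [null] | 55 | 47 | 52 | 40 | 40 | 41 |
| 36 | 36 | Poison | [null] | 70 | 62 | 67 | 55 | 55 | 56 |
| 37 | 37 | Poison | Ground | 90 | 92 | 87 | 75 | 85 | 76 |
| 38 | 38 | Poison | [null] | 46 | 57 | 40 | 40 | 40 | 50 |
| 39 | 39 | Poison | [null] | 61 | 72 | 57 | 55 | 55 | 65 |
| 40 | 40 | Poison | Ground | 81 | 102 | 77 | 85 | 75 | 85 |
| 41 | 41 | Fairy | [null] | 70 | 45 | 48 | 60 | 65 | 35 |
| 42 | 42 | Fairy | [null] | 95 | 70 | 73 | 95 | 90 | 60 |
| 43 | 43 | Fire | [null] | 38 | 41 | 40 | 50 | 65 | 65 |
| 44 | 44 | Fire | [null] | 73 | 76 | 75 | 81 | 100 | 100 |
| 45 | 45 | Normal | Fairy | 115 | 45 | 20 | 45 | 25 | 20 |
| 46 | 46 | Normal | Fairy | 140 | 70 | 45 | 85 | 50 | 45 |
| 47 | 47 | Poison | Flying | 40 | 45 | 35 | 30 | 40 | 55 |
| 48 | 48 | Poison | Flying | 75 | 80 | 70 | 65 | 75 | 90 |
| 49 | 49 | Grass | Poison | 45 | 50 | 55 | 75 | 65 | 30 |
| 50 | 50 | Grass | Poison | 60 | 65 | 70 | 85 | 75 | 40 |
| 51 | 51 | Grass | Poison | 75 | 80 | 85 | 110 | 90 | 50 |
| 52 | 52 | Bug | Grass | 35 | 70 | 55 | 45 | 55 | 25 |
| 53 | 53 | Bug | Grass | 60 | 95 | 80 | 60 | 80 | 30 |
| 54 | 54 | Bug | Poison | 60 | 55 | 50 | 40 | 55 | 45 |
| 55 | 55 | Bug | Poison | 70 | 65 | 60 | 90 | 75 | 90 |
| 56 | 56 | Ground | [null] | 10 | 55 | 25 | 35 | 45 | 95 |
| 57 | 57 | Ground | [null] | 35 | 80 | 50 | 50 | 70 | 120 |
| 58 | 58 | Normal | [null] | 40 | 45 | 35 | 40 | 40 | 90 |
| 59 | 59 | Normal | [null] | 65 | 70 | 60 | 65 | 65 | 115 |
| 60 | 60 | Water | [null] | 50 | 52 | 48 | 65 | 50 | 55 |
| 61 | 61 | Water | [null] | 80 | 82 | 78 | 95 | 80 | 85 |
| 62 | 62 | Fighting | [null] | 40 | 80 | 35 | 35 | 45 | 70 |
| 63 | 63 | Fighting | [null] | 65 | 105 | 60 | 60 | 70 | 95 |
| 64 | 64 | Fire | [null] | 55 | 70 | 45 | 70 | 50 | 60 |
| 65 | 65 | Fire | [null] | 90 | 110 | 80 | 100 | 80 | 95 |
| 66 | 66 | Water | [null] | 40 | 50 | 40 | 40 | 40 | 90 |
| 67 | 67 | Water | [null] | 65 | 65 | 65 | 50 | 50 | 90 |
| 68 | 68 | Water | Fighting | 90 | 95 | 95 | 70 | 90 | 70 |
| 69 | 69 | Psychic | [null] | 25 | 20 | 15 | 105 | 55 | 90 |
| 70 | 70 | Psychic | [null] | 40 | 35 | 30 | 120 | 70 | 105 |
| 71 | 71 | Psychic | [null] | 55 | 50 | 45 | 135 | 95 | 120 |
| 72 | 72 | Psychic | [null] | 55 | 50 | 65 | 175 | 95 | 150 |
| 73 | 73 | Fighting | [null] | 70 | 80 | 50 | 35 | 35 | 35 |
| 74 | 74 | Fighting | [null] | 80 | 100 | 70 | 50 | 60 | 45 |
| 75 | 75 | Fighting | [null] | 90 | 130 | 80 | 65 | 85 | 55 |
| 76 | 76 | Grass | Poison | 50 | 75 | 35 | 70 | 30 | 40 |
| 77 | 77 | Grass | Poison | 65 | 90 | 50 | 85 | 45 | 55 |
| 78 | 78 | Grass | Poison | 80 | 105 | 65 | 100 | 70 | 70 |
| 79 | 79 | Water | Poison | 40 | 40 | 35 | 50 | 100 | 70 |
| 80 | 80 | Water | Poison | 80 | 70 | 65 | 80 | 120 | 100 |
| 81 | 81 | Rock | Ground | 40 | 80 | 100 | 30 | 30 | 20 |
| 82 | 82 | Rock | Ground | 55 | 95 | 115 | 45 | 45 | 35 |
| 83 | 83 | Rock | Ground | 80 | 120 | 130 | 55 | 65 | 45 |
| 84 | 84 | Fire | [null] | 50 | 85 | 55 | 65 | 65 | 90 |
| 85 | 85 | Fire | [null] | 65 | 100 | 70 | 80 | 80 | 105 |
| 86 | 86 | Water | Psychic | 90 | 65 | 65 | 40 | 40 | 15 |
| 87 | 87 | Water | Psychic | 95 | 75 | 110 | 100 | 80 | 30 |
| 88 | 88 | Water | Psychic | 95 | 75 | 180 | 130 | 80 | 30 |
| 89 | 89 | Electric | Steel | 25 | 35 | 70 | 95 | 55 | 45 |
| 90 | 90 | Electric | Steel | 50 | 60 | 95 | 120 | 70 | 70 |
| 91 | 91 | Normal | Flying | 52 | 65 | 55 | 58 | 62 | 60 |
| 92 | 92 | Normal | Flying | 35 | 85 | 45 | 35 | 35 | 75 |
| 93 | 93 | Normal | Flying | 60 | 110 | 70 | 60 | 60 | 100 |
| 94 | 94 | Water | [null] | 65 | 45 | 55 | 45 | 70 | 45 |
| 95 | 95 | Water | Ice | 90 | 70 | 80 | 70 | 95 | 70 |
| 96 | 96 | Poison | [null] | 80 | 80 | 50 | 40 | 50 | 25 |
| 97 | 97 | Poison | [null] | 105 | 105 | 75 | 65 | 100 | 50 |
| 98 | 98 | Water | [null] | 30 | 65 | 100 | 45 | 25 | 40 |
| 99 | 99 | Water | Ice | 50 | 95 | 180 | 85 | 45 | 70 |
| 100 | 100 | Ghost | Poison | 30 | 35 | 30 | 100 | 35 | 80 |
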
Rows: 1-100 of 800 | Columns: 9

The 'ID' will be the key to join the data. By joining the data, we will be able to create more relevant features.

In [7]:
```
# First join: attach the first Pokemon's stats to each combat.
fights = pokemon.join(combats,
                      on = {"ID": "First_Pokemon"},
                      how = "inner",
                      expr1 = ["Sp_Atk AS Sp_Atk_1",
                               "Speed AS Speed_1",
                               "Sp_Def AS Sp_Def_1",
                               "Defense AS Defense_1",
                               "Type_1 AS Type_1_1",
                               "Type_2 AS Type_2_1",
                               "HP AS HP_1",
                               "Attack AS Attack_1"],
                      expr2 = ["First_Pokemon",
                               "Second_Pokemon",
                               "Winner"])
# Second join: attach the second Pokemon's stats.
fights = fights.join(pokemon,
                     on = {"Second_Pokemon": "ID"},
                     how = "inner",
                     expr2 = ["Sp_Atk AS Sp_Atk_2",
                              "Speed AS Speed_2",
                              "Sp_Def AS Sp_Def_2",
                              "Defense AS Defense_2",
                              "Type_1 AS Type_1_2",
                              "Type_2 AS Type_2_2",
                              "HP AS HP_2",
                              "Attack AS Attack_2"],
                     expr1 = ["Sp_Atk_1",
                              "Speed_1",
                              "Sp_Def_1",
                              "Defense_1",
                              "Type_1_1",
                              "Type_2_1",
                              "HP_1",
                              "Attack_1",
                              "Winner",
                              "Second_pokemon"])
```
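
The logic of the two inner joins can be sketched in plain Python, independently of Vertica. The sketch below is only an illustration: `join_fights`, `pokemon_sample`, and `combats_sample` are hypothetical names, and the sample rows are copied from the tables shown earlier (IDs 1 and 6, and the combat 1 vs. 6).

```python
# Sample rows copied from the tables shown earlier (ID -> stats).
pokemon_sample = {
    1: {"Type_1": "Grass", "Type_2": "Poison", "HP": 45, "Attack": 49,
        "Defense": 49, "Sp_Atk": 65, "Sp_Def": 65, "Speed": 45},
    6: {"Type_1": "Fire", "Type_2": None, "HP": 58, "Attack": 64,
        "Defense": 58, "Sp_Atk": 80, "Sp_Def": 65, "Speed": 80},
}
combats_sample = [{"First_pokemon": 1, "Second_pokemon": 6, "Winner": 6}]

def join_fights(combats, pokemon):
    """Inner-join each combat with the stats of both Pokemon (hypothetical helper)."""
    fights = []
    for combat in combats:
        first = pokemon.get(combat["First_pokemon"])
        second = pokemon.get(combat["Second_pokemon"])
        if first is None or second is None:
            continue  # inner join: skip combats that reference an unknown ID
        # Suffix _1 marks the first Pokemon's columns, _2 the second's.
        row = {f"{name}_1": value for name, value in first.items()}
        row.update({f"{name}_2": value for name, value in second.items()})
        row["Second_pokemon"] = combat["Second_pokemon"]
        row["Winner"] = combat["Winner"]
        fights.append(row)
    return fights

fights_sample = join_fights(combats_sample, pokemon_sample)
print(fights_sample[0]["Speed_1"], fights_sample[0]["Speed_2"])
```

Each resulting row carries both Pokemon's stats side by side, which is what makes the difference features in the next step possible.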

Feature engineering is key. Here, we can create features that describe the stat differences between the first and second Pokemon. We can also change 'Winner' to a binary value: 1 if the first Pokemon won and 0 otherwise.

In [8]:
```
fights["Sp_Atk_diff"] = fights["Sp_Atk_1"] - fights["Sp_Atk_2"]
fights["Speed_diff"] = fights["Speed_1"] - fights["Speed_2"]
fights["Sp_Def_diff"] = fights["Sp_Def_1"] - fights["Sp_Def_2"]
fights["Defense_diff"] = fights["Defense_1"] - fights["Defense_2"]
fights["HP_diff"] = fights["HP_1"] - fights["HP_2"]
fights["Attack_diff"] = fights["Attack_1"] - fights["Attack_2"]
fights["Winner"] = st.case_when(fights["Winner"] == fights["Second_pokemon"], 0, 1)
fights = fights[["Sp_Atk_diff", "Speed_diff", "Sp_Def_diff",
                 "Defense_diff", "HP_diff", "Attack_diff",
                 "Type_1_1", "Type_1_2", "Type_2_1", "Type_2_2",
                 "Winner"]]
display(fights)
```

| | Sp_Atk_diff (Integer) | Speed_diff (Integer) | Sp_Def_diff (Integer) | Defense_diff (Integer) | HP_diff (Integer) | Attack_diff (Integer) | Type_1_1 (Varchar(20)) | Type_1_2 (Varchar(20)) | Type_2_1 (Varchar(20)) | Type_2_2 (Varchar(20)) | Winner (Integer) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -15 | -35 | 0 | -9 | -13 | -15 | Grass | Fire | Poison | [null] | 0 |
| 2 | 15 | -52 | -5 | -11 | -10 | -32 | Grass | Normal | Poison | [null] | 0 |
| 3 | -10 | -31 | -20 | -38 | -45 | -43 | Grass | Poison | Poison | Ground | 0 |
| 4 | 15 | -20 | 0 | 9 | 7 | 8 | Grass | Fire | Poison | [null] | 0 |
| 5 | 25 | 0 | 10 | -1 | -15 | -6 | Grass | Bug | Poison | Poison | 0 |
| 6 | 0 | -10 | -20 | -31 | -45 | -81 | Grass | Fighting | Poison | [null] | 0 |
| 7 | 0 | -10 | -20 | -31 | -45 | -81 | Grass | Fighting | Poison | [null] | 0 |
| 8 | 0 | -45 | 0 | -6 | -5 | -36 | Grass | Fire | Poison | [null] | 0 |
| 9 | -60 | -10 | 0 | -36 | -50 | -46 | Grass | Grass | Poison | Psychic | 0 |
| 10 | 30 | -42 | -45 | -4 | -5 | -71 | Grass | Fighting | Poison | [null] | 0 |
| 11 | -35 | -70 | -20 | -36 | -15 | -26 | Grass | Water | Poison | Psychic | 0 |
| 12 | 0 | -35 | -5 | -56 | -15 | -66 | Grass | Rock | Poison | Water | 0 |
| 13 | -5 | -25 | -5 | -16 | -16 | -35 | Grass | Dragon | Poison | [null] | 0 |
| 14 | -35 | -35 | -35 | -46 | -46 | -85 | Grass | Dragon | Poison | Flying | 0 |
| 15 | 29 | -5 | 9 | 19 | -15 | 19 | Grass | Normal | Poison | Flying | 0 |
| 16 | 9 | -22 | 9 | 11 | -30 | 11 | Grass | Water | Poison | Electric | 0 |
| 17 | 25 | 25 | 0 | -16 | 10 | 29 | Grass | Fairy | Poison | [null] | 1 |
| 18 | -15 | 5 | -40 | -36 | -10 | 9 | Grass | Fairy | Poison | Flying | 1 |
| 19 | 0 | 10 | 20 | 9 | -10 | 9 | Grass | Electric | Poison | [null] | 1 |
| 20 | -35 | 15 | -45 | -31 | -50 | -26 | Grass | Water | Poison | Psychic | 1 |
| 21 | -7 | -3 | 17 | 1 | -3 | -23 | Grass | Psychic | Poison | [null] | 0 |
| 22 | -25 | -40 | 0 | -16 | -25 | -31 | Grass | Normal | Poison | Psychic | 0 |
| 23 | 5 | 0 | 5 | -26 | -45 | -71 | Grass | Fairy | Poison | [null] | 0 |
| 24 | 5 | -5 | 5 | -31 | -55 | -51 | Grass | Ice | Poison | Ground | 0 |
| 25 | -75 | -70 | -25 | -41 | -30 | -41 | Grass | Dark | Poison | Fire | 0 |
| 26 | 5 | -5 | 5 | -71 | -45 | -71 | Grass | Ground | Poison | [null] | 0 |
| 27 | 35 | 10 | 35 | 14 | 10 | -6 | Grass | Dark | Poison | [null] | 1 |
| 28 | 10 | -40 | 35 | 19 | 5 | 19 | Grass | Water | Poison | Flying | 1 |
| 29 | -100 | -55 | -70 | -16 | -23 | -36 | Grass | Psychic | Poison | Fairy | 0 |
| 30 | 5 | -25 | 5 | -31 | -15 | -81 | Grass | Grass | Poison | Fighting | 0 |
| 31 | 20 | 15 | -25 | -86 | 15 | 4 | Grass | Rock | Poison | [null] | 1 |
| 32 | -15 | -55 | -20 | -36 | -15 | -51 | Grass | Fighting | Poison | Psychic | 0 |
| 33 | 0 | -20 | 25 | 9 | 5 | 4 | Grass | Electric | Poison | [null] | 0 |
| 34 | 22 | 5 | 12 | -4 | -25 | 6 | Grass | Poison | Poison | [null] | 1 |
| 35 | -5 | -15 | 30 | 14 | -85 | -21 | Grass | Water | Poison | [null] | 0 |
| 36 | -50 | -10 | 5 | -11 | -25 | -66 | Grass | Grass | Poison | Dark | 0 |
| 37 | 19 | -15 | 24 | 6 | -5 | 1 | Grass | Water | Poison | Ground | 0 |
| 38 | -5 | 0 | -15 | -51 | -30 | -76 | Grass | Rock | Poison | Bug | 0 |
| 39 | 5 | 5 | -55 | -21 | -15 | -41 | Grass | Normal | Poison | [null] | 1 |
| 40 | -7 | -6 | -22 | -34 | -54 | -19 | Grass | Grass | Poison | Flying | 0 |
| 41 | 15 | -5 | -35 | -151 | -35 | -51 | Grass | Rock | Poison | [null] | 0 |
| 42 | -29 | -21 | 15 | -1 | -25 | -45 | Grass | Bug | Poison | Flying | 0 |
| 43 | 24 | -29 | 24 | 2 | -18 | -14 | Grass | Poison | Poison | Dark | 0 |
| 44 | -5 | -15 | -25 | 4 | 25 | 24 | Grass | Psychic | Poison | Fairy | 0 |
| 45 | 15 | -37 | 10 | -16 | -23 | -41 | Grass | Dragon | Poison | Ground | 0 |
| 46 | 5 | -50 | -10 | -61 | -25 | -41 | Grass | Poison | Poison | Dark | 0 |
| 47 | -25 | -1 | -7 | -23 | -29 | -51 | Grass | Grass | Poison | [null] | 0 |
| 48 | 20 | -80 | -20 | -16 | -25 | -71 | Grass | Dark | Poison | Ice | 0 |
| 49 | -65 | -20 | -30 | -61 | -20 | -11 | Grass | Ice | Poison | [null] | 0 |
| 50 | -40 | -41 | -42 | -58 | -5 | -16 | Grass | Electric | Poison | Fire | 0 |
| 51 | -40 | -41 | -42 | -58 | -5 | -16 | Grass | Electric | Poison | Fire | 0 |
| 52 | -10 | -68 | -30 | -46 | -30 | -26 | Grass | Grass | Poison | [null] | 0 |
| 53 | 20 | 0 | 20 | 4 | -20 | -14 | Grass | Fire | Poison | [null] | 0 |
| 54 | -33 | -56 | 2 | -14 | -30 | -49 | Grass | Grass | Poison | [null] | 0 |
| 55 | 29 | 2 | 35 | -1 | -5 | -6 | Grass | Normal | Poison | Flying | 1 |
| 56 | 35 | -12 | 26 | -10 | 15 | 4 | Grass | Bug | Poison | Poison | 0 |
| 57 | -15 | -20 | 25 | 9 | 5 | -16 | Grass | Dark | Poison | [null] | 0 |
| 58 | -30 | -20 | -45 | -46 | -25 | -6 | Grass | Psychic | Poison | [null] | 0 |
| 59 | -22 | -53 | 2 | -14 | -30 | -38 | Grass | Water | Poison | Flying | 0 |
| 60 | -10 | -58 | 5 | -11 | -10 | -26 | Grass | Electric | Poison | Flying | 0 |
| 61 | -5 | -5 | -20 | -46 | -15 | -31 | Grass | Steel | Poison | [null] | 0 |
| 62 | -35 | -100 | 5 | 9 | -35 | -21 | Grass | Bug | Poison | [null] | 0 |
| 63 | 30 | 10 | 15 | -1 | -14 | -25 | Grass | Ground | Poison | Ghost | 1 |
| 64 | 5 | -25 | -5 | -51 | -20 | -76 | Grass | Dark | Poison | Steel | 0 |
| 65 | 5 | -25 | -5 | -51 | -20 | -76 | Grass | Dark | Poison | Steel | 0 |
| 66 | -80 | -56 | -15 | -21 | -34 | -56 | Grass | Electric | Poison | Flying | 0 |
| 67 | -64 | -63 | -25 | -41 | -46 | -23 | Grass | Water | Poison | Fighting | 0 |
| 68 | -12 | -83 | -12 | -41 | -55 | -79 | Grass | Normal | Poison | Fighting | 0 |
| 69 | -44 | -64 | -29 | -3 | -17 | -6 | Grass | Electric | Poison | Normal | 0 |
| 70 | 15 | 7 | 5 | 1 | 2 | -21 | Grass | Ghost | Poison | Grass | 1 |
| 71 | 20 | -5 | 30 | 20 | 21 | 10 | Grass | Fire | Poison | [null] | 0 |
| 72 | 35 | -5 | 25 | -47 | -15 | -38 | Grass | Ground | Poison | [null] | 0 |
| 73 | 40 | 10 | 40 | 23 | 14 | 5 | Grass | Poison | Poison | [null] | 1 |
| 74 | 30 | -5 | 15 | 23 | 22 | 21 | Grass | Fire | Poison | [null] | 0 |
| 75 | 20 | 30 | 0 | -17 | 0 | -33 | Grass | Bug | Poison | Grass | 1 |
| 76 | 40 | 35 | 30 | 13 | -20 | -18 | Grass | Poison | Poison | [null] | 0 |
| 77 | 15 | 10 | -20 | -12 | -45 | -43 | Grass | Poison | Poison | [null] | 1 |
| 78 | -35 | -35 | 25 | 18 | 15 | 12 | Grass | Ghost | Poison | Poison | 0 |
| 79 | -20 | 0 | 40 | -52 | -5 | 7 | Grass | Grass | Poison | [null] | 0 |
| 80 | 10 | -21 | -50 | -46 | -35 | -93 | Grass | Water | Poison | Dark | 0 |
| 81 | -5 | 0 | -15 | -17 | -70 | -23 | Grass | Water | Poison | Ice | 0 |
| 82 | 15 | -20 | 10 | -42 | 0 | -53 | Grass | Rock | Poison | Water | 0 |
| 83 | -74 | -70 | -20 | -37 | -46 | -128 | Grass | Psychic | Poison | Fighting | 0 |
| 84 | 36 | 17 | 32 | -1 | 10 | -3 | Grass | Water | Poison | [null] | 1 |
| 85 | 40 | 40 | 15 | -2 | 25 | 42 | Grass | Fairy | Poison | [null] | 1 |
| 86 | 40 | 40 | 15 | -2 | 25 | 42 | Grass | Fairy | Poison | [null] | 1 |
| 87 | 0 | 20 | -25 | -22 | 5 | 22 | Grass | Fairy | Poison | Flying | 1 |
| 88 | -50 | -50 | -15 | 3 | -5 | -3 | Grass | Psychic | Poison | [null] | 0 |
| 89 | -5 | -31 | 38 | 21 | 0 | -23 | Grass | Dark | Poison | Flying | 0 |
| 90 | -5 | -31 | 38 | 21 | 0 | -23 | Grass | Dark | Poison | Flying | 0 |
| 91 | -30 | -20 | 10 | -7 | -20 | -58 | Grass | Fire | Poison | Fighting | 0 |
| 92 | 60 | 40 | 50 | 28 | 15 | 17 | Grass | Bug | Poison | [null] | 1 |
| 93 | 15 | -5 | 60 | 43 | 15 | -28 | Grass | Water | Poison | Dark | 0 |
| 94 | -35 | -55 | 20 | 3 | -5 | -88 | Grass | Dark | Poison | [null] | 0 |
| 95 | 30 | 10 | 30 | 13 | 10 | 12 | Grass | Ice | Poison | [null] | 1 |
| 96 | -40 | -40 | 0 | -17 | -20 | -58 | Grass | Ice | Poison | [null] | 0 |
| 97 | -34 | 8 | 5 | -42 | 5 | -22 | Grass | Water | Poison | [null] | 1 |
| 98 | 50 | 0 | 50 | 33 | 20 | 7 | Grass | Normal | Poison | Flying | 0 |
| 99 | 20 | 0 | 31 | 14 | 0 | -23 | Grass | Electric | Poison | [null] | 0 |
| 100 | 33 | 30 | -58 | -105 | 0 | 10 | Grass | Rock | Poison | Steel | 0 |

Rows: 1-100 | Columns: 11
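
As a worked example, the same feature computation can be applied by hand to a single joined row. The sketch below is pure Python rather than VerticaPy; `make_features` and `joined_row` are hypothetical names, and the stats are those of Pokemon 1 and 6 from the 'pokemon' table, with Pokemon 6 as the winner as in the first row of 'combats'.

```python
STATS = ["Sp_Atk", "Speed", "Sp_Def", "Defense", "HP", "Attack"]

def make_features(row):
    """Compute stat differences and a binary winner flag (hypothetical helper)."""
    features = {f"{stat}_diff": row[f"{stat}_1"] - row[f"{stat}_2"] for stat in STATS}
    # 0 if the second Pokemon won, 1 otherwise (the first Pokemon won).
    features["Winner"] = 0 if row["Winner"] == row["Second_pokemon"] else 1
    return features

# Joined row for the combat Pokemon 1 vs. Pokemon 6, won by Pokemon 6.
joined_row = {
    "Sp_Atk_1": 65, "Speed_1": 45, "Sp_Def_1": 65,
    "Defense_1": 49, "HP_1": 45, "Attack_1": 49,
    "Sp_Atk_2": 80, "Speed_2": 80, "Sp_Def_2": 65,
    "Defense_2": 58, "HP_2": 58, "Attack_2": 64,
    "Second_pokemon": 6, "Winner": 6,
}

features = make_features(joined_row)
print(features)
```

The resulting differences (-15, -35, 0, -9, -13, -15) and the winner flag of 0 agree with the first row displayed above.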

Most machine learning models cannot handle missing values. Let's see which features we should impute.

In [9]:
```
fights.count()
```
Out[9]:

| | count |
|---|---|
| "Sp_Atk_diff" | 50000.0 |
| "Speed_diff" | 50000.0 |
| "Sp_Def_diff" | 50000.0 |
| "Defense_diff" | 50000.0 |
| "HP_diff" | 50000.0 |
| "Attack_diff" | 50000.0 |
| "Type_1_1" | 50000.0 |
| "Type_1_2" | 50000.0 |
| "Type_2_1" | 25969.0 |
| "Type_2_2" | 26015.0 |
| "Winner" | 50000.0 |

Rows: 1-11 | Columns: 2

In terms of missing values, our only concern is the Pokemon's second type (Type_2_1 and Type_2_2). Since some Pokemon only have one type, these features are MNAR (missing not at random). We can impute the missing values by creating another category.
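
Conceptually, this imputation replaces every missing value with a new label. The following pure-Python sketch illustrates the idea on a small made-up column; `fill_missing` is a hypothetical helper, not a VerticaPy function. It also derives, from the counts reported above, how many values are missing in each column.

```python
def fill_missing(values, fill_value="No"):
    """Replace None with a dedicated category and report how many were filled."""
    filled = [fill_value if v is None else v for v in values]
    n_filled = sum(v is None for v in values)
    return filled, n_filled

# Illustrative column, not the real data.
type_2_sample = ["Poison", None, "Flying", None]
filled, n_filled = fill_missing(type_2_sample)
print(filled, n_filled)

# Missing values implied by the counts above: total rows minus non-null count.
total_rows = 50000
missing_type_2_1 = total_rows - 25969
missing_type_2_2 = total_rows - 26015
print(missing_type_2_1, missing_type_2_2)
```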

In [10]:
```
fights["Type_2_1"].fillna("No")
fights["Type_2_2"].fillna("No")
```
```
24031 elements were filled.
23985 elements were filled.
```
Out[10]: