Pokemon API Vignette

Mark Austin 10/05/2021

Introduction
Required R Packages
Pokemon API Query and Data Processing Functions
Exploratory Data Analysis

Introduction

This is a vignette for reading and summarizing data from the Pokemon API, based on the Pokemon API Documentation. After describing several API query functions with accompanying R code, the vignette presents an exploratory data analysis, again using R code.

Required R Packages

The following R packages are required to run the R code used in this vignette and/or to create this document.

tidyverse The tidyverse package is used for data handling and plotting.
jsonlite The jsonlite package is used to contact the API and return data.
knitr The knitr package is used for document image handling.
rmarkdown The rmarkdown package is used by a render program to render this document.

Pokemon API Query and Data Processing Functions

I created the following functions to query and process data from the Pokemon API using Pokemon API Documentation. I found that I could use fromJSON() directly with the Pokemon API and directly assign API calls to list of list objects.
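To make the list of lists structure concrete, here is a minimal sketch that parses a small hand-written JSON string instead of calling the API. The samplePage text is a made-up stand-in shaped like a paginated list response (an assumption for illustration, not real API output); the only function used is fromJSON() from jsonlite.

```r
library(jsonlite)

# Hand-written stand-in for a list response; not real API output
samplePage <- '{
  "count": 2,
  "results": [
    {"name": "bulbasaur", "url": "https://pokeapi.co/api/v2/pokemon/1/"},
    {"name": "ivysaur", "url": "https://pokeapi.co/api/v2/pokemon/2/"}
  ]
}'

# fromJSON() accepts a JSON string as well as a URL
parsed <- fromJSON(samplePage)

# The top level is a named list; the array of records becomes a data frame
str(parsed, max.level = 1)
parsed$results$name
```

Because the array of records is simplified to a data frame, it can be passed straight to as_tibble(), which is the pattern the metadata functions in this vignette rely on.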

For each endpoint, users can customize their query to return specific data based on names or ids relevant to that endpoint. I provide metadata functions so that users will know what names and ids are valid for a given endpoint. I also tried different approaches to giving the user flexibility in the data returned, with some functions returning groups of data and others letting the user do more customization.

Pokemon Endpoint Functions

Most data relevant to individual pokemon is obtained from the Pokemon endpoint. This endpoint returns a complex list of lists with more data than most users would need. I’ve provided three functions to query and process pokemon endpoint data. The functions all return data frames.

getPokeNameID In order to query individual pokemon, the user must provide either a name or id value. This function returns a list of all possible pokemon for this endpoint so that the user will know which pokemon are available. The names can optionally be sorted.

getPokeNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/pokemon/?limit=1222")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

Example getPokeNameID usage with output.

kable(head(getPokeNameID(sortName = TRUE)))

name	url	ID
abomasnow	https://pokeapi.co/api/v2/pokemon/460/	460
abomasnow-mega	https://pokeapi.co/api/v2/pokemon/10060/	10060
abra	https://pokeapi.co/api/v2/pokemon/63/	63
absol	https://pokeapi.co/api/v2/pokemon/359/	359
absol-mega	https://pokeapi.co/api/v2/pokemon/10057/	10057
accelgor	https://pokeapi.co/api/v2/pokemon/617/	617
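The ID column in this output is derived from the url column. A minimal sketch of that step, using two URLs copied from the table above: basename() drops the trailing slash and returns the last path component, which as.numeric() then converts.

```r
# URLs copied from the example output above
urls <- c("https://pokeapi.co/api/v2/pokemon/460/",
          "https://pokeapi.co/api/v2/pokemon/10060/")

# basename() ignores the trailing "/" and keeps the final path component
idText <- basename(urls)

# Convert the character ids to numbers, as the mutate() call does
ids <- as.numeric(idText)
ids
```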

getOnePokeData Given a pokemon name or id, this function returns a data frame with data for that pokemon. Given how much data is available and how complex it is to process, I give the user a few options for the amount of data returned. The default option returns top level data including species, height, weight, base_experience. Turning on the basestat option additionally returns hp, attack, defense, special_attack, special_defense, speed. Finally, turning on the type option additionally returns the primary and secondary types as type_one and type_two.

getOnePokeData<-function(pokemon,basestat=FALSE,type=FALSE){
  
  ##Get list of pokemon and process user pokemon input
  pokeNameID<-getPokeNameID()
  
  if (is.numeric(pokemon)){
    pokeNameID<-pokeNameID%>%filter(ID==pokemon)
  } else if (is.character(pokemon)){
    pokeNameID<-pokeNameID%>%filter(name==tolower(pokemon))
  } else {
    stop("Please enter either pokemon integer or quoted name value")
  }
  
  PokeList<- fromJSON(pokeNameID$url,flatten = TRUE)
  
  ###Function Default Data
  name<-PokeList$name
  height<-PokeList$height
  id<-PokeList$id
  species<-PokeList$species$name
  weight<-PokeList$weight
  base_experience<-PokeList$base_experience
  
  LocalDF<-data.frame(name,id,species,height,weight,base_experience)
  
  ##process and add base stat data if user selects basestat TRUE
  if (basestat){
    hp<-PokeList$stats$base_stat[1]
    attack<-PokeList$stats$base_stat[2]
    defense<-PokeList$stats$base_stat[3]
    special_attack<-PokeList$stats$base_stat[4]
    special_defense<-PokeList$stats$base_stat[5]
    speed<-PokeList$stats$base_stat[6]
    
    LocalDF<-LocalDF%>%mutate(hp,attack,defense,special_attack   ,special_defense ,speed)
  }
  
  ##process and add type data if user selects type TRUE
  if(type){
    ##All pokemon have at least one type so assign here
    type_one<-PokeList$types$type.name[1]
    
    ##check if more than one type and set 
    ##second type as needed
    if(length(PokeList$types$slot)>1){
      type_two<-PokeList$types$type.name[2]
    }else{
      type_two<-"None"
    }
    
    LocalDF<-LocalDF%>%mutate(type_one,type_two)
  }
  
  
  return(LocalDF)
  
}

Examples of ways to call getOnePokeData.

getOnePokeData("Venusaur")
getOnePokeData(pokemon=8,basestat = TRUE)
getOnePokeData(435,type = TRUE)
getOnePokeData(10032,basestat = TRUE,type = TRUE)
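One case the function does not check explicitly is a name or id that is missing from the lookup table; the filter would then return zero rows and the later fromJSON() call would fail with a less helpful message. A minimal sketch of a guard, written in base R so it runs without the tidyverse: resolveURL and the small lookup table are hypothetical stand-ins for illustration, not part of the original functions.

```r
# Hypothetical helper; lookup stands in for the getPokeNameID() result
resolveURL <- function(pokemon, lookup) {
  if (is.numeric(pokemon)) {
    hit <- lookup[lookup$ID == pokemon, ]
  } else if (is.character(pokemon)) {
    hit <- lookup[lookup$name == tolower(pokemon), ]
  } else {
    stop("Please enter either pokemon integer or quoted name value")
  }
  # Stop early with a clear message when nothing (or too much) matched
  if (nrow(hit) != 1) {
    stop("No unique match found for: ", pokemon)
  }
  hit$url
}

# Small stand-in lookup table with the same columns as getPokeNameID()
lookup <- data.frame(
  name = c("bulbasaur", "venusaur"),
  ID = c(1, 3),
  url = c("https://pokeapi.co/api/v2/pokemon/1/",
          "https://pokeapi.co/api/v2/pokemon/3/")
)

resolveURL("Venusaur", lookup)
resolveURL(1, lookup)
```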

getEveryPokeData This function returns data for ALL pokemon as one data frame. The amount of data returned depends on the basestat and type options as described in getOnePokeData.

getEveryPokeData<-function(basestat=FALSE,type=FALSE){
  
  ###Get current number of pokemon to process
  #getPokeNameID
  pokeNameID<-getPokeNameID()
  idVals<-pokeNameID$ID
  
  ###Loop through every pokemon and build data frame
  ###by adding new rows
  ###Most of the time spent here is in the numerous 
  ###   calls to API address since there are so many pokemon
  allPoke<-data.frame()
  for (i in idVals) {
    allPoke<-rbind(allPoke,getOnePokeData(i,basestat,type))
  }
  
  return(allPoke)
}

Example of getEveryPokeData data frame data.

everyPoke<-getEveryPokeData(basestat = TRUE,type = TRUE)
kable(head(everyPoke))

name	id	species	height	weight	base_experience	hp	attack	defense	special_attack	special_defense	speed	type_one	type_two
bulbasaur	1	bulbasaur	7	69	64	45	49	49	65	65	45	grass	poison
ivysaur	2	ivysaur	10	130	142	60	62	63	80	80	60	grass	poison
venusaur	3	venusaur	20	1000	236	80	82	83	100	100	80	grass	poison
charmander	4	charmander	6	85	62	39	52	43	60	50	65	fire	None
charmeleon	5	charmeleon	11	190	142	58	64	58	80	65	80	fire	None
charizard	6	charizard	17	905	240	78	84	78	109	85	100	fire	flying
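The loop in getEveryPokeData grows the data frame one row at a time with rbind(). An alternative, sketched here under the assumption that every call returns the same columns, is to collect the rows in a list and bind them once at the end. fakeOnePoke is a hypothetical stand-in for getOnePokeData so that the sketch makes no API calls.

```r
# Hypothetical stand-in for getOnePokeData(); no API call is made
fakeOnePoke <- function(i) {
  data.frame(id = i, name = paste0("poke-", i))
}

idVals <- 1:5

# Collect one single-row data frame per id
rowsList <- lapply(idVals, fakeOnePoke)

# Bind all rows in a single step
allPokeSketch <- do.call(rbind, rowsList)
allPokeSketch
```

As the comments in the function note, most of the time is spent in the API calls themselves, so this mainly tidies the code rather than changing the run time much.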

Species Endpoint Functions

Most pokemon species map to one individual pokemon, but some species map to several individual pokemon. Collective species data is obtained from the Pokemon Species endpoint. Because species data is less complex, I was able to return more default data from this endpoint than from the pokemon endpoint. I’ve provided three functions to query and process species endpoint data. The functions all return data frames.

getSpeciesNameID This function returns a data frame with a list of possible species names and id values so that the user will know what is available. Optional sorting by name is provided.

getSpeciesNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/pokemon-species/?limit=1222")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

getOneSpeciesData Given a species name or id, this function returns a data frame for one species with the following data: species, shape, generation, base_happiness, capture_rate, gender_rate, hatch_counter, is_baby, is_legendary, is_mythical.
Optionally, the user can select to return only the categorical data for this endpoint by turning on the onlyCat option.

getOneSpeciesData<-function(species,onlyCat=FALSE){
   
   ##Get list of species and process user species input
   pokeSpeciesID<-getSpeciesNameID()
   
   if (is.numeric(species)){
     pokeSpeciesID<-pokeSpeciesID%>%filter(ID==species)
   } else if (is.character(species)){
     pokeSpeciesID<-pokeSpeciesID%>%filter(name==tolower(species))
   } else {
     stop("Please enter either species integer or quoted name value")
   }
   
   PokeList<- fromJSON(pokeSpeciesID$url,flatten = TRUE)
   
   ###Function Data to return
   species<-PokeList$name
   shape<-PokeList$shape$name
   generation<-PokeList$generation$name
   base_happiness<-PokeList$base_happiness
   capture_rate<-PokeList$capture_rate
   gender_rate<-PokeList$gender_rate
   hatch_counter<-PokeList$hatch_counter
   is_baby<-PokeList$is_baby
   is_legendary<-PokeList$is_legendary
   is_mythical<-PokeList$is_mythical

   
   LocalDF<-data.frame(species,shape,generation,base_happiness,  
            capture_rate,gender_rate,hatch_counter,  
            is_baby,is_legendary,is_mythical)
   
   if(onlyCat){
     LocalDF<-LocalDF %>% select(-base_happiness,-capture_rate,
                        -gender_rate,-hatch_counter)
   }
   
   return(LocalDF)
   
 }

getEverySpeciesData This function returns data for every species as a data frame with optional sorting of the data based on the sortName option. The following data is returned
species,shape,generation,base_happiness,capture_rate,gender_rate,hatch_counter,is_baby,is_legendary,is_mythical.
Optionally, the user can select to return only the categorical data for this endpoint by turning on the onlyCat option.

getEverySpeciesData<-function(sortName=FALSE,onlyCat=FALSE){
   
   ###Get current number of species to process
   pokeSpeciesID<-getSpeciesNameID()
   idVals<-pokeSpeciesID$ID
   
   
   ###Loop through every species and build data frame
   ###by adding new rows
   ###Most of the time spent here is in the numerous 
   ###   calls to API address since there are so many species
   allPoke<-data.frame()
   for (i in idVals) {
     allPoke<-rbind(allPoke,getOneSpeciesData(i,onlyCat))
   }
   
   if (sortName) {
     allPoke<-allPoke %>% arrange(species)
   }
   
   return(allPoke)
 }

Example of getEverySpeciesData data frame data.

everyPokeSpecies<-getEverySpeciesData(sortName = TRUE)
kable(head(everyPokeSpecies))

species	shape	generation	base_happiness	capture_rate	gender_rate	hatch_counter	is_baby	is_legendary	is_mythical
abomasnow	upright	generation-iv	70	60	4	20	FALSE	FALSE	FALSE
abra	upright	generation-i	70	200	2	20	FALSE	FALSE	FALSE
absol	quadruped	generation-iii	35	30	4	25	FALSE	FALSE	FALSE
accelgor	arms	generation-v	70	75	4	15	FALSE	FALSE	FALSE
aegislash	blob	generation-vi	70	45	4	20	FALSE	FALSE	FALSE
aerodactyl	wings	generation-i	70	45	1	35	FALSE	FALSE	FALSE

Evolution Chain Endpoint Functions

Many pokemon can evolve into another, more powerful pokemon. Evolution chain data is obtained from the Pokemon Evolution Chain Endpoint. This endpoint only takes an ID, and those IDs are linked to one part of a chain.

I’ve provided three functions to query and process pokemon evolution chain endpoint data. The functions all return data frames.

getOneEvolveData This function takes an ID number for one of the chains and returns the data for that chain as a data frame. Each data frame row has a value for each chain level, or None if that chain does not have all three stages.

getOneEvolveData<-function(ID){
  
  ###Construct URL from the given ID and call API
  basicURL<-"https://pokeapi.co/api/v2/evolution-chain/"
  queryURL<-paste0(basicURL,ID)
  queryResult<-fromJSON(queryURL)
  
  ###Parse results into stages or no evolve categories
  stageOne<-queryResult$chain$species$name
  stageTwo<-queryResult[["chain"]][["evolves_to"]][["species"]][["name"]]
  stageThree<-queryResult[["chain"]][["evolves_to"]][["evolves_to"]][[1]][["species"]][["name"]] 
  if (is.null(stageTwo)){
    stageTwo<-"None"
  }
  if (is.null(stageThree)){
    stageThree<-"None"
  }
  
  localDF<-data.frame(stageOne,stageTwo,stageThree)
  return(localDF)
}

An example of the data frame returned from getOneEvolveData.

  kable(getOneEvolveData(57))

stageOne	stageTwo	stageThree
mime-jr	mr-mime	mr-rime
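The nested lookups in getOneEvolveData are easier to follow on a small input. The sketch below applies the same lookups to two hand-written JSON strings shaped like evolution chain responses; the strings are simplified stand-ins (only the species and evolves_to keys are kept), and parseChain is a hypothetical helper that skips the API call.

```r
library(jsonlite)

# Hand-written, simplified stand-ins for evolution chain responses
threeStage <- '{"chain": {"species": {"name": "mime-jr"},
  "evolves_to": [{"species": {"name": "mr-mime"},
    "evolves_to": [{"species": {"name": "mr-rime"}, "evolves_to": []}]}]}}'
noEvolve <- '{"chain": {"species": {"name": "absol"}, "evolves_to": []}}'

# Same lookups as getOneEvolveData, applied to an already parsed result
parseChain <- function(queryResult) {
  stageOne <- queryResult$chain$species$name
  stageTwo <- queryResult[["chain"]][["evolves_to"]][["species"]][["name"]]
  stageThree <- queryResult[["chain"]][["evolves_to"]][["evolves_to"]][[1]][["species"]][["name"]]
  # A missing level comes back as NULL and is recorded as "None"
  if (is.null(stageTwo)) stageTwo <- "None"
  if (is.null(stageThree)) stageThree <- "None"
  data.frame(stageOne, stageTwo, stageThree)
}

chainThree <- parseChain(fromJSON(threeStage))
chainNone <- parseChain(fromJSON(noEvolve))
chainThree
chainNone
```

Note that the [[1]] in the third lookup reads only the first branch at the second level, so this sketch, like the function it mirrors, assumes a chain without multiple branches.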

getAllEvolveSeries This function returns a data frame of all the evolve stage items. The function will optionally sort on the first stage value.

getAllEvolveSeries<-function(sortName=FALSE){
  
  metaEvolve<-fromJSON("https://pokeapi.co/api/v2/evolution-chain/?limit=600")
  
  metaEvolveDF<-as_tibble(metaEvolve$results)
  
  metaEvolveDF<-metaEvolveDF %>% mutate(ID=as.numeric(basename(url)))
  
  ##Loop through all the ID values and build a data frame
  ## for all the evolution chain data
  allEvolve<-data.frame()
  for (loopID in metaEvolveDF$ID) {
    allEvolve<-rbind(allEvolve,getOneEvolveData(loopID))
  } 
  
   if (sortName) {
     allEvolve<-allEvolve %>% arrange(stageOne)
   }
  
  return(allEvolve)
}

getAllEvolveStages This function takes data parsed by chain and converts the data into a data frame containing species name and stage value for that species. The function will optionally sort on species.

getAllEvolveStages<-function(sortName=FALSE){
  
  resultsEvolve<-getAllEvolveSeries()
  
  ###Stage one: every chain has a first stage
  ###A first stage with no second stage is marked as noEvolve
  species<-resultsEvolve$stageOne
  stages<-ifelse(resultsEvolve$stageTwo=="None","noEvolve","one")
  allEvolve<-data.frame(species,stages)
  
  ###Stage two: drop chains without a second stage, then append rows
  species<-resultsEvolve$stageTwo
  stages<-ifelse(resultsEvolve$stageTwo=="None","noEvolve","two")
  twoEvolve<-data.frame(species,stages)
  twoEvolve<-twoEvolve %>% filter(species!="None")
  allEvolve<-rbind(allEvolve,twoEvolve)
  
  ###Stage three: drop chains without a third stage, then append rows
  species<-resultsEvolve$stageThree
  stages<-ifelse(resultsEvolve$stageThree=="None","noEvolve","three")
  threeEvolve<-data.frame(species,stages)
  threeEvolve<-threeEvolve %>% filter(species!="None")
  allEvolve<-rbind(allEvolve,threeEvolve)
  
  ###Remove duplicate species rows, keeping the first occurrence
  allEvolve<-allEvolve %>% distinct(species,.keep_all = TRUE)
  
  allEvolve$stages<-as.factor(allEvolve$stages)
  allEvolve$stages<-ordered(allEvolve$stages,levels=c("one","two","three","noEvolve"))
  
  if (sortName) {
     allEvolve<-allEvolve %>% arrange(species)
   }
  
  return(allEvolve)
}

An example of output from getAllEvolveStages.

  evolveStages<-getAllEvolveStages(sortName = TRUE)
  kable(head(evolveStages))

species	stages
abomasnow	two
abra	one
absol	noEvolve
accelgor	two
aegislash	three
aerodactyl	noEvolve
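The stages column is stored as an ordered factor so that tables and plots list the stages in evolution order instead of alphabetically. A minimal sketch on a made-up vector of stage labels:

```r
# Made-up stage labels for illustration
stageLabels <- c("two", "one", "noEvolve", "three", "one")

# Without explicit levels, factor levels are sorted alphabetically
defaultLevels <- levels(factor(stageLabels))

# With explicit levels, the order follows the evolution stages
stagesOrdered <- ordered(stageLabels,
                         levels = c("one", "two", "three", "noEvolve"))

defaultLevels
table(stagesOrdered)
```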

Berries Endpoint Functions

Berries can provide various benefits to the pokemon that eat them.

I’ve provided three functions to query and process berries data. The functions all return data frames.

getBerryNameID This function returns a data frame with a list of possible berry names and id values so that the user will know what is available. Optional sorting by name is provided.

getBerryNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/berry/?limit=1222")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

getOneBerryData Given berry name or id this function returns a data frame for one berry.

The user must select the variables returned by providing a character vector with variable names from this set.
name,growth_time,max_harvest,natural_gift_power, size,smoothness,soil_drynes
To select all variables, only assign “full” to the vector.

getOneBerryData<-function(berry,variables){
  
  ##Get list of berries and process user berry input
  pokeBerryID<-getBerryNameID()
  
  if (is.numeric(berry)){
    pokeBerryID<-pokeBerryID%>%filter(ID==berry)
  } else if (is.character(berry)){
    pokeBerryID<-pokeBerryID%>%filter(name==tolower(berry))
  } else {
    stop("Please enter either berry integer or quoted name value")
  }
  
  BerryList<- fromJSON(pokeBerryID$url,flatten = TRUE)
  
  ###Function Data to return
  name<-BerryList$name
  growth_time<-BerryList$growth_time
  max_harvest<-BerryList$max_harvest
  natural_gift_power<-BerryList$natural_gift_power
  size<-BerryList$size
  smoothness<-BerryList$smoothness
  soil_drynes<-BerryList$soil_dryness
  
  
  
  LocalDF<-data.frame(name,growth_time,max_harvest,natural_gift_power,
                      size,smoothness,soil_drynes)
  
   if (variables[1]!="full"){
      LocalDF<-LocalDF%>%select(all_of(variables))
   }
  
  return(LocalDF)
  
}

Examples of getOneBerryData usage with output.

kable(getOneBerryData(34,"full"))  

name	growth_time	max_harvest	natural_gift_power	size	smoothness	soil_drynes
durin	15	15	80	280	35	8

kable(getOneBerryData(22,c("name","size","smoothness")))

name	size	smoothness
kelpsy	150	20
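The variables argument is only checked when select(all_of(variables)) runs, which is after the API has already been called. A minimal sketch of checking the names up front, in base R: checkBerryVariables is a hypothetical helper for illustration, and the column names are the ones returned by getOneBerryData.

```r
# Column names as returned by getOneBerryData
berryCols <- c("name", "growth_time", "max_harvest", "natural_gift_power",
               "size", "smoothness", "soil_drynes")

# Hypothetical helper: returns the columns to keep or stops on unknown names
checkBerryVariables <- function(variables, validCols = berryCols) {
  # "full" as the first element selects every column
  if (variables[1] == "full") {
    return(validCols)
  }
  unknown <- setdiff(variables, validCols)
  if (length(unknown) > 0) {
    stop("Unknown berry variables: ", paste(unknown, collapse = ", "))
  }
  variables
}

checkBerryVariables(c("name", "size", "smoothness"))
checkBerryVariables("full")
```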

getEveryBerryData This function returns data for every berry as a data frame with optional sorting of the data based on the sortName option.

getEveryBerryData<-function(sortName=FALSE,variables){
  
  ###Get current number of berries to process
  pokeBerryID<-getBerryNameID()
  idVals<-pokeBerryID$ID
  
  ###Loop through every berry and build data frame
  ###by adding new rows
  ###Most of the time spent here is in the numerous 
  ###   calls to API address since there are so many berries
  allBerry<-data.frame()
  for (i in idVals) {
    allBerry<-rbind(allBerry,getOneBerryData(i,variables))
  }
  
  if (sortName) {
    allBerry<-allBerry %>% arrange(name)
  }
  
  return(allBerry)
}

An example of getEveryBerryData usage to return all data sorted by berry name.

kable(head(getEveryBerryData(sortName = TRUE,"full")))

name	growth_time	max_harvest	natural_gift_power	size	smoothness	soil_drynes
aguav	5	5	60	64	25	10
apicot	24	5	80	75	40	4
aspear	3	5	60	50	25	15
babiri	18	5	60	265	35	6
belue	15	15	80	300	35	8
bluk	2	10	70	108	20	35

Forms Endpoint Functions

Pokemon Forms are ways different pokemon might appear visually and can differ in different situations like battle.

I’ve provided three functions to query and process forms data. The functions all return data frames.

getFormNameID This function returns a data frame with a list of possible names and id values so that the user will know what is available. Optional sorting by name is provided.

getFormNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/pokemon-form/?limit=1300")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

getOneFormData Given a form name or id, this function returns a data frame for one pokemon form. Be aware that many pokemon were not assigned a form name by the pokemon API, and those pokemon forms return “”.

The user must select the variables returned by providing a character vector with variable names from this set.
name,form_name,is_battle_only,is_default,is_mega,version_group

To select all variables, only assign “full” to the vector.

getOneFormData<-function(form,variables){
  
  ##Get list of forms and process user species input
  pokeFormID<-getFormNameID()
  
  if (is.numeric(form)){
    pokeFormID<-pokeFormID%>%filter(ID==form)
  } else if (is.character(form)){
    pokeFormID<-pokeFormID%>%filter(name==tolower(form))
  } else {
    stop("Please enter either form integer or quoted name value")
  }
  
  FormList<- fromJSON(pokeFormID$url,flatten = TRUE)
  
  ###Function Data to return
  name<-FormList$name
  form_name<-FormList$form_name
  is_battle_only<-FormList$is_battle_only
  is_default<-FormList$is_default
  is_mega<-FormList$is_mega
  version_group<-FormList$version_group$name
  
  LocalDF<-data.frame(name,form_name,is_battle_only,is_default,is_mega,version_group)
  
  if (variables[1]!="full"){
    LocalDF<-LocalDF%>%select(all_of(variables))
  }
  
  return(LocalDF)
  
}

kable(getOneFormData(413,"full"))

name	form_name	is_battle_only	is_default	is_mega	version_group
wormadam-plant	plant	FALSE	TRUE	FALSE	diamond-pearl

getEveryFormData This function returns data for every form as a data frame with optional sorting of the data based on the sortName option.

The user must select the variables returned by providing a character vector with variable names from this set.
name,form_name,is_battle_only,is_default,is_mega,version_group

To select all variables, only assign “full” to the vector.

getEveryFormData<-function(sortName=FALSE,variables){
  
  ###Get current number of forms to process
  pokeFormID<-getFormNameID()
  idVals<-pokeFormID$ID
  
  ###Loop through every form and build data frame
  ###by adding new rows
  ###Most of the time spent here is in the numerous 
  ###   calls to API address since there are so many forms
  allForm<-data.frame()
  for (i in idVals) {
    allForm<-rbind(allForm,getOneFormData(i,variables))
  }
  
  if (sortName) {
    allForm<-allForm %>% arrange(name)
  }
  
  return(allForm)
}

Exploratory Data Analysis

Get Full Data Frames

I started by pulling data from the first three endpoints I wrote functions for earlier. I found that those three endpoints had enough relevant data for all the required analysis. I pull all the data here so that I’ll have it stored in objects for later use.

###Get all the data needed for data exploration
allPoke<-getEveryPokeData(basestat = TRUE,type = TRUE)
allSpecies<-getEverySpeciesData()
allStages<-getAllEvolveStages()

Creating New Variables

In this section I create new variables that I plan to use in later analysis.
First, I create a totalPts quantitative variable by adding the related point based variables. In pokemon references, different pokemon are often compared based on total points. Here is a reference showing total points for one particular pokemon: Total Point Example

Second, I create a hgtwgt_ratio quantitative variable based on the basic height to weight ratio. This ratio is often used in biology.

Third, I create a common categorical variable based on other species categorical variables. I wanted every species to be in one common category that eventually will show whether the species is in one of the rare categories like legendary or mythical.

Fourth, I create a related rare categorical variable that assigns each species to either rare or regular status.
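As a small worked illustration of these variables, the sketch below recomputes them in base R for a few rows. The pokemon values are copied from the getEveryPokeData output shown earlier and the abra flags from the species output; the second species row is made up to show the rare case.

```r
# Values copied from the getEveryPokeData output shown earlier
poke <- data.frame(
  name = c("bulbasaur", "charizard"),
  height = c(7, 17), weight = c(69, 905),
  hp = c(45, 78), attack = c(49, 84), defense = c(49, 78),
  special_attack = c(65, 109), special_defense = c(65, 85),
  speed = c(45, 100)
)

# Total points is the sum of the six base stats
poke$totalPts <- with(poke, hp + attack + defense +
                        special_attack + special_defense + speed)

# Height to weight ratio in the units the API returns
poke$hgtwgt_ratio <- poke$height / poke$weight

# First row copied from the species output; second row is made up
speciesToy <- data.frame(
  species = c("abra", "example-baby"),
  is_baby = c(FALSE, TRUE),
  is_legendary = c(FALSE, FALSE),
  is_mythical = c(FALSE, FALSE)
)

# A species is rare when any of the three flags is TRUE
speciesToy$rare <- ifelse(speciesToy$is_baby | speciesToy$is_mythical |
                            speciesToy$is_legendary, "rare", "regular")

poke[, c("name", "totalPts", "hgtwgt_ratio")]
speciesToy[, c("species", "rare")]
```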

###total points
moreAllPoke<-allPoke %>% 
  mutate(totalPts=(hp+attack+defense+special_attack +special_defense
  +speed)) %>% 
  select(name,id,species,height,weight,base_experience,totalPts,everything())

###height to weight ratio
moreAllPoke<-moreAllPoke %>%mutate(hgtwgt_ratio=height/weight)

###mythic,legendary, regular,baby
###Create new common variable that assigns one of these values
moreAllSpecies<-allSpecies %>% 
  mutate(common=if_else(is_baby, "baby",
              if_else(is_mythical,"mythical",
                    if_else(is_legendary,"legendary","regular"))))

###Create new rare variable to more broadly categorize rare and regular 
moreAllSpecies<-moreAllSpecies %>% 
  mutate(rare=if_else(is_baby |is_mythical |is_legendary, "rare",
                      "regular")) 

Contingency Tables

Contingency Table One

Every individual pokemon has a primary type that is one of 18 different pokemon types. I created my first contingency table to examine how many pokemon have each primary type at each evolution stage. I included non-evolving pokemon because many pokemon do not evolve. I added margin sums to help spot trends between categories.

###combine needed tables to get data together for table
combinePoke<-inner_join(moreAllPoke,allStages,by="species") %>% select(name,stages,everything())

###Create table then add margins to include sums
tOne<-table(combinePoke$type_one,combinePoke$stages )
kable(addmargins(tOne),caption = "Contingency Table of Type by Stage")

	one	two	three	noEvolve	Sum
bug	27	34	12	11	84
dark	16	19	4	9	48
dragon	7	9	10	17	43
electric	12	31	8	26	77
fairy	7	10	3	4	24
fighting	13	20	3	9	45
fire	18	25	15	11	69
flying	2	2	2	3	9
ghost	15	16	5	10	46
grass	31	38	19	8	96
ground	16	17	3	6	42
ice	11	16	4	8	39
normal	43	39	11	25	118
poison	16	21	3	3	43
psychic	18	19	12	31	80
rock	20	20	7	26	73
steel	8	12	7	13	40
water	45	54	20	22	141
Sum	325	402	148	242	1117

Contingency Table of Type by Stage
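The table above is built with table() and addmargins(). A minimal sketch of the same two calls on a made-up six-row data set (not taken from the API) shows where the Sum row and column come from:

```r
# Made-up toy data; not taken from the API
typeToy <- c("grass", "grass", "fire", "water", "water", "water")
stageToy <- factor(c("one", "two", "one", "one", "two", "three"),
                   levels = c("one", "two", "three"))

# Cross-tabulate type by stage
tToy <- table(typeToy, stageToy)

# Add a Sum row and a Sum column holding the marginal totals
tToyMargins <- addmargins(tToy)
tToyMargins
```

The bottom right cell is the total number of rows that went into the table, which is a quick way to check that a join did not drop or duplicate records.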

What stood out to me from the first table in regard to evolution stages was that there were many more first and second stage pokemon than third stage. That outcome made sense because players go from lower to higher stages over time so fewer third stage were expected.
As for pokemon types, I immediately noticed there are very few flying types. I also noticed water, normal, and grass were the most numerous. The other trend I saw was that each type tends to follow the overall pattern of more first and second stage pokemon.

Contingency Table Two

I learned pokemon generations are used to group pokemon over time, with i being the oldest and viii being the most recent. For the second table, I looked at generation versus what I called common status, meaning whether a pokemon is regular or in a rare category. I added margin sums again to help spot patterns. Note this data is from species data and there are fewer species than pokemon.

#Create table for generation and common categories
tTwo<-table(moreAllSpecies$generation,moreAllSpecies$common)
kable(addmargins(tTwo),
      caption = "Contingency Table of Generation by Common Status")

	baby	legendary	mythical	regular	Sum
generation-i	0	4	1	146	151
generation-ii	8	5	1	86	100
generation-iii	2	8	2	123	135
generation-iv	8	9	5	85	107
generation-v	0	9	4	143	156
generation-vi	0	3	3	66	72
generation-vii	0	9	5	74	88
generation-viii	0	10	1	78	89
Sum	18	57	22	801	898

Contingency Table of Generation by Common Status

What stood out to me in the second table was that more pokemon species were created in the earlier generations, i through v, than in the more recent generations. In addition, I confirmed that the rare categories are indeed rare, with baby being especially uncommon.

Numerical Summaries

Capture Rate By Generation
I learned that capture rate is a key value where higher numbers mean easier to catch. I summarized by generation to see whether capture rate was changing over time.

allSpecies %>% group_by(generation) %>% 
  summarise(Avg = mean(capture_rate), Sd = sd(capture_rate), 
    Median = median(capture_rate), IQR =IQR(capture_rate)) %>% kable()

generation	Avg	Sd	Median	IQR
generation-i	106.18543	77.10654	75.0	145.0
generation-ii	91.90000	71.67611	60.0	75.0
generation-iii	113.35556	83.82003	90.0	145.0
generation-iv	78.85981	69.46174	45.0	75.0
generation-v	103.10256	76.61131	75.0	145.0
generation-vi	100.40278	72.47664	62.5	120.0
generation-vii	77.72727	67.96918	45.0	47.5
generation-viii	97.28090	82.43262	60.0	82.0

I did not spot a clear pattern over time in the capture rates. I did notice the IQR varied a lot from generation to generation, meaning the variability of capture rate changed a lot over time but in no clear pattern.
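For readers less familiar with these four statistics, here is a minimal base R sketch that computes them by group on made-up numbers (the generation labels and capture rates are hypothetical). The IQR is the distance between the 25th and 75th percentiles, so it describes the spread of the middle half of each group.

```r
# Made-up capture rates for two hypothetical generations
toy <- data.frame(
  generation = c(rep("gen-a", 5), rep("gen-b", 2)),
  capture_rate = c(1, 2, 3, 4, 5, 10, 20)
)

# The same four statistics used in the summary above
summaryStats <- function(x) {
  c(Avg = mean(x), Sd = sd(x), Median = median(x), IQR = IQR(x))
}

# Split by group, summarize each group, and put groups in rows
toySummary <- t(sapply(split(toy$capture_rate, toy$generation), summaryStats))
toySummary
```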

Height to Weight Ratio by Common Status

I next looked at height to weight ratio by the common categories.

#using hgtwgt_ratio
comboSpeciesPoke<-inner_join(moreAllPoke,moreAllSpecies,by="species")
comboSpeciesPoke %>% group_by(common)  %>% 
  summarise(Avg = mean(hgtwgt_ratio), Sd = sd(hgtwgt_ratio), 
      Median =  median(hgtwgt_ratio), IQR =IQR(hgtwgt_ratio)) %>% kable() 

common	Avg	Sd	Median	IQR
baby	0.0950516	0.0979774	0.0445055	0.1046340
legendary	0.0800844	0.2755390	0.0181102	0.0232564
mythical	0.0589330	0.0606246	0.0292845	0.0673077
regular	0.1145914	0.6820708	0.0368272	0.0450000

From this summary, I noticed the regular category had the highest average but the regular median was not too different from mythical. I’d really hoped to see more with this particular summary.

Total Points by Common Status

For my third numerical summary, I looked at total points by common types.

comboSpeciesPoke %>% group_by(common) %>% 
    summarise(Avg = mean(totalPts), Sd = sd(totalPts), Median =       
              median(totalPts), IQR =IQR(totalPts)) %>% kable()

common	Avg	Sd	Median	IQR
baby	276.2778	61.34263	282.5	90.25
legendary	627.3333	103.11584	600.0	100.00
mythical	595.0000	66.04007	600.0	0.00
regular	422.1835	102.83936	440.0	175.00

As expected, the legendary and mythical types have much higher total points (a measure of power) than the other types, with baby having the fewest points.

Bar Plot

I was curious to learn more about the pokemon species that do not evolve. I first needed to join data so I’d have what I needed together for a bar plot.

###combine data frames to give access to rare and species data
###Then create bar plot of this data.  
combineSpeciesStage<-inner_join(moreAllSpecies,allStages,by="species")


g <- ggplot(combineSpeciesStage, aes(x = stages))
g + geom_bar(aes(fill=(rare)),position = "dodge") +
  scale_fill_discrete(name="Species\nCategories") + 
  labs(x="Evolution Stages", y="Count",
  title = "Bar Plot of Evolution Stages for Rare and Regular Species")

The bar plot gave me new insight about non-evolving pokemon species. The plot shows many of the non-evolving species are in one of the rare groups like mythical or legendary. It made sense to me that rare species would not evolve; otherwise they would not be so uncommon.

Box Plot

I was interested in investigating the relationship between pokemon total points and evolution stage in a box plot. I needed to manually adjust the colors because the automatic colors were blending into the background.

###Create a boxplot with added points for stage and total points  
g <- ggplot(combinePoke, aes(x = stages, y = totalPts))
g + geom_boxplot(fill="green1") + 
  geom_point((aes(color = stages)), size=1,position = "jitter",alpha = 0.1) +
  labs(x="Evolution Stages", y="Total Points",
  title = "Boxplot of Total Points for Different Evolution Stages") + 
  scale_color_manual(values = c("red", "blue", "orangered","purple"),name ="Evolution\nStages") 

The boxplot confirmed my expectation that total points would be higher for higher pokemon evolution stages. This outcome makes sense because more evolved pokemon are more powerful and power is quantified by total points. In addition, I could see that the non-evolving pokemon are mostly very powerful too and show a lot of variability.

Histogram

I learned that the pokemon hatch counter variable determines how long it takes for pokemon eggs to hatch. Per the doc, “Initial hatch counter: one must walk 255 × (hatch_counter + 1) steps before this Pokemon’s egg hatches, unless utilizing bonuses like Flame Body’s.” I wanted to get a graph of the distribution of hatch counter by creating a histogram.
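The quoted formula can be turned into a one-line function to see what the hatch counter values mean in steps. hatchSteps is a hypothetical helper name; the two inputs are hatch_counter values that appear in the species output earlier in this vignette, and bonuses such as Flame Body are ignored.

```r
# Steps needed before an egg hatches, per the quoted formula
# (hypothetical helper; ignores bonuses such as Flame Body)
hatchSteps <- function(hatch_counter) {
  255 * (hatch_counter + 1)
}

# hatch_counter values 20 (abra) and 35 (aerodactyl) from the species output
stepsNeeded <- hatchSteps(c(20, 35))
stepsNeeded
```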

###creating histogram of hatch_counter data 
g <- ggplot(moreAllSpecies, aes( x = hatch_counter))
g + geom_histogram(binwidth=8,color = "brown", fill = "green", 
  size = 1)  + labs(x="Hatch Counter", y="Count",
  title = "Histogram of Pokemon Hatch Counter") 

The histogram of hatch counter appears to be right skewed, with most hatch counter values being small and only a few being large.

Histogram Plus Density

After I completed the histogram, I was curious what might explain why the higher hatch counter values are so infrequent. To address this question, I created a density overlay by category of pokemon.

###creating histogram of hatch_counter data plus density 
g <- ggplot(moreAllSpecies, aes(y=..density.., x = hatch_counter))
g + geom_histogram(binwidth=8,color = "brown", fill = "green", 
    size = 1)  + labs(x="Hatch Counter", y="Density", title = 
      "Histogram of Pokemon Hatch Counter\nWith Pokemon Category Density",
       fill="Species\nCategories") + 
  geom_density(adjust = 0.5, alpha = 0.5, aes(fill = common), 
    position = "stack")

Adding the density plots did help explain the larger hatch counter values. The density plots show that the larger hatch counter values go with the rare legendary and mythical pokemon. Part of being rare would be that those types would not hatch as often as other types.

Scatter Plot One

Base experience is the number of experience points awarded when a pokemon is defeated. I created a scatter plot of total points (a measure of total power) and base experience to see whether players are rewarded in proportion to the power of a pokemon opponent.

#Setup for a scatter plot of base_experience and total points
corExpPts<-cor(comboSpeciesPoke$base_experience,comboSpeciesPoke$totalPts)

g<-ggplot(data = comboSpeciesPoke,aes(x=totalPts,y=base_experience))
g+geom_point(aes(color=rare)) + 
  geom_smooth(method = lm) +
  geom_text(x=350,y=500,size=5,
        label = paste0("Correlation = ",round(corExpPts, 2))) +
  labs(x="Total Points", y="Base Experience",
       title = "Scatter Plot of Pokemon Base Experience Versus Total Points",
       color="Species\nCategories")

Although I was not surprised to see a linear relationship, I was surprised to see just how strong the positive correlation was between these variables. This did confirm players are rewarded in proportion to total power. The other interesting trend in this graph was how there are four almost straight lines with very similar slopes.
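One thing to keep in mind when reusing the cor() call: by default it returns NA if either vector contains a missing value. The sketch below uses made-up numbers to show that behavior and the use = "complete.obs" option; whether the API data contains missing values at any given time is not something this vignette checks.

```r
# Made-up vectors; the last x value is missing
x <- c(1, 2, 3, NA)
y <- c(2, 4, 6, 8)

# Default behavior: any missing value makes the result NA
corDefault <- cor(x, y)

# Drop incomplete pairs before computing the correlation
corComplete <- cor(x, y, use = "complete.obs")

corDefault
corComplete
```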

Scatter Plot Two

Next I wanted to do a scatter plot for variables that I suspected to have a negative correlation. The capture rate variable measures how hard or easy it is to capture a pokemon, with lower values being harder and higher values being easier to catch. I created a scatter plot of total points (a measure of total power) and capture rate to see whether these would be negatively correlated as I expected.

#Setup for a scatter plot of  total points and capture_rate
corrCapPts<-cor(comboSpeciesPoke$capture_rate,comboSpeciesPoke$totalPts)

g<-ggplot(data = comboSpeciesPoke,aes(x=capture_rate,y=totalPts))
g+geom_point(aes(color=common))  + 
  geom_smooth(method = lm) +
  geom_text(x=50,y=950,size=5,
      label = paste0("Correlation = ",round(corrCapPts, 2))) +
  labs(x="Capture Rate",y="Total Points",color='Species\nCategories',
      title = "Scatter Plot of Pokemon Total Points Versus Capture Rate ") 

From the scatter plot, I did confirm these variables are moderately negatively correlated as I expected. It made sense that more powerful (higher total points) pokemon would be harder to capture (lower capture rates). This trend was especially true for the more common regular pokemon group.

From the previous scatter plot, I was curious what the scatter plot would show if it were separately drawn for each category. I created a facet wrap version to examine this question.

##Facet wrapped scatter plots of total points versus capture rate by common status
g<-ggplot(data = comboSpeciesPoke,aes(x=capture_rate,y=totalPts))
g+geom_point() + facet_wrap(~common) + 
  labs(x="Capture Rate",y="Total Points",
    color='Species\nCategories',
    title = "Facet Wrapped\nScatter Plots of Pokemon Total Points Versus Capture Rate ") 

The first part of the facet wrap graph that stood out to me is that the regular type most clearly shows the negative correlation. The second part that stood out to me is that the legendary type might actually be positively correlated or not be very correlated at all. The other types did not show clear patterns.