Skip to the content.

Pokemon API Vignette

Mark Austin 10/05/2021

Introduction

This is a vignette for reading and summarizing data from the Pokemon API using Pokemon API Documentation. After describing many API query functions with accompanying R code, the vignette presents exploratory data analysis again using R code.

Required R Packages

The following R packages are required to run R code used in this Vignette and/or create this document.

Pokemon API Query and Data Proccessing Functions

I created the following functions to query and process data from the Pokemon API using Pokemon API Documentation. I found that I could use fromJSON() directly with the Pokemon API and directly assign API calls to list of list objects.

For each endpoint, users can customize their query to return specific data based on names or ids relevant to that endpoint. I provide metadata functions so that users will know what names and ids are valid for a given endpoint. I also tried different approaches to giving the user flexibility in the data returned with some functions returning groups of data and other letting the user do more customization.

Pokemon Endpoint Functions

Most data relevant to individual pokemon is obtained from the Pokemon endpoint. This endpoint returns a complex list of lists with more data than most users would need. I’ve provided three functions to query and process pokemon endpoint data. The functions all return data frames.

getPokeNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/pokemon/?limit=1222")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

Example getPokeNameIDFunction usage with output.

kable(head(getPokeNameID(sortName = TRUE)))
name url ID
abomasnow https://pokeapi.co/api/v2/pokemon/460/ 460
abomasnow-mega https://pokeapi.co/api/v2/pokemon/10060/ 10060
abra https://pokeapi.co/api/v2/pokemon/63/ 63
absol https://pokeapi.co/api/v2/pokemon/359/ 359
absol-mega https://pokeapi.co/api/v2/pokemon/10057/ 10057
accelgor https://pokeapi.co/api/v2/pokemon/617/ 617
getOnePokeData<-function(pokemon,basestat=FALSE,type=FALSE){
  
  ##Get list of pokemon and process user pokemon input
  pokeNameID<-getPokeNameID()
  
  if (is.numeric(pokemon)){
    pokeNameID<-pokeNameID%>%filter(ID==pokemon)
  } else if (is.character(pokemon)){
    pokeNameID<-pokeNameID%>%filter(name==tolower(pokemon))
  } else {
    stop("Please enter either pokemon integer or quoated name value")
  }
  
  PokeList<- fromJSON(pokeNameID$url,flatten = TRUE)
  
  ###Function Default Data
  name<-PokeList$name
  height<-PokeList$height
  id<-PokeList$id
  species<-PokeList$species$name
  weight<-PokeList$weight
  base_experience<-PokeList$base_experience
  
  LocalDF<-data.frame(name,id,species,height,weight,base_experience)
  
  ##process and add base stat data if user selects basestat TRUE
  if (basestat){
    hp<-PokeList$stats$base_stat[1]
    attack<-PokeList$stats$base_stat[2]
    defense<-PokeList$stats$base_stat[3]
    special_attack<-PokeList$stats$base_stat[4]
    special_defense<-PokeList$stats$base_stat[5]
    speed<-PokeList$stats$base_stat[6]
    
    LocalDF<-LocalDF%>%mutate(hp,attack,defense,special_attack   ,special_defense ,speed)
  }
  
  ##process and add type data if user selects type TRUE
  if(type){
    ##All pokemon has at least one type so assign here
    type_one<-PokeList$types$type.name[1]
    
    ##check if more than one type and set 
    ##second type as needed
    if(length(PokeList$types$slot)>1){
      type_two<-PokeList$types$type.name[2]
    }else{
      type_two<-"None"
    }
    
    LocalDF<-LocalDF%>%mutate(type_one,type_two)
  }
  
  
  return(LocalDF)
  
}

Examples of ways to call getOnePokeData.

getOnePokeData("Venusaur")
getOnePokeData(pokemon=8,basestat = TRUE)
getOnePokeData(435,type = TRUE)
getOnePokeData(10032,basestat = TRUE,type = TRUE)
getEveryPokeData<-function(basestat=FALSE,type=FALSE){
  
  ###Get current number of pokemon to process
  #getPokeNameID
  pokeNameID<-getPokeNameID()
  idVals<-pokeNameID$ID
  
  ###Loop through every pokemon and build data frame
  ###by adding new rows
  ###Most of the time spent here is in the numerous 
  ###   calls to API address since there are so many pokemon
  allPoke<-data.frame()
  for (i in idVals) {
    allPoke<-rbind(allPoke,getOnePokeData(i,basestat,type))
  }
  
  return(allPoke)
}

Example of getEveryPokeData data frame data.

everyPoke<-getEveryPokeData(basestat = TRUE,type = TRUE)
kable(head(everyPoke))
name id species height weight base_experience hp attack defense special_attack special_defense speed type_one type_two
bulbasaur 1 bulbasaur 7 69 64 45 49 49 65 65 45 grass poison
ivysaur 2 ivysaur 10 130 142 60 62 63 80 80 60 grass poison
venusaur 3 venusaur 20 1000 236 80 82 83 100 100 80 grass poison
charmander 4 charmander 6 85 62 39 52 43 60 50 65 fire None
charmeleon 5 charmeleon 11 190 142 58 64 58 80 65 80 fire None
charizard 6 charizard 17 905 240 78 84 78 109 85 100 fire flying

Species Endpoint Functions

Most pokemon species map to one individual pokemon but there are some species that map to several individual pokemon. Collective species data is obtained from the Pokemon Species endpoint. Because species data is less complex, I was able to return more default data from this endpoint than the pokemon endpoint. I’ve provided three functions to query and process pokemon endpoint data. The functions all return data frames.

getSpeciesNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/pokemon-species/?limit=1222")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}
getOneSpeciesData<-function(species,onlyCat=FALSE){
   
   ##Get list of species and process user species input
   pokeSpeciesID<-getSpeciesNameID()
   
   if (is.numeric(species)){
     pokeSpeciesID<-pokeSpeciesID%>%filter(ID==species)
   } else if (is.character(species)){
     pokeSpeciesID<-pokeSpeciesID%>%filter(name==tolower(species))
   } else {
     stop("Please enter either species integer or quoated name value")
   }
   
   PokeList<- fromJSON(pokeSpeciesID$url,flatten = TRUE)
   
   ###Function Data to return
   species<-PokeList$name
   shape<-PokeList$shape$name
   generation<-PokeList$generation$name
   base_happiness<-PokeList$base_happiness
   capture_rate<-PokeList$capture_rate
   gender_rate<-PokeList$gender_rate
   hatch_counter<-PokeList$hatch_counter
   is_baby<-PokeList$is_baby
   is_legendary<-PokeList$is_legendary
   is_mythical<-PokeList$is_mythical

   
   LocalDF<-data.frame(species,shape,generation,base_happiness,  
            capture_rate,gender_rate,hatch_counter,  
            is_baby,is_legendary,is_mythical)
   
   if(onlyCat){
     LocalDF<-LocalDF %>% select(-base_happiness,-capture_rate,
                        -gender_rate,-hatch_counter)
   }
   
   return(LocalDF)
   
 }
getEverySpeciesData<-function(sortName=FALSE,onlyCat=FALSE){
   
   ###Get current number of species to process
   pokeSpeciesID<-getSpeciesNameID()
   idVals<-pokeSpeciesID$ID
   
   
   ###Loop through every species and build data frame
   ###by adding new rows
   ###Most of the time spent here is in the numerous 
   ###   calls to API address since there are so many species
   allPoke<-data.frame()
   for (i in idVals) {
     allPoke<-rbind(allPoke,getOneSpeciesData(i,onlyCat))
   }
   
   if (sortName) {
     allPoke<-allPoke %>% arrange(species)
   }
   
   return(allPoke)
 }

Example of getEverySpeciesData data frame data.

everyPokeSpecies<-getEverySpeciesData(sortName = TRUE)
kable(head(everyPokeSpecies))
species shape generation base_happiness capture_rate gender_rate hatch_counter is_baby is_legendary is_mythical
abomasnow upright generation-iv 70 60 4 20 FALSE FALSE FALSE
abra upright generation-i 70 200 2 20 FALSE FALSE FALSE
absol quadruped generation-iii 35 30 4 25 FALSE FALSE FALSE
accelgor arms generation-v 70 75 4 15 FALSE FALSE FALSE
aegislash blob generation-vi 70 45 4 20 FALSE FALSE FALSE
aerodactyl wings generation-i 70 45 1 35 FALSE FALSE FALSE

Evolution Chain Endpoint Functions

Many pokemon can evolve into another more powerful pokemon. Evolution chain data is obtained from the Pokemon Evolution Chain Endpoint. This endpoint only takes ID and those IDs are linked to one part of a chain.

I’ve provided three functions to query and process pokemon evolution chain endpoint data. The functions both return data frames.

getOneEvolveData<-function(ID){
  
  ###Construct URL from the given ID and call API
  basicURL<-"https://pokeapi.co/api/v2/evolution-chain/"
  queryURL<-paste0(basicURL,ID)
  queryResult<-fromJSON(queryURL)
  
  ###Parse results into stages or no evolve categories
  stageOne<-queryResult$chain$species$name
  stageTwo<-queryResult[["chain"]][["evolves_to"]][["species"]][["name"]]
  stageThree<-queryResult[["chain"]][["evolves_to"]][["evolves_to"]][[1]][["species"]][["name"]] 
  if (is.null(stageTwo)){
    stageTwo<-"None"
  }
  if (is.null(stageThree)){
    stageThree<-"None"
  }
  
  localDF<-data.frame(stageOne,stageTwo,stageThree)
  return(localDF)
}

An example of data frame returned from getOneEvolveData

  kable(getOneEvolveData(57))
stageOne stageTwo stageThree
mime-jr mr-mime mr-rime
getAllEvolveSeries<-function(sortName=FALSE){
  
  metaEvolve<-fromJSON("https://pokeapi.co/api/v2/evolution-chain/?limit=600")
  
  metaEvolveDF<-as_tibble(metaEvolve$results)
  
  metaEvolveDF<-metaEvolveDF %>% mutate(ID=as.numeric(basename(url)))
  
  ##Loop through all the ID values and build a data frame
  ## for all the evolution chain data
  allEvolve<-data.frame()
  for (loopID in metaEvolveDF$ID) {
    allEvolve<-rbind(allEvolve,getOneEvolveData(loopID))
  } 
  
   if (sortName) {
     allEvolve<-allEvolve %>% arrange(stageOne)
   }
  
  return(allEvolve)
}
getAllEvolveStages<-function(sortName=FALSE){
  
  resultsEvolve<-getAllEvolveSeries()
  
  ###Handles the first one which they all have
  ###Now can do stageTwo and three
  allEvolve<-data.frame()
  species<-resultsEvolve$stageOne
  stages<-ifelse(resultsEvolve$stageTwo=="None",stage<-"noEvolve",stage<-"one")
  stages
  allEvolve<-data.frame(species,stages)
  ###Need to use rbind to add other parts after this part
  species<-resultsEvolve$stageTwo
  stages<-ifelse(resultsEvolve$stageTwo=="None",stage<-"noEvolve",stage<-"two")
  twoEvolve<-data.frame(species,stages)
  twoEvolve<-twoEvolve %>% filter(species!="None")
  allEvolve<-rbind(allEvolve,twoEvolve)
  
  species<-resultsEvolve$stageThree
  stages<-ifelse(resultsEvolve$stageThree=="None",stage<-"noEvolve",stage<-"three")
  threeEvolve<-data.frame(species,stages)
  threeEvolve<-threeEvolve %>% filter(species!="None")
  allEvolve<-rbind(allEvolve,threeEvolve)
  ###Later use distinct function to remove duplicate rows
  allEvolve<-allEvolve %>% distinct(species,.keep_all = TRUE)
  
  allEvolve$stages<-as.factor(allEvolve$stages)
  allEvolve$stages<-ordered(allEvolve$stages,levels=c("one","two","three","noEvolve"))
  
  if (sortName) {
     allEvolve<-allEvolve %>% arrange(species)
   }
  
  return(allEvolve)
}

An example of output from getAllEvolveStages.

  evolveStages<-getAllEvolveStages(sortName = TRUE)
  kable(head(evolveStages))
species stages
abomasnow two
abra one
absol noEvolve
accelgor two
aegislash three
aerodactyl noEvolve

Berries Endpoint Functions

Berries can provide various benefits to pokemon when they eat berries.

I’ve provided three functions to query and process berries data. The functions all return data frames.

getBerryNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/berry/?limit=1222")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

The user must select the variables returned by providing a character vector with variable names from this set.
name,growth_time,max_harvest,natural_gift_power, size,smoothness,soil_drynes
To select all variables, only assign “full” to the vector.

getOneBerryData<-function(berry,variables){
  
  ##Get list of berries and process user berry input
  pokeBerryID<-getBerryNameID()
  
  if (is.numeric(berry)){
    pokeBerryID<-pokeBerryID%>%filter(ID==berry)
  } else if (is.character(species)){
    pokeBerryID<-pokeBerryID%>%filter(name==tolower(berry))
  } else {
    stop("Please enter either species integer or quoated name value")
  }
  
  BerryList<- fromJSON(pokeBerryID$url,flatten = TRUE)
  
  ###Function Data to return
  name<-BerryList$name
  growth_time<-BerryList$growth_time
  max_harvest<-BerryList$max_harvest
  natural_gift_power<-BerryList$natural_gift_power
  size<-BerryList$size
  smoothness<-BerryList$smoothness
  soil_drynes<-BerryList$soil_dryness
  
  
  
  LocalDF<-data.frame(name,growth_time,max_harvest,natural_gift_power,
                      size,smoothness,soil_drynes)
  
   if (variables[1]!="full"){
      LocalDF<-LocalDF%>%select(all_of(variables))
   }
  
  return(LocalDF)
  
}

Examples of getOneBerryData usage with output.

kable(getOneBerryData(34,"full"))  
name growth_time max_harvest natural_gift_power size smoothness soil_drynes
durin 15 15 80 280 35 8
kable(getOneBerryData(22,c("name","size","smoothness")))
name size smoothness
kelpsy 150 20

The user must select the variables returned by providing a character vector with variable names from this set.
name,growth_time,max_harvest,natural_gift_power, size,smoothness,soil_drynes
To select all variables, only assign “full” to the vector.

getEveryBerryData<-function(sortName=FALSE,variables){
  
  ###Get current number of berries to process
  pokeBerryID<-getBerryNameID()
  idVals<-pokeBerryID$ID
  
  ###Loop through every berry and build data frame
  ###by adding new rows
  ###Most of the time spent here is in the numerous 
  ###   calls to API address since there are so many species
  allBerry<-data.frame()
  for (i in idVals) {
    allBerry<-rbind(allBerry,getOneBerryData(i,variables))
  }
  
  if (sortName) {
    allBerry<-allBerry %>% arrange(name)
  }
  
  return(allBerry)
}

An example of getEveryBerryData usage to return all data sorted by berry name.

kable(head(getEveryBerryData(sortName = TRUE,"full")))
name growth_time max_harvest natural_gift_power size smoothness soil_drynes
aguav 5 5 60 64 25 10
apicot 24 5 80 75 40 4
aspear 3 5 60 50 25 15
babiri 18 5 60 265 35 6
belue 15 15 80 300 35 8
bluk 2 10 70 108 20 35

Forms Endpoint Functions

Pokemon Forms are ways different pokemon might appear visually and can differ in different situations like battle.

I’ve provided three functions to query and process forms data. The functions all return data frames.

getFormNameID <- function(sortName=FALSE){
  
  apiData<-fromJSON("https://pokeapi.co/api/v2/pokemon-form/?limit=1300")
  
  allNames<-as_tibble(apiData$results)
  
  allNames<-allNames %>% mutate(ID=as.numeric(basename(url)))
  
  if (sortName) {
    allNames<-allNames %>% arrange(name)
  }
  
  return(allNames)
  
}

The user must select the variables returned by providing a character vector with variable names from this set.
name,form_name,is_battle_only,is_default,is_mega,version_group

To select all variables, only assign “full” to the vector.

getOneFormData<-function(form,variables){
  
  ##Get list of forms and process user species input
  pokeFormID<-getFormNameID()
  
  if (is.numeric(form)){
    pokeFormID<-pokeFormID%>%filter(ID==form)
  } else if (is.character(species)){
    pokeFormID<-pokeFormID%>%filter(name==tolower(form))
  } else {
    stop("Please enter either species integer or quoated name value")
  }
  
  FormList<- fromJSON(pokeFormID$url,flatten = TRUE)
  
  ###Function Data to return
  name<-FormList$name
  form_name<-FormList$form_name
  is_battle_only<-FormList$is_battle_only
  is_default<-FormList$is_default
  is_mega<-FormList$is_mega
  version_group<-FormList$version_group$name
  
  LocalDF<-data.frame(name,form_name,is_battle_only,is_default,is_mega,version_group)
  
  if (variables[1]!="full"){
    LocalDF<-LocalDF%>%select(all_of(variables))
  }
  
  return(LocalDF)
  
}
kable(getOneFormData(413,"full"))
name form_name is_battle_only is_default is_mega version_group
wormadam-plant plant FALSE TRUE FALSE diamond-pearl

The user must select the variables returned by providing a character vector with variable names from this set.
name,form_name,is_battle_only,is_default,is_mega,version_group

To select all variables, only assign “full” to the vector.

getEveryFormData<-function(sortName=FALSE,variables){
  
  ###Get current number of forms to process
  pokeFormID<-getFormNameID()
  idVals<-pokeFormID$ID
  
  ###Loop through every form and build data frame
  ###by adding new rows
  ###Most of the time spent here is in the numerous 
  ###   calls to API address since there are so many species
  allForm<-data.frame()
  for (i in idVals) {
    allForm<-rbind(allForm,getOneFormData(i,variables))
  }
  
  if (sortName) {
    allForm<-allForm %>% arrange(name)
  }
  
  return(allForm)
}


Exploratory Data Analysis

Get Full Data Frames

I started by pulling data from the first three endpoints I wrote functions for earlier. I found that those three endpoints had enough relevant data for all the required analysis. I pull all the data here so that I’ll have it stored in objects for later use.

###Get all the data needed for data exploration
allPoke<-getEveryPokeData(basestat = TRUE,type = TRUE)
allSpecies<-getEverySpeciesData()
allStages<-getAllEvolveStages()

Creating New Variables

In this section I create new variables that I plan to use in later analysis.
First, I create a totalPts quantitative variable based on adding related point based variables. In pokemon references different pokemon are often compared based on total points. Here is a reference showing total poins for one particular pokemon Total Point Example

Second, I create a hgtwgt_ratio quantitative variable based on the basic height to weight ratio. This ratio is often used in biology.

Third, I create a common categorical variable based on other species categorical variables. I wanted every species to be in one common category that eventually will show whether the species is in one of the rare categories like legendary or mythical.

Fourth, I create a related rare categorical variable that assigns each species to either rare of regular status.

###total points
moreAllPoke<-allPoke %>% 
  mutate(totalPts=(hp+attack+defense+special_attack +special_defense
  +speed)) %>% 
  select(name,id,species,height,weight,base_experience,totalPts,everything())

###height to weight ratio
moreAllPoke<-moreAllPoke %>%mutate(hgtwgt_ratio=height/weight)

###mythic,legendary, regular,baby
###Create new common variable that assigns one of these values
moreAllSpecies<-allSpecies %>% 
  mutate(common=if_else(is_baby, "baby",
              if_else(is_mythical,"mythical",
                    if_else(is_legendary,"legendary","regular"))))

###Create new rare variable to more broadly categorize rare and regular 
moreAllSpecies<-moreAllSpecies %>% 
  mutate(rare=if_else(is_baby |is_mythical |is_legendary, "rare",
                      "regular")) 

Contingency Tables

Contingency Table One

Every individual pokemon has one of 18 different pokemon types. I created my first contingency table to examine how many pokemon were part of each type by the evolution stage for that pokemon. I included non evolving pokemon because many pokemon do not evolve. I added margin sums to help spot trends between categories.

###combine needed tables to get data together for table
combinePoke<-inner_join(moreAllPoke,allStages,by="species") %>% select(name,stages,everything())

###Create table then add margins to include sums
tOne<-table(combinePoke$type_one,combinePoke$stages )
kable(addmargins(tOne),caption = "Contingency Table of Type by Stage")
  one two three noEvolve Sum
bug 27 34 12 11 84
dark 16 19 4 9 48
dragon 7 9 10 17 43
electric 12 31 8 26 77
fairy 7 10 3 4 24
fighting 13 20 3 9 45
fire 18 25 15 11 69
flying 2 2 2 3 9
ghost 15 16 5 10 46
grass 31 38 19 8 96
ground 16 17 3 6 42
ice 11 16 4 8 39
normal 43 39 11 25 118
poison 16 21 3 3 43
psychic 18 19 12 31 80
rock 20 20 7 26 73
steel 8 12 7 13 40
water 45 54 20 22 141
Sum 325 402 148 242 1117

Contingency Table of Type by Stage

What stood out to me from the first table in regard to evolution stages was that there were many more first and second stage pokemon than third stage. That outcome made sense because players go from lower to higher stages over time so fewer third stage were expected.
As for pokemon types, I immediately notice there are very few flying types. I also noticed water, normal, and grass were most numerous. The other trend I saw was that each type tends to follow the overall pattern of more first and second stage pokemon.

Contingency Table Two

I learned pokemon generations are used to group pokemon over time with i being oldest and viii being most recent. For the second table, I looked at generation versus what I called common status meaning whether a pokemon is regular or in rare category. I added margin sums again to help spot patterns. Note this data is from species data and there are fewer species than pokemon.

#Create table for generation and common categories
tTwo<-table(moreAllSpecies$generation,moreAllSpecies$common)
kable(addmargins(tTwo),
      caption = "Contingency Table of Generation by Common Status")
  baby legendary mythical regular Sum
generation-i 0 4 1 146 151
generation-ii 8 5 1 86 100
generation-iii 2 8 2 123 135
generation-iv 8 9 5 85 107
generation-v 0 9 4 143 156
generation-vi 0 3 3 66 72
generation-vii 0 9 5 74 88
generation-viii 0 10 1 78 89
Sum 18 57 22 801 898

Contingency Table of Generation by Common Status



What stood out to me in the second table was there were more pokemon species created in the i to V earlier generations than the more recent generations. In addition, I did confirm that the rare types are indeed rare with baby being especially uncommon.

Numerical Summaries

Capture Rate By Generation
I learned that capture rate is a key value where higher numbers mean easier to catch. I summarized by generation to see whether capture rate was changing over time.

allSpecies %>% group_by(generation) %>% 
  summarise(Avg = mean(capture_rate), Sd = sd(capture_rate), 
    Median = median(capture_rate), IQR =IQR(capture_rate)) %>% kable()
generation Avg Sd Median IQR
generation-i 106.18543 77.10654 75.0 145.0
generation-ii 91.90000 71.67611 60.0 75.0
generation-iii 113.35556 83.82003 90.0 145.0
generation-iv 78.85981 69.46174 45.0 75.0
generation-v 103.10256 76.61131 75.0 145.0
generation-vi 100.40278 72.47664 62.5 120.0
generation-vii 77.72727 67.96918 45.0 47.5
generation-viii 97.28090 82.43262 60.0 82.0

I did not spot a clear pattern over time in the capture rates. I did notice the IQR varied a lot from year to year meaning variability of capture rate did change a lot over time but in no clear pattern.

Height to Weight Ration by Common Status

I next looked at height to weight ratio by the common categories.

#using hgtwgt_ratio
comboSpeciesPoke<-inner_join(moreAllPoke,moreAllSpecies,by="species")
comboSpeciesPoke %>% group_by(common)  %>% 
  summarise(Avg = mean(hgtwgt_ratio), Sd = sd(hgtwgt_ratio), 
      Median =  median(hgtwgt_ratio), IQR =IQR(hgtwgt_ratio)) %>% kable() 
common Avg Sd Median IQR
baby 0.0950516 0.0979774 0.0445055 0.1046340
legendary 0.0800844 0.2755390 0.0181102 0.0232564
mythical 0.0589330 0.0606246 0.0292845 0.0673077
regular 0.1145914 0.6820708 0.0368272 0.0450000

From this summary, I noticed the regular category had the highest average but the regular median was not too different from mythical. I’d really hoped to see more with this particular summary.

Total Points by Common Status

For my third numerical summary, I looked at total points by common types.

comboSpeciesPoke %>% group_by(common) %>% 
    summarise(Avg = mean(totalPts), Sd = sd(totalPts), Median =       
              median(totalPts), IQR =IQR(totalPts)) %>% kable()
common Avg Sd Median IQR
baby 276.2778 61.34263 282.5 90.25
legendary 627.3333 103.11584 600.0 100.00
mythical 595.0000 66.04007 600.0 0.00
regular 422.1835 102.83936 440.0 175.00

As expected the legendary and mythical types have much higher total points(a measure of power) than the other types with baby having the least points.

Bar Plot

I was curious to learn more about the pokemon species that do not evolve. I first needed to join data so I’d have what I needed together for a bar plot.

###combine data frames to give access to rare and species data
###Then create bar plot of this data.  
combineSpeciesStage<-inner_join(moreAllSpecies,allStages,by="species")


g <- ggplot(combineSpeciesStage, aes(x = stages))
g + geom_bar(aes(fill=(rare)),position = "dodge") +
  scale_fill_discrete(name="Species\nCategories") + 
  labs(x="Evolution Stages", y="Count",
  title = "Bar Plot of Evolution Stages for Rare and Regular Species")

The bar plot gave me new insight about non evolving pokemon species. The plots shows many of the non evolving species are one of the rare groups like mythical or legendary. It made sense to me that rare species would not evole or otherwise they would not be so uncommon.

Box Plot

I was interested in investigating the relationship between pokemon total points and evolution stage in a box plot. I needed to manually adjust the colors because the automatic colors were blending into the background.

###Create a boxplot with added points for stage and total points  
g <- ggplot(combinePoke, aes(x = stages, y = totalPts))
g + geom_boxplot(fill="green1") + 
  geom_point((aes(color = stages)), size=1,position = "jitter",alpha = 0.1) +
  labs(x="Evolution Stages", y="Total Points",
  title = "Boxplot of Total Points for Different Evolution Stages") + 
  scale_color_manual(values = c("red", "blue", "orangered","purple"),name ="Evolution\nStages") 



The boxplot confirmed my expectation that total points would be higher for higher pokemon evolution stages. This outcome makes sense because more evolved pokemon are more powerful and power is quantified by total points. In addition, I could see that the no evolving pokemon are also mainly very powerful too and have a lot of variability.

Histogram

I learned that the pokemon hatch counter variable determines how long it takes for pokemon eggs to hatch. Per the doc, “Initial hatch counter: one must walk 255 × (hatch_counter + 1) steps before this Pokemon’s egg hatches, unless utilizing bonuses like Flame Body’s.” I wanted to get an graph of the distribution of hatch count by creating a histogram.

###creating histogram of hatch_counter data 
g <- ggplot(moreAllSpecies, aes( x = hatch_counter))
g + geom_histogram(binwidth=8,color = "brown", fill = "green", 
  size = 1)  + labs(x="Hatch Counter", y="Count",
  title = "Histogram of Pokemon Hatch Counter") 

The histogram of hatch counter appears to be right skewed with most hatch counter values being smaller but a smaller number being larger values.

Histogram Plus Density

After I completed the histogram, I was curious what might explain the higher hatch counter values being so infrequent? To address this question, I created a density overlay by type of pokemon.

###creating histogram of hatch_counter data plus density 
g <- ggplot(moreAllSpecies, aes(y=..density.., x = hatch_counter))
g + geom_histogram(binwidth=8,color = "brown", fill = "green", 
    size = 1)  + labs(x="Hatch Counter", y="Density", title = 
      "Histogram of Pokemon Hatch Counter\nWith Pokemon Category Density",
       fill="Species\nCategories") + 
  geom_density(adjust = 0.5, alpha = 0.5, aes(fill = common), 
    position = "stack")

Adding the density plots did help explain the larger hatch counter values. The density plots show that the larger hatch counter values go with the rare legendary and mythical pokemon. Part of being rare would be that those types would not hatch as often as other types.

Scatter Plot One

Base experience is the number of experience points awarded when a pokemon is defeated. I created a scatter plot of total points (a measure of total power) and base experience to see whether players are rewarded in propoption to the power of a pokemon opponent.

#Setup for a scatter plot of base_experience and total points
corExpPts<-cor(comboSpeciesPoke$base_experience,comboSpeciesPoke$totalPts)

g<-ggplot(data = comboSpeciesPoke,aes(x=totalPts,y=base_experience))
g+geom_point(aes(color=rare)) + 
  geom_smooth(method = lm) +
  geom_text(x=350,y=500,size=5,
        label = paste0("Correlation = ",round(corExpPts, 2))) +
  labs(x="Total Points", y="Base Experience",
       title = "Scatter Plot of Pokemon Base Exerience Versus Total Points",
       color="Species\nCategories")

Although I was not surprised to see a linear relationship, I was surprised to see just how strong the positive correlation was between these variables. This did confirm players are rewarded based in proportion to total power. The other interesting trend in this graph was how there are four almost straight lines with very similar slopes.

Scatter Plot Two

Next I wanted to do a scatter plot for variables that I suspected to have a negative correlation. The capture rate variable measures how hard or easy it is to capture a pokemon with lower values being harder and higher values being easier to catch. I created a scatter plot of total points (a measure of total power) and and capture rate to see whether these would be negatively correlated as I expected.

#Setup for a scatter plot of  total points and capture_rate
corrCapPts<-cor(comboSpeciesPoke$capture_rate,comboSpeciesPoke$totalPts)

g<-ggplot(data = comboSpeciesPoke,aes(x=capture_rate,y=totalPts))
g+geom_point(aes(color=common))  + 
  geom_smooth(method = lm) +
  geom_text(x=50,y=950,size=5,
      label = paste0("Correlation = ",round(corrCapPts, 2))) +
  labs(x="Capture Rate",y="Total Points",color='Species\nCategories',
      title = "Scatter Plot of Pokemon Total Points Versus Capture Rate ") 

From the scatter plot, I did confirm these variables are moderately negatively correlated as I expected. It made sense that more power (higher points) pokemon would be harder to capture (lower capture rates). This trend was specially true for the more common regular pokemon group.

Facet Wrapped Scatter Plot

From the previous scatter plot, I was curious what the scatter plot would show if it were separately drawn for each category. I created a facet wrap version to examine this question.

##
g<-ggplot(data = comboSpeciesPoke,aes(x=capture_rate,y=totalPts))
g+geom_point() + facet_wrap(~common) + 
  labs(x="Capture Rate",y="Total Points",
    color='Species\nCategories',
    title = "Facet Wrapped\nScatter Plots of Pokemon Total Points Versus Capture Rate ") 

The first part of the facet wrap graph that stood out to me is that the regular type most clearly shows the negative correlation. The second part that stood out to me is that the legendary type might actually be positively correlated or not be very correlated at all. The other types did not show clear patterns.