The 40% of Bender
Bender’s Composition
Every once in a while, Bender proclaims he is ‘40 percent’ of something. What has he said he is made of?
Bender’s Composition |
---|
im 40% zinc |
my bodys 40% titanium im finally richer than those snooty atm machines |
im 40% dolomite oh its hot its very hot |
oh no im 40% lucky the scrap metal im made from included a truckload of horseshoes from the luckiest racehorses in mexico who had just been sent to a glue factory |
im 40% scrap metal |
im 40% wire |
im 40% empty also what the hell is a free will slot |
Code
This is based on the previous post, Futurama: Benders Top 10 Words.
library(httr)
library(XML)
library(kableExtra)
base_site <- "http://theinfosphere.org"
## Get Links
# XPATH
linkXPath <- "//*[contains(concat( ' ', @class, ' ' ), concat( ' ', 'oLeft', ' ' ))]//a/@href"
# Path to transcript listings
ts_path <- "Episode_Transcript_Listing"
# Get it
pageL <- GET(base_site, path=ts_path)
# Convert to HTML (hand the response body to the parser as text)
h <- htmlParse(content(pageL, as="text", encoding="UTF-8"), asText=TRUE)
# XPath provided by SelectorGadget; /@href was appended because only the link target is needed
hLinks <- getNodeSet(h, linkXPath)
# Convert to character
hLinks <- as.character(hLinks)
# This function will be used to clean up each episode
getBenDialog <- function(diag) {
  # Keep only paragraphs that contain a Bender line
  diag <- diag[grepl("Bender: ", diag)]
  # From end of "Bender: " to the character before \n
  found <- regexpr("(?<=Bender: ).*?(?=\\n)", diag, perl=TRUE)
  # regmatches() returns only the matches, so place them by position
  # and leave NA where nothing matched
  lines <- rep(NA_character_, length(diag))
  lines[found != -1] <- regmatches(diag, found)
  diag <- lines
  # Remove anything between square brackets [], regex "\\[[^\\]]*\\]"
  diag <- gsub("\\[[^\\]]*\\]", "", diag, perl=TRUE)
  diag <- gsub("[^[:alnum:][:space:]%]", "", diag) # Remove punctuation, keep %
  diag <- gsub("\\s+", " ", diag) # Collapse runs of whitespace
  diag <- gsub("^\\s+|\\s+$", "", diag) # Remove leading/trailing whitespace
  diag <- tolower(diag)
  return(diag)
}
# Output
benDialog <- NULL
# Loop over each episode and get data
for (k in seq_along(hLinks)) {
  # Get episode
  pageT <- GET(paste0(base_site, hLinks[k]))
  h <- htmlParse(content(pageT, as="text", encoding="UTF-8"), asText=TRUE)
  # XPaths
  diagXPath <- "//p"
  diag <- xpathSApply(h, diagXPath, xmlValue)
  # Process Episode
  benDiag <- getBenDialog(diag)
  # Append this episode's lines to the running vector
  benDialog <- c(benDialog, benDiag)
  # Be nice
  Sys.sleep(1)
}
benDialog <- benDialog[!is.na(benDialog)]
benDialog <- unique(benDialog)
knitr::kable(data.frame(benderComposition=benDialog[grepl("40%",benDialog)])) %>% kable_styling(bootstrap_options = c("striped"))
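To see what the cleaning steps do without scraping anything, here is a minimal offline sketch in base R. The three sample paragraphs are made up for illustration (they are not taken from the transcripts) and are only assumed to resemble what `xmlValue` returns for a `<p>` node; the regular expressions are the same ones used in `getBenDialog`.

```r
# Hypothetical sample paragraphs, shaped like xmlValue() output for <p> nodes
sample_paras <- c(
  "Bender: [Laughing.] I'm 40% zinc!\n",
  "Fry: What's a free will slot?\n",
  "Bender: I'm back, baby!\n"
)

# Keep only paragraphs that contain a Bender line
bender_paras <- sample_paras[grepl("Bender: ", sample_paras)]

# Take the text after "Bender: " up to (not including) the newline
found <- regexpr("(?<=Bender: ).*?(?=\\n)", bender_paras, perl = TRUE)
lines <- regmatches(bender_paras, found)

# Drop stage directions in [], strip punctuation except "%", tidy whitespace
lines <- gsub("\\[[^\\]]*\\]", "", lines, perl = TRUE)
lines <- gsub("[^[:alnum:][:space:]%]", "", lines)
lines <- gsub("\\s+", " ", lines)
lines <- gsub("^\\s+|\\s+$", "", lines)
lines <- tolower(lines)

# Keep only the lines that mention 40%
forty <- lines[grepl("40%", lines)]
print(forty)
# [1] "im 40% zinc"
```

Keeping `%` in the punctuation filter is what makes the final `grepl("40%", ...)` possible; stripping it would leave only "40", which would also match unrelated lines.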