I am trying to match two sets of words against a number of strings. The two sets are car and school, and using the stringr package I've set it up to detect any instance of a word from either set.
library(stringr)
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
car_match <- str_c(car, collapse = "|")
school_match <- str_c(school, collapse = "|")
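(For reference, `collapse = "|"` joins each set into a single regex alternation, so one `str_detect` call can test all the words at once:)

```r
library(stringr)

car <- c("Honda", "Chevy", "Toyota", "Ford")
car_match <- str_c(car, collapse = "|")
car_match
# → "Honda|Chevy|Toyota|Ford"
```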
df <- data.frame(keyword = c("He drives a Honda",
                             "He goes to Ohio State",
                             "He likes Ford and goes to Ohio State"))
df
main <- function(df) {
  df$car <- as.numeric(str_detect(df$keyword, car_match))
  df$school <- as.numeric(str_detect(df$keyword, school_match))
  df
}
main(df)
> main(df)
                               keyword car school
1                    He drives a Honda   1      0
2                He goes to Ohio State   0      1
3 He likes Ford and goes to Ohio State   1      1
Great, that works.
Now I want to go back and see if I can easily get a frequency count for each individual word within the car and school "buckets."
So it should look as follows:
   Car  Freq
 Honda     1
 Chevy     0
Toyota     0
  Ford     1
    School  Freq
  Michigan     0
Ohio State     2
  Missouri     0
Because Honda, which is in the car classification, appears once, it has a frequency count of one. Likewise, Ohio State, which is in the school classification and appears twice, has a frequency of two.
Can anyone help me go from classification matching to finding the frequency of each word within the classification?
I could probably go back and build a separate pattern for each word in car and match each one individually, but I'd like to find a "simpler" route.
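One possible route (just a sketch, assuming exact, case-sensitive matches and at most one occurrence of a word per string) is to loop over the word vector with `sapply` and total up `str_detect` hits per word:

```r
library(stringr)

car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
df <- data.frame(keyword = c("He drives a Honda",
                             "He goes to Ohio State",
                             "He likes Ford and goes to Ohio State"))

# Hypothetical helper: for each word in a bucket, count how many
# keyword strings contain it (fixed() treats the word literally,
# not as a regex)
bucket_freq <- function(words, strings) {
  freq <- sapply(words, function(w) sum(str_detect(strings, fixed(w))))
  data.frame(Word = words, Freq = freq, row.names = NULL)
}

bucket_freq(car, df$keyword)
bucket_freq(school, df$keyword)
```

With the example data this should give Honda and Ford a count of 1 and Ohio State a count of 2, matching the tables above.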