When you use agrep with value = FALSE, the result is "a vector giving the indices of the elements that yielded a match", that is, the positions of the matches in the vector of names that you fed to agrep. You then try to replace the entire name variable in your data frame (67424 rows) with a shorter vector of indices (3074 of them). Not what you want. Here is a small example which perhaps can guide you in the right direction. You may also read ?Extract and this. The details of agrep itself (e.g. max.distance), I leave to you.
# create a data frame with some MC DONALD's-ish names, and some other names.
rest2012 <- data.frame(CONAME = c("MC DONALD'S", "MCC DONALD'S", "SPSS Café", "GLM RONALDO'S", "MCMCglmm"))
rest2012
# do some fuzzy matching with 'agrep'
# store the indices in an object named 'idx'
idx <- agrep(pattern = "MC DONALD'S", x = rest2012$CONAME, ignore.case = FALSE, value = FALSE, max.distance = 3)
idx
# just look at the rows in the data frame that matched
# indexing with a numeric vector
rest2012[idx, ]
# replace the elements that match
# (assign to the CONAME column only, so any other columns in those rows are left untouched)
rest2012$CONAME[idx] <- "MC DONALD'S"
rest2012
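
To make the difference between the return types concrete, here is a minimal sketch on hypothetical toy data (the names `shops`, `NAME` and `ID` are made up for illustration and are not from your data). It compares value = FALSE, value = TRUE and the logical variant agrepl, and shows that replacing only the matched elements keeps the number of rows intact.

```r
# Hypothetical toy data, made up for illustration only
shops <- data.frame(NAME = c("apple", "banana", "aple", "cherry"),
                    ID   = 1:4,
                    stringsAsFactors = FALSE)

# value = FALSE: positions of the matching elements
idx <- agrep("apple", shops$NAME, max.distance = 1, value = FALSE)

# value = TRUE: the matching elements themselves
vals <- agrep("apple", shops$NAME, max.distance = 1, value = TRUE)

# agrepl: one TRUE/FALSE per element, same length as the input
hit <- agrepl("apple", shops$NAME, max.distance = 1)

# Both idx and vals are shorter than the column, so neither can
# stand in for the whole column
length(idx) < nrow(shops)

# Replace only the matched elements; the ID column and the row
# count are left alone
shops$NAME[idx] <- "apple"
shops
```

Indexing with `hit` instead of `idx` should give the same result; which one to use is mostly a matter of taste.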