Question

I am looking to calculate the percent match between two strings in R. For example:

x <- "asdf"     
y <- "fdjk"     

I would like this to return 0.5 (i.e. 2 of the 4 characters match, irrespective of order). Any thoughts are greatly appreciated.


Solution

You can split a string into its individual characters with strsplit:

char.x <- strsplit(x, "")[[1]]
char.x
# [1] "a" "s" "d" "f"
char.y <- strsplit(y, "")[[1]]
char.y
# [1] "f" "d" "j" "k"
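
The [[1]] is needed because strsplit returns a list with one element per input string. As a small sketch (the variable names words and chars are only for illustration), both strings can be split in a single call:

```r
# strsplit is vectorised over its first argument and returns a list,
# one character vector per input string.
words <- c("asdf", "fdjk")
chars <- strsplit(words, "")
chars[[2]]
# [1] "f" "d" "j" "k"
```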

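Now you can use intersect and length to compute your metric. The exact formula is unclear because your post doesn't specify, for instance, how to handle duplicate characters; the version below counts each distinct shared character once and divides by the larger number of distinct characters: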

length(intersect(char.x, char.y)) /
  max(length(unique(char.x)), length(unique(char.y)))
# [1] 0.5
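
If duplicate characters should count, one possible alternative is to compare character counts rather than sets. This is only a sketch under that assumption: percent.match is a hypothetical helper name, and dividing by the longer string's length is one choice among several.

```r
# Hypothetical helper: share of characters in common, counting duplicates.
percent.match <- function(x, y) {
  count.x <- table(strsplit(x, "")[[1]])
  count.y <- table(strsplit(y, "")[[1]])
  shared <- intersect(names(count.x), names(count.y))
  # A shared character counts as often as it occurs in BOTH strings.
  n.shared <- sum(pmin(as.integer(count.x[shared]),
                       as.integer(count.y[shared])))
  # Assumption: normalise by the length of the longer string.
  n.shared / max(nchar(x), nchar(y))
}

percent.match("asdf", "fdjk")
# [1] 0.5
```

For strings without repeated characters this agrees with the set-based formula. The two differ once characters repeat: for "aab" and "abb" the set-based formula gives 1, while percent.match gives 2/3.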
Licensed under: CC-BY-SA with attribution