Question

In R, I have two character vectors, a and b.

a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")

I want a function that counts the character mismatches between each element of a and the corresponding element of b. Using the example above, such a function should return c(2,3,1). There is no need to align the strings. I need to compare each pair of strings character-by-character and count matches and/or mismatches in each pair. Does any such function exist in R?

Or, to ask the question in another way, is there a function to give me the edit distance between two strings, where the only allowed operation is substitution (ignore insertions or deletions)?


Solution

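Split each string into its characters with strsplit, then use mapply to count the positions where the two strings differ (this assumes each pair has the same length):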

mapply(function(x,y) sum(x!=y),strsplit(a,""),strsplit(b,""))
#[1] 2 3 1
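
If the inputs might not line up, the one-liner above can be wrapped in a small helper that checks lengths first. This is only a sketch: count_mismatches is a made-up name, not a base R function, and the choice to stop on unequal lengths is an assumption about the behavior you want.

```r
# count_mismatches is a hypothetical helper, not part of base R.
# It assumes a and b are character vectors of the same length and that
# each pair of strings has the same number of characters.
count_mismatches <- function(a, b) {
  stopifnot(length(a) == length(b), all(nchar(a) == nchar(b)))
  # Split every string into single characters, then compare pairwise.
  mapply(function(x, y) sum(x != y),
         strsplit(a, ""), strsplit(b, ""),
         USE.NAMES = FALSE)
}

a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")
count_mismatches(a, b)
#[1] 2 3 1
```

Without the check, comparing split strings of different lengths would recycle the shorter one (with a warning when the lengths are not multiples of each other), which can give a misleading count.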

OTHER TIPS

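Another option is adist, which computes the approximate string distance between character vectors. Note that adist returns the generalized Levenshtein distance, which also allows insertions and deletions, so it equals the substitution-only count only when no cheaper alignment exists, as is the case for this example: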

mapply(adist,a,b)
abcdefg  hijklmnop qrstuvwxyz 
     2          3          1 
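
To see where the two approaches can disagree, here is a small illustration with a shifted string. The example strings are made up for this purpose; the position-by-position count treats every character as a mismatch, while adist finds a cheaper edit using one deletion and one insertion.

```r
x <- "abcd"
y <- "bcda"

# Position-by-position mismatches (substitution only).
hamming <- sum(strsplit(x, "")[[1]] != strsplit(y, "")[[1]])
hamming
#[1] 4

# adist returns a matrix; drop() reduces the 1 x 1 result to a plain number.
lev <- drop(adist(x, y))
lev
#[1] 2
```

So if only substitutions should count, the strsplit approach is the safer choice for strings that may be shifted relative to each other.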
Licensed under: CC-BY-SA with attribution