How to match two string vectors in R when the case of the strings differs between the vectors?

https://www.tutorialspoint.com/how-to-match-two-string-vectors-if-the-strings-case-is-different-in-both-the-vectors-in-r

10-09-2020

R is a case-sensitive language, so strings that differ only in case do not match. For example, if one vector contains tutorialspoint and the other contains TUTORIALSPOINT, calling match on them directly returns NA. To match such vectors, first convert them to a common case with tolower or toupper, then pass the converted values to match.
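
A minimal sketch of this idea, using two made-up vectors (`site_lower` and `site_upper` are illustrative names, not taken from the examples that follow): a direct call finds nothing, while normalising the case first finds every element.

```r
# Illustrative vectors (names are assumptions for this sketch)
site_lower <- c("tutorialspoint", "r")
site_upper <- c("R", "TUTORIALSPOINT")

# Direct comparison is case sensitive, so nothing matches
direct <- match(site_lower, site_upper)

# Convert one side so both vectors share the same case, then match
normalised <- match(site_lower, tolower(site_upper))

print(direct)      # NA NA
print(normalised)  # 2 1
```

Each value in the result is the position in the second vector where the corresponding element of the first vector was found.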

Examples

> x1<-sample(letters[1:26],100,replace=TRUE)
> x1

Output

[1] "z" "v" "r" "y" "z" "l" "v" "t" "f" "p" "p" "z" "e" "b" "a" "o" "m" "d"
[19] "e" "l" "y" "y" "u" "u" "w" "b" "a" "j" "n" "v" "b" "q" "b" "d" "l" "a"
[37] "g" "g" "g" "o" "k" "r" "q" "e" "x" "i" "r" "l" "b" "r" "j" "k" "b" "f"
[55] "r" "f" "r" "n" "y" "y" "l" "k" "y" "s" "b" "a" "s" "f" "a" "l" "j" "i"
[73] "q" "o" "t" "v" "t" "r" "i" "x" "s" "q" "h" "t" "y" "k" "a" "h" "e" "m"
[91] "u" "d" "q" "i" "h" "x" "k" "j" "p" "h"

Example

> x2<-sample(LETTERS[1:10],100,replace=TRUE)
> x2

Output

[1] "E" "C" "F" "F" "A" "H" "E" "F" "D" "F" "J" "G" "G" "D" "E" "G" "G" "F"
[19] "A" "C" "C" "H" "E" "G" "H" "A" "B" "A" "H" "G" "D" "J" "G" "C" "D" "I"
[37] "F" "B" "D" "D" "C" "D" "E" "D" "B" "E" "E" "H" "D" "D" "I" "B" "I" "J"
[55] "C" "C" "H" "D" "B" "D" "F" "F" "D" "F" "E" "B" "F" "J" "D" "B" "G" "J"
[73] "G" "C" "E" "A" "I" "B" "D" "A" "G" "G" "F" "D" "E" "E" "G" "I" "D" "D"
[91] "I" "E" "J" "D" "E" "B" "C" "A" "I" "C"

> match(x1,tolower(x2))

Output

[1] NA NA NA NA NA NA NA NA 3 NA NA NA 1 27 5 NA NA 9 1 NA NA NA NA NA NA
[26] 27 5 11 NA NA 27 NA 27 9 NA 5 12 12 12 NA NA NA NA 1 NA 36 NA NA 27 NA
[51] 11 NA 27 3 NA 3 NA NA NA NA NA NA NA NA 27 5 NA 3 5 NA 11 36 NA NA NA
[76] NA NA NA 36 NA NA NA 6 NA NA NA 5 6 1 NA NA 9 NA 36 6 NA NA 11 NA 6
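
In this output, each number is the position of the first matching element in the second vector, and NA marks an element with no match. If only a yes/no answer is needed, one option (an addition to the original examples, not part of them) is to test the result for NA or to use base R's `%in%` operator. The vectors `a` and `b` below are small made-up stand-ins for x1 and x2.

```r
# Small illustrative vectors (assumed for this sketch)
a <- c("f", "z", "e", "b")
b <- c("E", "C", "F", "F", "A")

pos <- match(a, tolower(b))     # position of first match, or NA
found <- !is.na(pos)            # TRUE where a match exists
found_in <- a %in% tolower(b)   # same logical answer, without positions

print(pos)    # 3 NA 1 NA
print(found)  # TRUE FALSE TRUE FALSE
```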

> x3<-c("AK", "AL", "AR", "AS", "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
"GU", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI",
"MN", "MO", "MP", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", "NV", "NY",
"OH", "OK", "OR", "PA", "PR", "RI", "SC", "SD", "TN", "TX", "UM", "UT", "VA",
"VI", "VT", "WA", "WI", "WV", "WY")
> x3
[1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"

> x4<-c("ak", "al", "ar", "as", "az", "ca", "co", "ct", "dc", "de", "fl", "ga", "gu", "hi", "ia",
"id", "il", "in", "ks", "ky", "la", "ma", "md", "me", "mi", "mn", "mo", "mp", "ms", "mt",
"nc", "nd", "ne", "nh", "nj", "nm", "nv", "ny", "oh", "ok", "or", "pa", "pr", "ri", "sc", "sd",
"tn", "tx", "um", "ut", "va", "vi", "vt", "wa", "wi", "wv", "wy")
> x4
[1] "ak" "al" "ar" "as" "az" "ca" "co" "ct" "dc" "de" "fl" "ga" "gu" "hi" "ia"
[16] "id" "il" "in" "ks" "ky" "la" "ma" "md" "me" "mi" "mn" "mo" "mp" "ms" "mt"
[31] "nc" "nd" "ne" "nh" "nj" "nm" "nv" "ny" "oh" "ok" "or" "pa" "pr" "ri" "sc"
[46] "sd" "tn" "tx" "um" "ut" "va" "vi" "vt" "wa" "wi" "wv" "wy"
> length(x4)
[1] 57
> length(x3)
[1] 57

> match(x3,toupper(x4))

Output

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57

> match(LETTERS[1:20],toupper(c(letters[1:26])))

Output

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

> match(LETTERS[1:20],toupper(c(letters[1:10])))

Output

[1] 1 2 3 4 5 6 7 8 9 10 NA NA NA NA NA NA NA NA NA NA

> match(LETTERS[10:1],toupper(c(letters[1:10])))

Output

[1] 10 9 8 7 6 5 4 3 2 1

> match(sample(LETTERS[10:1],50,replace=TRUE),toupper(c(letters[1:10])))

Output

[1] 2 10 8 7 10 8 4 8 2 7 10 3 5 5 4 6 4 10 7 3 7 1 1 3 10
[26] 3 7 4 3 7 5 4 5 2 7 4 5 1 5 7 2 4 4 2 8 8 9 5 1 2

> match(sample(letters[26:1],50,replace=TRUE),tolower(c(LETTERS[1:20])))

Output

[1] 8 2 18 15 15 14 11 13 18 5 9 13 14 20 18 15 4 14 5 NA NA 5 NA 8 17
[26] 5 16 3 4 9 NA 5 16 17 16 6 12 1 2 NA NA 8 16 9 NA 14 NA 11 16 15

> match(sample(c("india","russia","china","uk"),50,replace=TRUE),tolower(c("INDIA","RUSSIA","CHINA")))

Output

[1] NA 3 2 1 3 1 NA NA NA 2 NA 3 3 3 1 NA NA 3 3 3 2 3 2 3 2
[26] 3 3 NA 3 3 2 NA 3 1 NA 3 NA 3 1 NA 3 NA NA NA NA 3 NA 2 NA NA
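
When the same conversion is needed in several places, it could be wrapped in a small helper. The function name `match_ignore_case` is hypothetical and not part of base R; this sketch lowercases both arguments so that either vector may contain mixed case.

```r
# Hypothetical helper (not a base R function): case-insensitive match()
match_ignore_case <- function(x, table) {
  # Lowercase both sides so the comparison ignores case
  match(tolower(x), tolower(table))
}

# Illustrative data with mixed case on both sides
countries <- c("India", "russia", "CHINA", "uk")
lookup <- c("INDIA", "Russia", "china")

res <- match_ignore_case(countries, lookup)
print(res)  # 1 2 3 NA
```

Converting both sides avoids having to know in advance which vector is uppercase and which is lowercase.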

Nizamuddin Siddiqui

Published on 04-Sep-2020 14:53:05
