Question

I want the indices of some variables in a data frame, but my grep() skills are insufficient.

Say I have this data frame,

( dfn <- data.frame(
a1   = c(3,  3, 0,  3, 0,   0),
a2   = c(1, NA, 0, NA, 1,   4),
a11  = c(0,  3, NA, 1, 3,   1),
a12  = c(0,  3, NA, 1, 3,   3),
a_12 = c(0,  3, NA, 1, NA, NA),
a_1  = c(12, 3, NA, 1, 4,  NA)) )
  a1 a2 a11 a12 a_12 a_1
1  3  1   0   0    0  12
2  3 NA   3   3    3   3
3  0  0  NA  NA   NA  NA
4  3 NA   1   1    1   1
5  0  1   3   3   NA   4
6  0  4   1   3   NA  NA

Now, what I want is to grep a1, a2, a11, and a12 (in real life the number after the a runs consecutively from 1 to 12). How do I do that? I've tried the two grep() calls below, but with no luck.

foo <- grep('a[1:12]$', names(dfn) )
names(dfn[,foo])
[1] "a1" "a2"

I've also tried this,

bar <- grep('a[c(1:12)]$', names(dfn) )
names(dfn[,bar])
[1] "a1" "a2"

What I want is

[1] "a1" "a2" "a11" "a12"

Secondly, can anyone direct me to a good grep() tutorial? Thanks!


Solution

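You could use grep('a[1:12]+', names(dfn)), although it only works for this data frame because the character class [1:12] happens to contain the digits 1 and 2 (plus a literal colon).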
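Why the bracket patterns misbehave: square brackets define a character class, which matches exactly one character from the listed set. So [1:12] means "one of 1, :, or 2", not "a number from 1 to 12" (and [c(1:12)] additionally allows c, ( and )). The check below runs on made-up vectors (candidates, full) purely to illustrate this.

```r
# Made-up test strings (not from the question).
candidates <- c("1", "2", ":", "3", "12")
# [1:12] is a character class: it matches ONE character from {1, :, 2}.
# The ^ and $ anchors force the whole string to be that single character.
in_class <- grepl("^[1:12]$", candidates)
names(in_class) <- candidates
print(in_class)  # TRUE for "1", "2", ":"; FALSE for "3" and "12"

# The pattern 'a[1:12]+' applied to a hypothetical full a1..a12 range.
full <- paste0("a", 1:12)
missed <- full[!grepl("a[1:12]+", full)]
print(missed)    # "a3" "a4" "a5" "a6" "a7" "a8" "a9"
```

So with the real a1 to a12 columns, the class-based pattern would silently skip a3 through a9.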

Actually, the proper way to do it would be grep('a[1-9]+', names(dfn)). The + after [1-9] means that digits from 1 to 9 can be repeated any number of times after the a, but at least one must appear.
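Note that grep('a[1-9]+', ...) is not anchored, so it also accepts names such as a13, ba1 or a1x. If the goal is exactly a1 to a12, a stricter variant (my own suggestion, not part of the original answer) anchors the pattern with ^ and $ and spells out the allowed numbers. The names in cols are hypothetical.

```r
# Hypothetical column names, including several that should NOT be selected.
cols <- c(paste0("a", 1:13), "a_1", "a_12", "ba1", "a1x")

# Unanchored pattern from the answer: matches anywhere inside the name.
loose <- cols[grepl("a[1-9]+", cols)]
print(loose)   # also picks up "a13", "ba1" and "a1x"

# Anchored pattern: the whole name must be "a" followed by 1-9 or 10-12.
idx <- grep("^a([1-9]|1[0-2])$", cols)
strict <- cols[idx]
print(strict)  # "a1" through "a12"
```

On the question's data frame, grep('^a([1-9]|1[0-2])$', names(dfn)) gives 1 2 3 4.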

OTHER TIPS

regmatches(names(dfn),regexpr('a[1-9]{1,2}',names(dfn)))
[1] "a1"  "a2"  "a11" "a12"

My regular expression is: a followed by at least one and at most two digits from the set [1-9].
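Keep in mind that regmatches() returns the matched substrings rather than column names or indices, and it drops elements with no match. With made-up names that are not in the question's data frame, the unanchored pattern can also extract just a piece of a longer name:

```r
# Made-up names, not columns of dfn.
x <- c("a1", "a12", "a123", "a10", "a_1")
m <- regexpr("a[1-9]{1,2}", x)   # start of the first match in each name (-1 = none)
extracted <- regmatches(x, m)     # only the matched text; "a_1" is dropped
print(extracted)  # "a1" "a12" "a12" "a1"
```

Here "a123" yields "a12" and "a10" yields "a1", so the result can contain strings that do not appear in the input as written; anchoring the pattern with ^ and $ avoids this.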

You could just do this instead:

names(dfn)[names(dfn) %in% paste0("a",1:12)]
[1] "a1"  "a2"  "a11" "a12"

If you want the indexes, this will give you that:

which(names(dfn) %in% paste0("a",1:12))
[1] 1 2 3 4
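A related option, not from the original answers, is match(), which reports for each wanted name where it sits in names(dfn), with NA for names that are absent. The vector dfn_names below simply copies names(dfn) from the question so the snippet runs on its own.

```r
dfn_names <- c("a1", "a2", "a11", "a12", "a_12", "a_1")  # same as names(dfn)
targets <- paste0("a", 1:12)

# %in% answers "is this column one of the targets?", in column order.
idx <- which(dfn_names %in% targets)
print(idx)  # 1 2 3 4

# match() answers "where is each target?", in target order; NA = missing.
pos <- match(targets, dfn_names)
print(pos)
present <- pos[!is.na(pos)]  # 1 2 3 4, usable as dfn[, present]
```

match() is handy when you also want to know which of a1 to a12 are missing (here a3 to a10).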
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow