Question

I need to make a set of selections that vary by day on this dataset (dat), which consists of species (sp), day (day, as POSIXct) and area (ar):

sp  day         ar
A   1-Jan-00    2
B   1-Jan-00    6
C   2-Jan-00    2
A   2-Jan-00    1
D   2-Jan-00    4
E   2-Jan-00    12
F   3-Jan-00    8
A   4-Jan-00    3
G   4-Jan-00    2
B   4-Jan-00    1

I need to subset the rows where species "A" occurs. However, the areas to select vary by day, as given by this matrix (dat.ar):

day       ar.select
1-Jan-00    (1,6)
2-Jan-00    (1,12)
3-Jan-00    (4,8)
4-Jan-00    (3,12)

More specifically, for the rows where species "A" occurs, I need only areas 1 and 6 on 1-Jan-00, only areas 1 and 12 on 2-Jan-00, and so on. The desired output for this example is:

sp  day        ar
A   2-Jan-00    1
A   4-Jan-00    3

I haven't had much success writing a for loop, as I am still learning the semantics of R. In short, I have a rough idea of what must be done but am still struggling with the language. Here is a sketch of where I think this should go:

dat1 = with(dat,sapply(day[sp=="A" & dat.ar$day.s[i] ], 
function(x) ar == (ar[sp=="A" & day == x]==dat.ar$ar.select[j]) 
final=dat[rowSums(dat1) > 0, ]

I believe I need a for loop that goes through dat.ar and specifies the areas to be selected in dat. But despite my efforts, I haven't gotten anywhere near a working loop. I am not even sure whether combining sapply with a for loop is the right way to go about this. In case someone wishes to reproduce the problem:

sp=c("A","B","C","A","D","E","F","A","G","B")
day=c("1-Jan-00", "1-Jan-00", "2-Jan-00", "2-Jan-00", "2-Jan-00", 
"2-Jan-00", "3-Jan-00", "4-Jan-00", "4-Jan-00", "4-Jan-00")
day=as.POSIXct(day, format="%d-%b-%y")
ar=c(2,6,2,1,4,12,8,3,2,1)
dat= as.data.frame(cbind(sp, day, ar)) 

day.s=c("1-Jan-00", "2-Jan-00", "3-Jan-00", "4-jan-00")
day.s=as.POSIXct(day.s, format="%d-%b-%y")
a.s=c(1,1,4,3)
a.e=c(6,12,8,12)
ar.select=paste(a.s, a.e, sep=",")
dat.ar=cbind(day.s, ar.select)

Any help is much appreciated.

Solution

You could merge your table of conditions into the original dataset and then filter the rows conditionally. In the example below, a1 and a2 play the role of your sp and day columns, and obs plays the role of your ar column.

library(data.table)
dataset <- data.table(
a1 = c("A","B","C","B","A","A","A","A"),
a2 = c("P","Q","Q","Q","R","R","P","Q"),
obs = c(3,2,3,4,2,4,8,0)
)

constraints <- data.table(
a1 = c("A","B","C","A","B","C","A","B","C"),
a2 = c("P","P","P","Q","Q","Q","R","R","R"),
lower = c(1,2,3,4,3,2,3,2,5),
upper = c(6,4,5,7,5,6,5,3,7)
)


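# Left join: attach each row's lower/upper bounds from the constraints table
checkingdataset <- merge(dataset,constraints, by = c("a1","a2"), all.x = TRUE)

# Flag rows whose obs lies within [lower, upper]; all other rows stay NA
checkingdataset[obs <= upper & obs >= lower, obs.keep := TRUE]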


#   a1 a2 obs lower upper obs.keep
#1:  A  P   3     1     6    TRUE
#2:  A  P   8     1     6      NA
#3:  A  Q   0     4     7      NA
#4:  A  R   2     3     5      NA
#5:  A  R   4     3     5    TRUE
#6:  B  Q   2     3     5      NA
#7:  B  Q   4     3     5    TRUE
#8:  C  Q   3     2     6    TRUE
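
Applied to the data in the question, the same merge-then-filter idea might look like the sketch below. Note that the desired output in the question treats each pair such as (1,6) as two allowed areas rather than as an interval (species "A" with area 2 on 1-Jan-00 is not wanted), so the filter here tests equality instead of a range. This is only a sketch in base R: the names checked and final are made up for illustration, and the dates are written in ISO format and stored as Date rather than POSIXct so that parsing does not depend on the locale.

```r
# The question's data, rebuilt as proper data frames.
dat <- data.frame(
  sp  = c("A", "B", "C", "A", "D", "E", "F", "A", "G", "B"),
  day = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02",
                  "2000-01-02", "2000-01-02", "2000-01-03", "2000-01-04",
                  "2000-01-04", "2000-01-04")),
  ar  = c(2, 6, 2, 1, 4, 12, 8, 3, 2, 1)
)

# One row per day, with the two allowed areas kept in separate columns.
dat.ar <- data.frame(
  day = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04")),
  a.s = c(1, 1, 4, 3),
  a.e = c(6, 12, 8, 12)
)

# Left join: attach each day's allowed areas to every row of dat.
checked <- merge(dat, dat.ar, by = "day", all.x = TRUE)

# Keep species "A" rows whose area equals one of the two allowed values.
keep  <- checked$sp == "A" &
  (checked$ar == checked$a.s | checked$ar == checked$a.e)
final <- checked[keep, c("sp", "day", "ar")]
final
```

With this data, final holds the two rows requested in the question: species "A" with area 1 on 2 January and area 3 on 4 January.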

Other answers

First, I would not use as.data.frame(cbind(...)) to make your data.frames. Second, I would create dat.ar in much the same structure that you have created dat. Third, I would then just use merge to get the result you are looking for.

dat <- data.frame(sp=c("A","B","C","A","D","E","F","A","G","B"),
                  day=c("1-Jan-00", "1-Jan-00", "2-Jan-00", "2-Jan-00", 
                        "2-Jan-00", "2-Jan-00", "3-Jan-00", "4-Jan-00", 
                        "4-Jan-00", "4-Jan-00"),
                  ar=c(2,6,2,1,4,12,8,3,2,1))
dat$day <- as.POSIXct(dat$day, format="%d-%b-%y")

day.s <- c("1-Jan-00", "2-Jan-00", "3-Jan-00", "4-jan-00")
day.s <- as.POSIXct(day.s, format="%d-%b-%y")
a.s <- c(1,1,4,3)
a.e <- c(6,12,8,12)
ar.select <- paste(a.s, a.e, sep=",")
dat.ar <- data.frame(sp = "A", day = day.s, ar = ar.select)

dat.ar <- cbind(dat.ar[-3], 
                read.csv(text = as.character(dat.ar$ar), header = FALSE))
library(reshape2)
dat.ar <- melt(dat.ar, id.vars=1:2, value.name="ar")
dat.ar
#   sp        day variable ar
# 1  A 2000-01-01       V1  1
# 2  A 2000-01-02       V1  1
# 3  A 2000-01-03       V1  4
# 4  A 2000-01-04       V1  3
# 5  A 2000-01-01       V2  6
# 6  A 2000-01-02       V2 12
# 7  A 2000-01-03       V2  8
# 8  A 2000-01-04       V2 12

merge(dat, dat.ar)
#   sp        day ar variable
# 1  A 2000-01-02  1       V1
# 2  A 2000-01-04  3       V1
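
On the first point of this answer, here is a small sketch of why as.data.frame(cbind(...)) tends to cause trouble: cbind() on plain vectors builds a matrix, and a matrix can hold only one type, so every column ends up as text and the date-time class is lost. The object names bad and good are made up for illustration, and the time zone is fixed to UTC only to keep the example reproducible.

```r
sp  <- c("A", "B")
day <- as.POSIXct(c("2000-01-01", "2000-01-02"), tz = "UTC")
ar  <- c(2, 6)

# cbind() first builds a character matrix, so all type information is lost.
bad <- as.data.frame(cbind(sp, day, ar))
str(bad)   # day is no longer POSIXct and ar is no longer numeric

# data.frame() keeps each column's own type.
good <- data.frame(sp = sp, day = day, ar = ar)
str(good)  # day is still POSIXct and ar is still numeric
```

Comparisons such as ar == 1 or merges on day are only reliable when the columns keep their intended types, which is why data.frame() is used in the code above.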

Of course, I would just suggest that you make your dat.ar object in a more friendly manner to begin with. Why paste values together if you are going to separate them out later anyway? ;)

dat.ar <- data.frame(sp = "A", 
                     day = c("1-Jan-00", "2-Jan-00", "3-Jan-00", "4-jan-00"),
                     a.s = c(1,1,4,3), a.e = c(6,12,8,12))
dat.ar$day <- as.POSIXct(dat.ar$day, format="%d-%b-%y")

library(reshape2)
dat.ar <- melt(dat.ar, id.vars=1:2, value.name="ar")
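
If reshape2 is not available, the same long layout can be built by hand and merged in the same way. This is a minimal sketch in base R under a few assumptions: the names days, dat.ar.long and res are made up for illustration, and the dates are written in ISO format and stored as Date rather than POSIXct so that parsing does not depend on the locale.

```r
dat <- data.frame(
  sp  = c("A", "B", "C", "A", "D", "E", "F", "A", "G", "B"),
  day = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02",
                  "2000-01-02", "2000-01-02", "2000-01-03", "2000-01-04",
                  "2000-01-04", "2000-01-04")),
  ar  = c(2, 6, 2, 1, 4, 12, 8, 3, 2, 1)
)

days <- as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"))

# Long format: one row per (species, day, allowed area) combination.
dat.ar.long <- rbind(
  data.frame(sp = "A", day = days, ar = c(1, 1, 4, 3)),
  data.frame(sp = "A", day = days, ar = c(6, 12, 8, 12))
)

# merge() joins on all shared columns (sp, day, ar) and keeps only matches,
# so the inner join itself acts as the filter.
res <- merge(dat, dat.ar.long)
res
```

Because every allowed combination is a row of its own, this layout also extends to days with more than two allowed areas without changing the merge.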
Licensed under: CC-BY-SA with attribution