Question

I have two data frames:

#df1
type <- c("A", "B", "C")
day_start <- c(5,8,4)
day_end <- c(12,10,11)
df1 <- cbind.data.frame(type, day_start, day_end)
df1
  type day_start day_end
1    A         5      12
2    B         8      10
3    C         4      11

#df2
value <- 1:10
day <- 4:13
df2 <- cbind.data.frame(day, value)
df2
   day value
1    4     1
2    5     2
3    6     3
4    7     4
5    8     5
6    9     6
7   10     7
8   11     8
9   12     9
10  13    10

I would like to subset df2 so that each level of the factor "type" in df1 gets its own data frame, containing only the rows whose day lies between that level's day_start and day_end (inclusive).

The desired outcome for "A" would be:

list_of_dataframes$df_A
   day value
1    5     2
2    6     3
3    7     4
4    8     5
5    9     6
6   10     7
7   11     8
8   12     9

I found this question on SO, and its answer suggests using mapply(). However, I cannot figure out how to adapt the code given there to my data and desired outcome. Can someone help me out?

Solution

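The following solution assumes that all day values are integers. If that assumption holds, it is an easy one-liner. Note that apply() converts df1 to a character matrix, so x[2]:x[3] relies on the : operator converting the strings back to numbers: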

> apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],])
[[1]]
  day value
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8
9  12     9

[[2]]
  day value
5   8     5
6   9     6
7  10     7

[[3]]
  day value
1   4     1
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8

You can use setNames to name the dataframes in the list:

setNames(apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],]),df1[,1])
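
The integer assumption matters because %in% tests for exact membership in the sequence x[2]:x[3]. A minimal sketch of the difference from a range comparison, assuming a hypothetical df2_frac with one fractional day (not part of the original data) and illustrative bounds lo and hi:

```r
# Hypothetical data: same layout as df2, but with one fractional day.
df2_frac <- data.frame(day = c(4, 5.5, 6, 12, 13), value = 1:5)

# Illustrative bounds, playing the role of day_start and day_end.
lo <- 5
hi <- 12

# Exact membership in 5:12 misses the fractional day 5.5.
by_membership <- df2_frac[df2_frac$day %in% lo:hi, ]

# A range comparison keeps every day between the two bounds.
by_range <- df2_frac[df2_frac$day >= lo & df2_frac$day <= hi, ]

by_membership$day  # 6 12
by_range$day       # 5.5 6.0 12.0
```

If the days might not be whole numbers, the range comparison is probably the safer choice.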

Other tips

Yes, you can use mapply:

Define a function that will do what you want:

fun <- function(x,y) df2[df2$day >= x & df2$day <= y,]

Then use mapply to call this function on each pair of day_start and day_end values:

final.output <- mapply(fun,df1$day_start, df1$day_end, SIMPLIFY=FALSE)

This will give you a list with the outputs you want:

final.output

[[1]]
  day value
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8
9  12     9

[[2]]
  day value
5   8     5
6   9     6
7  10     7

[[3]]
  day value
1   4     1
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8

You can name each data.frame of the list with setNames:

final.output <- setNames(final.output,df1$type)
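
The question asks for names of the form df_A. A sketch of one way to get them, rebuilding the example data so the block runs on its own. Map is used here as a stand-in for mapply(..., SIMPLIFY=FALSE), and list_of_dataframes is the name taken from the question:

```r
# Example data from the question.
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
df2 <- data.frame(day = 4:13, value = 1:10)

# Map() always returns a list, like mapply(..., SIMPLIFY = FALSE).
list_of_dataframes <- Map(
  function(x, y) df2[df2$day >= x & df2$day <= y, ],
  df1$day_start, df1$day_end
)

# Prefix each type with "df_" so elements can be accessed as $df_A, $df_B, $df_C.
names(list_of_dataframes) <- paste0("df_", df1$type)

list_of_dataframes$df_A$day  # 5 6 7 8 9 10 11 12
```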

Or you can also put an attribute type on the data.frames of the list:

fun <- function(x,y, type){
  df <- df2[df2$day >= x & df2$day <= y,]
  attr(df, "type") <- as.character(type)
  df
}

Then each data.frame of final.output will have an attribute so you know which type it is:

final.output <- mapply(fun,df1$day_start, df1$day_end,df1$type, SIMPLIFY=FALSE)

# check which type the first data.frame is
attr(final.output[[1]], "type")
[1] "A"

Finally, if you do not want a list of the 3 data.frames, you can write a function that assigns each one to the global environment:

fun <- function(x,y, type){
  df <- df2[df2$day >= x & df2$day <= y,]
  name <- as.character(type)
  assign(name, df, pos=.GlobalEnv)
}

mapply(fun,df1$day_start, df1$day_end, type=df1$type, SIMPLIFY=FALSE)

This will create 3 separate data.frames in the global environment named A, B and C.
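
If the named list already exists, list2env from base R can probably do the same unpacking without a custom function. A sketch, assuming the list was named with setNames as shown earlier. A fresh environment called target (a name chosen for illustration) is used so the example leaves the workspace untouched, and .GlobalEnv could be passed instead:

```r
# Example data from the question.
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
df2 <- data.frame(day = 4:13, value = 1:10)

# Build the named list as in the mapply answer.
final.output <- setNames(
  mapply(function(x, y) df2[df2$day >= x & df2$day <= y, ],
         df1$day_start, df1$day_end, SIMPLIFY = FALSE),
  df1$type
)

# Copy each list element into the environment under its own name.
target <- new.env()
list2env(final.output, envir = target)

ls(target)      # "A" "B" "C"
target$B$value  # 5 6 7
```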

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow