sum columns in a data.frame based on conditions from another dataframe in R

StackOverflow https://stackoverflow.com/questions/22773657

  •  25-06-2023

Question

I have two data frames, a and b.

For each row in b, I want to find all rows in a whose start,end interval lies within that row's start,end interval, sum the differences end - start over that subset of a, and store the sum as a new column in b. I'm using a for loop, but is there a more efficient way to do this in R, for example with apply?

# data.frame a  
a <- data.frame(chrom=1L, start=as.integer(c(2,4,7,11)), end=as.integer(c(3,6,9,15)))
# chrom start end  
#     1     2   3  
#     1     4   6  
#     1     7   9        
#     1    11  15  

# data.frame b  
b <- data.frame(chr=1L, start=as.integer(c(2,11)), end=as.integer(c(10,20)))
#   chr start end
#     1     2  10  
#     1    11  20  

# code
result = c()
for (i in seq_len(nrow(b))) {
    # find start,end in a that are within row i of b
    # (the chromosome column is named chrom in a but chr in b)
    a_subset = a[which(a$chrom == b$chr[i] &
                 a$start >= b$start[i] &
                 a$end <= b$end[i]), ]

    result = append(result, sum(a_subset$end - a_subset$start))
}
c = cbind(b, result)

# data.frame c
#   chr start end result
#     1     2  10      5
#     1    11  20      4

Solution

Easy with sqldf, annoying with base R:

library(sqldf)
b$id <- seq_len(nrow(b))
# match on chromosome as well as on interval containment
sqldf("select id, b.chr, sum(a.end - a.start) as diff 
    from a, b where a.chrom = b.chr and a.start >= b.start and b.end >= a.end group by id")
#   id chr diff
# 1  1   1    5
# 2  2   1    4
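
For comparison, a base R version without the explicit loop might look like the sketch below. It is not part of the original answer. It assumes the a and b data frames exactly as defined in the question (with the chromosome column named chrom in a and chr in b) and uses vapply to compute one sum per row of b.

```r
# Toy data copied from the question.
a <- data.frame(chrom = 1L, start = c(2L, 4L, 7L, 11L), end = c(3L, 6L, 9L, 15L))
b <- data.frame(chr = 1L, start = c(2L, 11L), end = c(10L, 20L))

# For each row i of b, sum (end - start) over the rows of a that lie
# entirely inside b's interval on the same chromosome.
b$result <- vapply(seq_len(nrow(b)), function(i) {
  inside <- a$chrom == b$chr[i] & a$start >= b$start[i] & a$end <= b$end[i]
  sum(a$end[inside] - a$start[inside])
}, numeric(1))

print(b)
```

vapply still visits b one row at a time, so this mostly tidies the loop and avoids growing result with append. Unlike the sqldf query, it keeps rows of b that contain no interval from a, giving them a sum of 0.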
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow