Question

I would like to select specific rows of a data frame whenever some row meets a condition. The selected rows (plus the initial row that triggered the selection) should make up a new data frame, and that data frame should be named after the $Name value of the initial selected row.

The logic:

1 - The initial selected row must have $FC >= 0.7.

2 - The rows selected to form the data frame must have the same $chr as the initial selected row.

3 - The selected rows must have a $Position inside a 5000-wide window centred on the $Position of the initial selected row.

*3a) In this example, the row with $Name = BD22 cannot be included in the BD13 data frame because its $Position is outside the window (a 5000-wide window around 3000 runs from $Position = 500 to $Position = 5500).

A simplified example follows.

My input data frame:

 Name   FC   chr   Position 
 BD10   0.1  chr1    1000
 BD11   0.1  chr2    1000
 BD12   0.2  chr3    2000
 BD13   0.7  chr3    3000
 BD14   0.4  chr3    4000
 BD22   0.1  chr3    7000
 BD23   0.2  chr4    1000

As output I expect a data frame named after the initial selected row, in this example BD13:

Name   FC   chr   Position
BD12   0.2  chr3   2000
BD13   0.7  chr3   3000
BD14   0.4  chr3   4000

Afterwards, I would like to plot each resulting data frame like this:

pdf("BD13.pdf")
plot(BD13$Name, BD13$FC, main="BD13",
   xlab="Name", ylab="FC")
dev.off()

I have tried:

out <- subset(input, FC >= 0.7)
out$startw <- (out$Position - 2500)
out$endw <- (out$Position + 2500)


library(plyr)
lvl <- dlply(out, .(Name))

for (i in 1:length(lvl)) {
  Neigh1 <- subset(input, input$Position >= lvl[i]$startw & lvl[i]$chr == input$chr)
  Neigh2 <- subset(input, input$Position <= lvl[i]$endw & lvl[i]$chr == input$chr)
  Neight <- rbind(Neigh1, Neigh2)

pdf(sprintf("%s.pdf", [i]))
boxplot(Neigh$Name, Neigh$FC, xlab=[i], ylab="FC", main="[i]")
dev.off()}

But Neigh1 and Neigh2 are empty... Thank you!


Solution

Unless you really want to, it's a bad idea to create all these new variables based on the elements of input$Name because:

  • If input$Name contains a name such as 'input' that clashes with another variable, you can get bugs that are hard to track down.

  • You potentially clutter up your workspace with many variables.

  • It's hard(er) to loop over the variables to plot them without using esoteric bits of R code, or copying and pasting lots of code.

I suggest creating a list as follows:

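# Indices of the initial ("seed") rows, i.e. those with FC >= 0.7
seeds <- which(input$FC >= 0.7)
res <- lapply(seeds, function(x) {
  # Keep rows on the same chromosome whose Position is within 2500 of the seed
  keep <- input$chr == input$chr[x] &
    abs(input$Position - input$Position[x]) < 2500
  input[keep, ]
})
# Name each list element after its seed row
names(res) <- as.character(input$Name[seeds])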

where each element of the list is one of the variables that you wanted to create. Access as res[["BD13"]] or res[[1]] - the latter form will make it easy to produce all your plots in a loop.
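As a quick check against the sample data in the question, here is a minimal, self-contained sketch. The data frame is retyped by hand from the question, and the helper names seeds and keep are only illustrative.

```r
# Sample data retyped from the question
input <- data.frame(
  Name     = c("BD10", "BD11", "BD12", "BD13", "BD14", "BD22", "BD23"),
  FC       = c(0.1, 0.1, 0.2, 0.7, 0.4, 0.1, 0.2),
  chr      = c("chr1", "chr2", "chr3", "chr3", "chr3", "chr3", "chr4"),
  Position = c(1000, 1000, 2000, 3000, 4000, 7000, 1000),
  stringsAsFactors = FALSE
)

# Seed rows: FC >= 0.7
seeds <- which(input$FC >= 0.7)

# One data frame per seed: same chr, Position within 2500 of the seed
res <- lapply(seeds, function(x) {
  keep <- input$chr == input$chr[x] &
    abs(input$Position - input$Position[x]) < 2500
  input[keep, ]
})
names(res) <- input$Name[seeds]

# Should show the BD12, BD13 and BD14 rows; BD22 (Position 7000) is excluded
print(res[["BD13"]])
```

With this data only BD13 reaches the FC threshold, so res has a single element.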

Edit:

To plot, I think that you want the following (can't test at the moment):

for (i in 1:length(res)) {   
  pdf(sprintf("%s.pdf", names(res)[i]))   
  boxplot(res[[i]]$Name, res[[i]]$FC, xlab=res[[i]]$Name, ylab="FC", main=names(res)[i])   
  dev.off()
} 

Check the arguments to boxplot, though: I don't think the first one should be the text column.
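Since Name is a label rather than a number, one possible alternative is a bar chart of FC with one bar per row, labelled by Name. This is only a sketch and rests on an assumption about what the plot should show, which the question does not spell out. The small res list below is a stand-in built from the question's expected output, and out_dir is a hypothetical output location.

```r
# Stand-in for the list built above (values taken from the question)
res <- list(
  BD13 = data.frame(
    Name     = c("BD12", "BD13", "BD14"),
    FC       = c(0.2, 0.7, 0.4),
    chr      = "chr3",
    Position = c(2000, 3000, 4000),
    stringsAsFactors = FALSE
  )
)

out_dir <- tempdir()  # assumption: replace with your own output folder

for (nm in names(res)) {
  df <- res[[nm]]
  # One PDF per list element, named after the seed row
  pdf(file.path(out_dir, sprintf("%s.pdf", nm)))
  # Bar heights are FC values; bar labels are the row names
  barplot(df$FC, names.arg = df$Name, xlab = "Name", ylab = "FC", main = nm)
  dev.off()
}
```

Looping over names(res) rather than over an index keeps the file name and the plot title in step with the list element being drawn.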

res[i] is a list (of length 1) containing the ith element of res, whereas res[[i]] is the ith element itself.
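The difference is easy to see on a tiny example. The sketch below uses a hand-made one-element list (not the full data) to show that $ on res[i] finds nothing. The same thing happens with lvl[i]$startw in the question's loop, which is most likely why Neigh1 and Neigh2 came back empty.

```r
# A one-element list holding a small data frame (illustrative values)
res <- list(BD13 = data.frame(Name = c("BD12", "BD13", "BD14"),
                              FC   = c(0.2, 0.7, 0.4)))

class(res[1])    # "list": still a list, of length 1
class(res[[1]])  # "data.frame": the element itself

res[1]$FC        # NULL: the length-1 list has no element called FC
res[[1]]$FC      # the FC column of the data frame
```

Comparing a column against NULL yields a zero-length logical vector, so subset() then returns no rows.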

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow