Question

I have a dataframe in R like this:

dat = data.frame(Sample = c(1,1,2,2,3), Start = c(100,300,150,200,160), Stop = c(180,320,190,220,170))

I would like to plot it so that the x-axis is the position and the y-axis is the number of samples covering that position, with each sample in a different colour. In the example above, some positions would have height 1, some height 2, and one area height 3. The aim is to find regions covered by a large number of samples, and to see which samples are in each region.

i.e. something like:

      &
     ---
********-  --       **

where * = Sample 1, - = Sample 2 and & = Sample 3

Solution

This hack may be what you're looking for. Note that I've greatly increased the size of the data frame (one row per covered position) so that geom_histogram can stack the samples.

library(ggplot2)
dat = data.frame(Sample = c(1,1,2,2,3), 
                 Start = c(100,300,150,200,160), 
                 Stop = c(180,320,190,220,170))

# Reformat the data for plotting with geom_histogram.
dat2 = matrix(ncol=2, nrow=0, dimnames=list(NULL, c("Sample", "Position")))

for (i in seq(nrow(dat))) {
    Position = seq(dat[i, "Start"], dat[i, "Stop"])
    Sample = rep(dat[i, "Sample"], length(Position))
    dat2 = rbind(dat2, cbind(Sample, Position))
}

dat2 = as.data.frame(dat2)
dat2$Sample = factor(dat2$Sample)

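# opts(), theme_blank() and theme_text() have been removed from ggplot2;
# theme(), element_blank(), element_text() and ggtitle() replace them.
# linewidth needs ggplot2 >= 3.4.0 (use size= for lines on older versions).
plot_1 = ggplot(dat2, aes(x=Position, fill=Sample)) +
         theme_bw() +
         theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
         geom_hline(yintercept=seq(0, 20), colour="grey80", linewidth=0.15) +
         geom_hline(yintercept=3, linetype=2) +
         geom_histogram(binwidth=1) +
         ylim(c(0, 20)) +
         ylab("Count") +
         theme(axis.title.x=element_text(size=11, vjust=0.5)) +
         theme(axis.title.y=element_text(size=11, angle=90)) +
         ggtitle("Segment Plot")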

png("plot_1.png", height=200, width=650)
print(plot_1)
dev.off()

Note that the way I've reformatted the data frame is a bit ugly and will not scale well: it creates one row per covered position and grows the matrix with rbind inside a loop, so it becomes slow if you have millions of segments and/or very long segments.
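If the expansion to one row per position becomes too expensive, the depth can be computed from the segment endpoints alone. The following is a minimal sketch, not part of the original answer: it assumes Start and Stop are inclusive integer positions, and the names events, steps, Change and Depth are made up for illustration. It counts segments, so two overlapping segments from the same sample would be counted twice (as they are in the histogram above).

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# Each segment adds +1 at its Start and -1 just past its (inclusive) Stop.
events <- data.frame(Position = c(dat$Start, dat$Stop + 1),
                     Change   = c(rep(1, nrow(dat)), rep(-1, nrow(dat))))

# Sum the changes at each distinct position, then accumulate them in order.
steps <- aggregate(Change ~ Position, data = events, FUN = sum)
steps <- steps[order(steps$Position), ]
steps$Depth <- cumsum(steps$Change)

# Depth applies from each Position up to (not including) the next Position.
print(steps)
```

The result has at most two rows per segment, regardless of segment length, and could be drawn with geom_step or filtered for rows where Depth exceeds some threshold.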


Other answers

My first try:

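library(ggplot2)
# Treat Sample as a category so each sample gets its own row and colour.
dat$Sample = factor(dat$Sample)
ggplot(aes(x = Start, y = Sample, xend = Stop, yend = Sample, color = Sample), data = dat) + 
  # linewidth needs ggplot2 >= 3.4.0 (use size= on older versions).
  geom_segment(linewidth = 2) + 
  geom_segment(aes(x = Start, y = 0, xend = Stop, yend = 0), linewidth = 2, alpha = 0.2, color = "black")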


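I combine two segment geometries here. The first draws the coloured horizontal bars, one row per sample, which show where each sample has been measured. The second draws every segment again along the bottom (y = 0) in semi-transparent black; where segments overlap, the bar looks darker, which gives a rough indication of sample density. Any comments to improve on this quick hack?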
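A plot like this shows where samples pile up, but reading off exactly which samples cover a region can be easier with a small lookup. The sketch below is an illustration rather than part of the original answer: samples_in_region is a hypothetical helper, and it assumes Start and Stop are inclusive.

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# Hypothetical helper: samples with at least one segment overlapping [from, to].
samples_in_region <- function(dat, from, to) {
  # Two inclusive intervals overlap when each starts before the other ends.
  hit <- dat$Start <= to & dat$Stop >= from
  sort(unique(dat$Sample[hit]))
}

samples_in_region(dat, 160, 170)  # the region where all three samples overlap
samples_in_region(dat, 300, 320)  # only Sample 1
```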

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow