First, I adjusted your original sample data to contain more than one unique GenePosition.
# Output of dput(seq.long), assigned so the data frame can be recreated
seq.long <- structure(list(ID = 1:8, Var1 = structure(c(2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L), .Label = c("case", "control"), class = "factor"),
GenePosition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), .Label = c("X20068492", "X20068493"), class = "factor"),
ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157,
0.003639304, 0.112813611, 0.002746708, 0.089251157, 0.004639304
)), .Names = c("ID", "Var1", "GenePosition", "ContinuousOutcomeVar"
), class = "data.frame", row.names = c(NA, -8L))
If you just want to show one value for each combination of GenePosition and Var1, it is easier to calculate the mean values before plotting. This can be done with the function ddply() from the plyr package.
library(plyr)
# Mean outcome for each Var1 x GenePosition combination
seq.long.sum <- ddply(seq.long, .(Var1, GenePosition),
                      summarize, value = mean(ContinuousOutcomeVar))
seq.long.sum
     Var1 GenePosition      value
1    case    X20068492 0.03644523
2    case    X20068493 0.04694523
3 control    X20068492 0.04728016
4 control    X20068493 0.05778016
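If you would rather not load plyr, the same means can probably be computed in base R with aggregate(). This is only a minimal sketch: it rebuilds the sample data with data.frame() so that it runs on its own, and the name seq.long.base is made up for illustration.

```r
# Rebuild the sample data (same values as the dput() output above)
seq.long <- data.frame(
  ID = 1:8,
  Var1 = factor(rep(c("control", "control", "case", "case"), times = 2),
                levels = c("case", "control")),
  GenePosition = factor(rep(c("X20068492", "X20068493"), each = 4)),
  ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157,
                           0.003639304, 0.112813611, 0.002746708,
                           0.089251157, 0.004639304)
)

# Mean of ContinuousOutcomeVar for each Var1 x GenePosition combination
seq.long.base <- aggregate(ContinuousOutcomeVar ~ Var1 + GenePosition,
                           data = seq.long, FUN = mean)

# Rename the result column so it matches the ddply() output
names(seq.long.base)[3] <- "value"
seq.long.base
```

The rows may come out in a different order than with ddply(), but that does not affect the plot.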
Now, with this new data frame, you only have to supply the x and y values. Var1 should be used for both colour= and group= so that each group gets a different color and the points within each group are connected by a line.
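library(ggplot2)
# One line per Var1 group across gene positions, with a point at each mean
ggplot(seq.long.sum, aes(x = GenePosition, y = value, colour = Var1, group = Var1)) +
  geom_point() + geom_line()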
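As an alternative, ggplot2 can compute the means itself with stat_summary(), so the raw seq.long data can be plotted without the ddply() step. The sketch below assumes ggplot2 3.3.0 or later, where the argument is called fun (older versions used fun.y). It rebuilds the sample data so that it runs on its own, and the object name p is only for illustration.

```r
library(ggplot2)

# Rebuild the sample data (same values as the dput() output above)
seq.long <- data.frame(
  ID = 1:8,
  Var1 = factor(rep(c("control", "control", "case", "case"), times = 2),
                levels = c("case", "control")),
  GenePosition = factor(rep(c("X20068492", "X20068493"), each = 4)),
  ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157,
                           0.003639304, 0.112813611, 0.002746708,
                           0.089251157, 0.004639304)
)

# stat_summary() averages y within each x value and group before drawing
p <- ggplot(seq.long, aes(x = GenePosition, y = ContinuousOutcomeVar,
                          colour = Var1, group = Var1)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line")
p
```

This keeps everything in one call, while the pre-computed seq.long.sum is handier if you also want to inspect or reuse the means.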