Question

I'm a new R user and I am trying to chart an interaction between 2 continuous variables and a categorical variable.

Using interaction.plot:

interaction.plot(nonconform, trans, employdisc, type="b", col=(1:3) ,
             leg.bty="o", leg.bg="beige", lwd=2, pch=c(18,24,22),
             xlab="Nonconformity",
             ylab="Discrimination",
             main="Interaction Plot")

I get this result:

[image: interaction plot]

When I attempt to do the same thing with ggplot

ggplot(data=NTDS.zip, aes(x=nonconform, y=employdisc, colour=factor(trans), group=trans)) + 
            stat_summary(fun.y=mean, geom="point") + 
            stat_summary(fun.y=mean, geom="line")

I get this result:

[image: ggplot chart]

There is an extra line (in grey) that I can't get rid of. It likely represents missing data, but I haven't found a way to remove it from the chart. The discussions I found were about suppressing warnings caused by missing data, not about extra lines in a chart.

Any thoughts?

Update

After reading the R Graphics Cookbook I tried another method.

The book's method involves summarizing the data first:

tg <- ddply(ntds.new, c("trans", "nonconform"), summarize, empdisc=mean(employdisc))

and then plotting the chart.

I tried two versions, mapping trans to colour and to linetype:

ggplot(tg, aes(x=nonconform, y=empdisc, colour=trans))+geom_line() 
ggplot(tg, aes(x=nonconform, y=empdisc, linetype=trans))+geom_line()

The plot with the colour statement has the extra line, while the plot with linetype does not.

The data for this was:

trans   nonconform  empdisc
1   1   0   1.104046
2   1   1   1.472050
3   1   2   1.930070
4   1   3   2.247706
5   1   4   3.407407
6   1   NA  7.250000
7   2   0   3.427230
8   2   1   3.929707
9   2   2   4.062275
10  2   3   4.373853
11  2   4   4.470149
12  2   NA  5.294118
13  3   0   1.309524
14  3   1   1.968310
15  3   2   2.366589
16  3   3   3.815000
17  3   4   3.560606
18  3   NA  6.000000
19  4   0   2.661290
20  4   1   3.208861
21  4   2   3.033195
22  4   3   3.322176
23  4   4   3.755906
24  4   NA  6.625000
25  NA  0   4.000000
26  NA  1   4.166667
27  NA  2   2.500000
28  NA  3   6.666667
29  NA  4   5.400000
30  NA  NA  2.000000

I went back and deleted the 10 rows with missing values in either the trans or the nonconform column.

trans   nonconform  empdisc
1   1   0   1.104046
2   1   1   1.472050
3   1   2   1.930070
4   1   3   2.247706
5   1   4   3.407407
6   2   0   3.427230
7   2   1   3.929707
8   2   2   4.062275
9   2   3   4.373853
10  2   4   4.470149
11  3   0   1.309524
12  3   1   1.968310
13  3   2   2.366589
14  3   3   3.815000
15  3   4   3.560606
16  4   0   2.661290
17  4   1   3.208861
18  4   2   3.033195
19  4   3   3.322176
20  4   4   3.755906

This solved my initial problem, but the solution seems more complicated than it should be, and I'm curious why the plot using colour was affected while the one using linetype wasn't.

Solution

If you look at your summarized data in tg, you will see NA values in the trans variable.
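A quick way to confirm this is to count the missing values per column. The sketch below uses a small stand-in data frame (tg_demo is a hypothetical name; its five rows are copied from the table in the question) rather than the real tg:

```r
# Stand-in for a few rows of the summarised table from the question
# (tg_demo is an illustrative name, not an object from the original post)
tg_demo <- data.frame(
  trans      = c(1, 1, 2, NA, NA),
  nonconform = c(0, NA, 0, 0, NA),
  empdisc    = c(1.104046, 7.250000, 3.427230, 4.000000, 2.000000)
)

# Number of missing values in each column
na_counts <- colSums(is.na(tg_demo))
print(na_counts)

# Group sizes for trans, including the NA group that ggplot treats as its own line
print(table(tg_demo$trans, useNA = "ifany"))
```

Running the same two calls on the real tg should show which grouping columns carry NA values.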

When trans (converted to a factor) is mapped to colour, the rows where trans is NA form their own group and are plotted too, because discrete colour scales draw NA levels in grey by default (na.value="grey50"). Linetype scales default to na.value="blank", so the NA group is still there, but its line is drawn blank and you don't see it.

There are a couple of ways to solve this. First, you can add scale_color_discrete() and set na.value to NA, so the NA group is not drawn:

ggplot(tg, aes(x=nonconform, y=empdisc, colour=as.factor(trans)))+
  geom_line()+
  scale_color_discrete(na.value=NA)

Another solution is to remove the rows containing NA values before plotting. This can be done inside the ggplot() call; note that complete.cases() drops a row if any of its columns is NA.

ggplot(tg[complete.cases(tg),], aes(x=nonconform, y=empdisc, colour=as.factor(trans)))+
  geom_line()
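If you would rather drop rows only when one of the plotted grouping columns is missing, you can filter on those columns explicitly. The following is a minimal sketch, assuming tg has the 30 rows shown in the question (rebuilt here by hand so the example is self-contained); tg_clean and keep are illustrative names, and the plotting step is guarded in case ggplot2 is not installed:

```r
# Rebuild the summarised table from the question: nonconform varies fastest,
# trans slowest, with NA as an extra value in both columns
tg <- expand.grid(nonconform = c(0:4, NA), trans = c(1:4, NA))
tg$empdisc <- c(
  1.104046, 1.472050, 1.930070, 2.247706, 3.407407, 7.250000,
  3.427230, 3.929707, 4.062275, 4.373853, 4.470149, 5.294118,
  1.309524, 1.968310, 2.366589, 3.815000, 3.560606, 6.000000,
  2.661290, 3.208861, 3.033195, 3.322176, 3.755906, 6.625000,
  4.000000, 4.166667, 2.500000, 6.666667, 5.400000, 2.000000
)

# Keep only rows where both plotted grouping columns are present
keep <- !is.na(tg$trans) & !is.na(tg$nonconform)
tg_clean <- tg[keep, ]

# Build the plot only if ggplot2 is available; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    tg_clean,
    ggplot2::aes(x = nonconform, y = empdisc, colour = factor(trans))
  ) +
    ggplot2::geom_line()
}
```

For this table the result is the same 20 rows that complete.cases() keeps, but the explicit filter would not discard rows that have NA in some other, unplotted column.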

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow