Plotting Two Factors on the same Graph
Question
Say I have two factors that I want to graph on the same plot. Both factors have the same levels.
s1 <- c(rep("male",20), rep("female", 30))
s2 <- c(rep("male",10), rep("female", 40))
s1 <- factor(s1, levels=c("male", "female"))
s2 <- factor(s2, levels=c("male", "female"))
I would have thought that the table function would produce the right input for graphing, but instead it gives this:
table(s1, s2)
        s2
s1       male female
  male     10     10
  female    0     30
So really two questions: what is the table function doing to get this result, and what other function can I use to create a graph with two series from factors that share the same levels?
Also, in case it is a factor, I'm using barplot2 from the gplots package to graph it.
Solution
You can achieve slightly more detailed results with the lattice package:
s1 <- factor(c(rep("male",20), rep("female", 30)))
s2 <- factor(c(rep("male",10), rep("female", 40)))
D <- data.frame(s1, s2)
library(lattice)
histogram(~s1+s2, D, col = c("pink", "lightblue"))
Or if you want males/females side by side for easier comparison:
t1 <- table(s1)
t2 <- table(s2)
barchart(cbind(t1, t2), stack = FALSE, horizontal = FALSE)
OTHER TIPS
From ?table:
‘table’ uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.
When you do table(s1,s2), the function treats s1 and s2 as paired observations. Effectively it tells you that if you were to take cbind(s1,s2), there would be 10 rows of male-male, 10 of male-female, and so on.
To understand this consider a very trivial example:
a <- c("M","M","F","F")
b <- c("F","F","M","M")
table(a,b)
   b
a   F M
  F 0 2
  M 2 0
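To make the pairing concrete, here is a small sketch using the s1 and s2 from the question. It shows that the row and column sums of the cross-tabulation recover the per-factor counts, which are what the plot needs. The names xt, s1_counts and s2_counts are illustrative only.

```r
# Same data as in the question
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))

# Cross-tabulation: counts of (s1, s2) pairs, matched position by position
xt <- table(s1, s2)

# The margins of the cross-tabulation are the one-way counts
s1_counts <- rowSums(xt)  # same counts as table(s1)
s2_counts <- colSums(xt)  # same counts as table(s2)
```

The cells of xt only make sense if element i of s1 and element i of s2 describe the same observation; for two independent series, the one-way counts are the relevant quantities.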
What you should do is:
t1 <- table(s1)
t2 <- table(s2)
barplot(cbind(t1,t2), beside=TRUE, col=c("lightblue", "salmon"))
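A slightly fuller sketch of the same idea, assuming the s1 and s2 from the question: the columns are named so the groups are labelled on the axis, and a legend identifies the levels. The name counts, the axis label and the colours are illustrative choices, not part of the original answer.

```r
# Same data as in the question
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))

# One column of counts per factor, one row per level
counts <- cbind(s1 = table(s1), s2 = table(s2))

# Grouped bars: one group per factor, one bar per level
barplot(counts, beside = TRUE,
        col = c("lightblue", "salmon"),
        legend.text = rownames(counts),
        ylab = "Count")
```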
Two options producing slightly different forms of plots are
plot(s1, s2)
and
plot(table(s1,s2))
The former is a spineplot, a special case of the mosaic plot, which is what the plot method for table produces (the second example). See ?spineplot and ?mosaicplot for more details. You can also use these functions directly, rather than the generic plot(), if you wish.
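As a minimal sketch of calling the two functions directly, assuming the s1 and s2 from the question (the main titles are illustrative). Note that both plots treat s1 and s2 as paired observations, just as table(s1, s2) does.

```r
# Same data as in the question
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))

# Spine plot: s1 on the x-axis, proportions of s2 within each level of s1
spineplot(s1, s2, main = "Spine plot of s2 by s1")

# Mosaic plot of the two-way contingency table
xt <- table(s1, s2)
mosaicplot(xt, main = "Mosaic plot of s1 by s2")
```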
Also take a look at the mosaic() function in the vcd package on CRAN by Meyer et al.
table() is producing the contingency table for the two factors.
Hmm, I don't think creating a contingency table is what Cameron was looking for. If I understood him correctly, he wanted to create a data frame with two variables in it, where s1 and s2 are vectors of the same length (length(s1) == length(s2)).
In that case, he would simply need to create a "table" (I think he meant a data.frame) using:
df <- data.frame(s1 = s1, s2 = s2)
And then plot the 2 series in the same plot.
So as for the second question of plotting these things, I'd use matplot. For example:
matplot(1:10, data.frame(a=rnorm(10), b=rnorm(10)), type="l", lty=1, lwd=1, col=c("blue","red"))
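Given that his two vectors are organized in a single data.frame named "df", he can do something like the following. Note that matplot() needs numeric input, so the factors are first converted to their integer codes with data.matrix():
matplot(data.matrix(df), type="l", lty=1, lwd=1, col=c("blue","red"))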
Hope this helps.