Reorder a factor based on the ratio of the group sums of two columns - grouping by the factor to be reordered
Question
I have a data frame, df:
  District TypeofSchool Nstudents Nteachers Percent_failure
1        A            I      1936       157            21.5
2        A           II        67         8             0.5
3        A          III      5288       146            78.0
4        B            I       653        72            27.8
5        B           II       865        22             9.0
6        B          III      2278       100            63.2
For graphing with ggplot2, I'd like to reorder the District factor by the student-to-teacher ratio of each district. That is: sum the number of students and the number of teachers over all Types of Schools in a district, take the ratio of the two sums, and reorder the Districts by that ratio, so that the district with the lowest ratio appears in the leftmost position when I plot, say, a stacked bar graph:
ggplot(df, aes(x=District, y=Percent_failure, fill=TypeofSchool)) +
geom_bar(stat="identity")
Any suggestion how to do the reordering?
Solution
Base R solution (using dat as your data.frame):
stu.tea <- names(sort(by(
dat[c("Nstudents","Nteachers")],dat["District"],
function(x) do.call("/",as.list(colSums(x)))
)))
#[1] "B" "A"
dat$District <- factor(dat$District,levels=stu.tea)
dat$District
#[1] A A A B B B
#Levels: B A
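To make the arithmetic behind that result visible, here is a minimal base R sketch. It assumes dat is rebuilt from the table in the question; the intermediate names ratio and ratio_levels are introduced only for illustration and are not part of the answer above.

```r
# Rebuild the question's data frame (values copied from the table in the question)
dat <- data.frame(
  District        = c("A", "A", "A", "B", "B", "B"),
  TypeofSchool    = c("I", "II", "III", "I", "II", "III"),
  Nstudents       = c(1936, 67, 5288, 653, 865, 2278),
  Nteachers       = c(157, 8, 146, 72, 22, 100),
  Percent_failure = c(21.5, 0.5, 78.0, 27.8, 9.0, 63.2)
)

# Ratio of the group sums: sum(students) / sum(teachers) per district
ratio <- sapply(
  split(dat, dat$District),
  function(d) sum(d$Nstudents) / sum(d$Nteachers)
)
# A: 7291 / 311 (about 23.44), B: 3796 / 194 (about 19.57)

# District names sorted by increasing ratio become the factor levels
ratio_levels <- names(sort(ratio))
dat$District <- factor(dat$District, levels = ratio_levels)
levels(dat$District)
```

Only the level order changes; the rows of dat stay where they are, and ggplot2 lays out a discrete x axis in level order.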
OTHER TIPS
Here is one way to look at it with data.table
require(data.table)
setDT(df)
df[ , ST.RAT := sum(Nstudents)/sum(Nteachers), by = District]
# sort df by reference; a chained [order(ST.RAT)] only prints a sorted copy
setorder(df, ST.RAT)
df[ , District := factor(District,levels=unique(as.character(District)))]
And then do your ggplot stuff.
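With dplyr:
library(dplyr)
dat = dat %>% group_by(District) %>% mutate(RST=sum(Nstudents)/sum(Nteachers)) %>%
ungroup() %>% arrange(RST)
# take the levels from the sorted row order; works whether District is a character or a factor
dat$District = factor(dat$District, levels=unique(as.character(dat$District)))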
Another dplyr solution:
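df <- df %>%
group_by(District) %>%
mutate(RST=sum(Nstudents)/sum(Nteachers)) %>%
ungroup() %>%
arrange(RST) %>%
mutate(District = factor(District, levels=unique(as.character(District)))) # the factor levels are reset here
Note that the last line sets the order of the factor levels from the current row order of df, which is set by arrange. The unique() call is needed because factor levels must not contain duplicates, and the ungroup() call because current versions of dplyr do not let mutate modify a grouping column.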
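The idea that the levels follow the row order can be checked in isolation with a small base R sketch. The vector x here is a hypothetical stand-in for a District column that has already been sorted by the ratio.

```r
# Hypothetical District column after sorting by ratio (B has the lower ratio)
x <- c("B", "B", "B", "A", "A", "A")

# Default: levels are sorted alphabetically, so the row order is ignored
default_levels <- levels(factor(x))

# Explicit levels: unique() keeps first-appearance order, so levels follow the rows
ordered_levels <- levels(factor(x, levels = unique(x)))

default_levels   # "A" "B"
ordered_levels   # "B" "A"
```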