Reorder a factor based on the ratio of the group sums of two columns - grouping by the factor to be reordered

StackOverflow https://stackoverflow.com//questions/25048970

  •  21-12-2019

Question

I have a data frame, df:

  District TypeofSchool Nstudents Nteachers Percent_failure
1        A            I      1936       157            21.5
2        A           II        67         8             0.5
3        A          III      5288       146            78.0
4        B            I       653        72            27.8
5        B           II       865        22             9.0
6        B          III      2278       100            63.2

For graphing with ggplot2, I'd like to reorder the District factor by the student-to-teacher ratio of each district. That is, for each district, sum the number of students and the number of teachers over all types of school, take the ratio of the two sums, and order the districts by that ratio, so that the district with the lowest ratio appears in the leftmost position when I plot, say, a stacked bar graph:

ggplot(df, aes(x=District, y=Percent_failure, fill=TypeofSchool)) +
  geom_bar(stat="identity")

Any suggestion how to do the reordering?


Solution

Base R solution (using dat as the name of your data.frame):

stu.tea <- names(sort(by( 
             dat[c("Nstudents","Nteachers")],dat["District"],
             function(x) do.call("/",as.list(colSums(x)))
           )))
#[1] "B" "A"

dat$District <- factor(dat$District,levels=stu.tea)
dat$District
#[1] A A A B B B
#Levels: B A
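
The same ordering can be spelled out step by step, which may make the arithmetic easier to check. This is only a sketch in base R: it rebuilds the example data from the question under the name dat, and the intermediate names students, teachers and ratio are introduced here for illustration.

```r
# Rebuild the example data from the question (named dat, as in the answer above).
dat <- data.frame(
  District        = factor(rep(c("A", "B"), each = 3)),
  TypeofSchool    = factor(rep(c("I", "II", "III"), times = 2)),
  Nstudents       = c(1936, 67, 5288, 653, 865, 2278),
  Nteachers       = c(157, 8, 146, 72, 22, 100),
  Percent_failure = c(21.5, 0.5, 78.0, 27.8, 9.0, 63.2)
)

# Sum students and teachers per district, then take the ratio of the sums.
students <- tapply(dat$Nstudents, dat$District, sum)  # A = 7291, B = 3796
teachers <- tapply(dat$Nteachers, dat$District, sum)  # A = 311,  B = 194
ratio    <- students / teachers                       # A ~ 23.44, B ~ 19.57

# Order the levels from lowest to highest ratio.
dat$District <- factor(dat$District, levels = names(sort(ratio)))
levels(dat$District)
# [1] "B" "A"
```

District B has the lower ratio, so it becomes the first level and would be drawn leftmost by ggplot2.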

OTHER TIPS

Here is one way to look at it with data.table

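library(data.table)
setDT(df)

# Per-district ratio of total students to total teachers
df[, ST.RAT := sum(Nstudents) / sum(Nteachers), by = District]
# Sort the rows in place; a chained [order(ST.RAT)] would only print a sorted copy
setorder(df, ST.RAT)
# Levels follow the order of first appearance, i.e. increasing ratio
df[, District := factor(District, levels = unique(as.character(District)))]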

And then do your ggplot stuff.

With dplyr:

library(dplyr)

dat <- dat %>%
  group_by(District) %>%
  mutate(RST = sum(Nstudents) / sum(Nteachers)) %>%
  arrange(RST)

dat$District <- factor(dat$District, levels = unique(as.character(dat$District)))

Another dplyr solution:

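library(dplyr)

df <- df %>%
  group_by(District) %>%
  mutate(RST = sum(Nstudents) / sum(Nteachers)) %>%
  ungroup() %>%  # a grouping variable cannot be modified inside mutate()
  arrange(RST) %>%
  mutate(District = factor(District, levels = unique(as.character(District))))  # the factor levels are reset here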

Note that the last line sets the factor levels from the current row order of df, which arrange() has just sorted by the ratio.
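
To see why the row order matters, here is a minimal sketch on a hypothetical toy vector x (not part of the original data): by default factor() sorts the levels alphabetically, whereas passing unique(x) as the levels keeps the order of first appearance.

```r
# Hypothetical toy vector: rows already sorted so that "B" comes before "A".
x <- c("B", "B", "A", "A")

# Default: levels are sorted, ignoring the row order.
default_levels <- levels(factor(x))
default_levels
# [1] "A" "B"

# Explicit levels taken from the order of first appearance.
ordered_levels <- levels(factor(x, levels = unique(x)))
ordered_levels
# [1] "B" "A"
```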

Licensed under: CC-BY-SA with attribution