Reorder a factor based on the ratio of the group sums of two columns - grouping by the factor to be reordered

StackOverflow https://stackoverflow.com//questions/25048970

  •  21-12-2019

I have a data frame, df:

  District TypeofSchool Nstudents Nteachers Percent_failure
1        A            I      1936       157            21.5
2        A           II        67         8             0.5
3        A          III      5288       146            78.0
4        B            I       653        72            27.8
5        B           II       865        22             9.0
6        B          III      2278       100            63.2

For graphing with ggplot2, I'd like to reorder the District factor by each district's student-to-teacher ratio: sum the number of students and the number of teachers over all types of school in that district, then take the ratio. The districts should be ordered by that ratio, so that the district with the lowest ratio appears in the leftmost position when I plot, say, a stacked bar graph:

ggplot(df, aes(x=District, y=Percent_failure, fill=TypeofSchool)) +
  geom_bar(stat="identity")

Any suggestion how to do the reordering?


Solution

Base R solution (using dat as your data.frame)

stu.tea <- names(sort(by( 
             dat[c("Nstudents","Nteachers")],dat["District"],
             function(x) do.call("/",as.list(colSums(x)))
           )))
#[1] "B" "A"

dat$District <- factor(dat$District,levels=stu.tea)
dat$District
#[1] A A A B B B
#Levels: B A
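
To check the result by hand, here is a minimal sketch in base R. It assumes the data frame is rebuilt from the table in the question (the data.frame() call below is a reconstruction, not the asker's original object) and uses tapply instead of by to make the per-district sums explicit.

```r
# Data reconstructed from the table in the question (assumption)
dat <- data.frame(
  District        = rep(c("A", "B"), each = 3),
  TypeofSchool    = rep(c("I", "II", "III"), times = 2),
  Nstudents       = c(1936, 67, 5288, 653, 865, 2278),
  Nteachers       = c(157, 8, 146, 72, 22, 100),
  Percent_failure = c(21.5, 0.5, 78.0, 27.8, 9.0, 63.2)
)

# Per-district totals, then the student-to-teacher ratio
students <- tapply(dat$Nstudents, dat$District, sum)
teachers <- tapply(dat$Nteachers, dat$District, sum)
ratio <- students / teachers   # A: 7291/311, B: 3796/194

# Set the levels in ascending order of the ratio
dat$District <- factor(dat$District, levels = names(ratio)[order(ratio)])
levels(dat$District)
#[1] "B" "A"
```

District B has the lower ratio (about 19.6 against about 23.4 for A), so it becomes the first level and would be drawn leftmost by the ggplot call in the question.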

Other tips

Here is one way to look at it with data.table

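library(data.table)
setDT(df)

df[ , ST.RAT := sum(Nstudents)/sum(Nteachers), by = District]  # ratio per district
setorder(df, ST.RAT)  # sort df in place; a chained [order(ST.RAT)] would only print a sorted copy
df[ , District := factor(District,levels=unique(as.character(District)))]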

And then do your ggplot stuff.

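With dplyr:

library(dplyr)
dat <- dat %>% group_by(District) %>% mutate(RST = sum(Nstudents) / sum(Nteachers)) %>%
  ungroup() %>% arrange(RST)

# levels in order of first appearance; works whether District is character or factor
dat$District <- factor(dat$District, levels = unique(as.character(dat$District)))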

Another dplyr solution:

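library(dplyr)

df <- df %>% 
  group_by(District) %>% 
  mutate(RST = sum(Nstudents) / sum(Nteachers)) %>% 
  ungroup() %>%  # drop the grouping so District can be modified
  arrange(RST) %>%
  mutate(District = factor(District, levels = unique(as.character(District)))) # the factor levels are reset here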

Note that the last line sets the factor levels to follow the current row order of df, which arrange has just sorted by the ratio.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow