Question

Suppose I have a data frame with 3 columns (x, y, sex), where x is a character vector of names, y is a numeric value and sex is a factor.

sex<-c("M","M","F","M","F","M","M","M","F")
x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
name<-as.character(x)
y<-rnorm(9,8,1)
score<-data.frame(x,y,sex)
score
        x         y sex
1    MARK  6.767086   M
2     TOM  7.613928   M
3   SUSAN  7.447405   F
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
7     TIM 10.385221   M
8    MATT  7.497702   M
9  VIOLET 10.177969   F

If I wanted to order it by y I would use:

score[order(score$y),]
        x         y sex
1    MARK  6.767086   M
3   SUSAN  7.447405   F
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
9  VIOLET 10.177969   F
7     TIM 10.385221   M

So far, so good: each name keeps its correct score. But how could I reorder the data so that the M and F levels are not mixed? I need to sort by y and at the same time keep the factor levels separated.

Finally, I would like to go a step further and involve the character column. The example doesn't show it, but what if there were tied y values and I had to order again within each factor level (e.g. TIM and TOM both got 8.4 and I have to fall back on alphabetical order)?

I was thinking about the by function, but it creates a list and doesn't really help. I think there must be some function like it to apply on data frames and get data frames as return.

To make the point clear:

sep<-split(score,score$sex)
sep$M<-sep$M[order(sep$M[,2]),]
sep$M
        x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M

sep$F<-sep$F[order(sep$F[,2]),]
sep$F
       x         y sex
3  SUSAN  7.447405   F
5   EMMA  8.306875   F
9 VIOLET 10.177969   F

merged<-rbind(sep$M,sep$F)
merged
        x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M
3   SUSAN  7.447405   F
5    EMMA  8.306875   F
9  VIOLET 10.177969   F

I know how to do that if the factor has 2 or 3 levels. But what if it had many levels, say 20? Should I write a for loop?


Solution

order takes multiple arguments, and it does just what you want:

with(score, score[order(sex, y, x),])
##         x        y sex
## 3   SUSAN 6.636370   F
## 5    EMMA 6.873445   F
## 9  VIOLET 8.539329   F
## 6 LEONARD 6.082038   M
## 2     TOM 7.812380   M
## 8    MATT 8.248374   M
## 4   LARRY 8.424665   M
## 7     TIM 8.754023   M
## 1    MARK 8.956372   M
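
To see the tie-breaking at work, here is a small sketch with made-up, fixed scores (the data frame score2 and its values are illustrative assumptions, not the question's random data), where TIM and TOM share 8.4:

```r
# Hypothetical data: fixed scores so the result is reproducible,
# with a deliberate tie between TIM and TOM.
score2 <- data.frame(
  x   = c("TOM", "SUSAN", "TIM", "EMMA", "MARK"),
  y   = c(8.4, 7.4, 8.4, 8.3, 6.8),
  sex = factor(c("M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Sort by sex first, then by y within each sex, then by name to break ties.
sorted <- score2[order(score2$sex, score2$y, score2$x), ]
sorted$x
# "SUSAN" "EMMA" "MARK" "TIM" "TOM"

# Negating a numeric key sorts that key in decreasing order within each sex.
sorted_desc <- score2[order(score2$sex, -score2$y, score2$x), ]
sorted_desc$x
# "EMMA" "SUSAN" "TIM" "TOM" "MARK"
```

Each later argument to order is only consulted when all earlier ones are tied, so the number of factor levels does not matter and no loop is needed.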

Other tips

Here is a summary of all methods mentioned in other answers/comments (to serve future searchers). I've added a data.table way of sorting.

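# Base R
# split by sex, sort each piece by y, then stack the pieces
do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
# sort by sex, then y, then name (ties in y are broken alphabetically)
with(score, score[order(sex, y, x),])
# sort by sex, then name (ignores y)
score[order(score$sex,score$x),]

# Using plyr
library("plyr")
arrange(score, sex,y)
ddply(score, c('sex', 'y'))

# Using `data.table`
library("data.table")
score_dt <- setDT(score)  # note: setDT converts score itself by reference

# setting a key sorts the data.table (here by sex, then name)
setkey(score_dt,sex,x)
print(score_dt)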

Here is another question that deals with the same problem.

"I think there must be some function like it to apply on data frames and get data frames as return"

Yes there is:

library(plyr)

ddply(score, c('sex', 'y'))

It sounds to me like you're trying to order by score within the males and females and return a combined data frame of sorted males and sorted females.

You are right that by(score, score$sex, function(x) x[order(x$y),]) returns a list of sorted data frames, one for male and one for female. You can use do.call with the rbind function to combine these data frames into a single final data frame:

do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
#           x         y sex
# F.5    EMMA  7.526866   F
# F.9  VIOLET  8.182407   F
# F.3   SUSAN  9.677511   F
# M.4   LARRY  6.929395   M
# M.8    MATT  7.970015   M
# M.7     TIM  8.297137   M
# M.6 LEONARD  8.845588   M
# M.2     TOM  9.035948   M
# M.1    MARK 10.082314   M
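
As a rough check that this split-and-stack approach and a single order call agree, here is a minimal sketch on made-up, fixed scores (score2 and its values are assumptions for illustration, chosen without ties):

```r
# Hypothetical data with fixed scores (no ties), for reproducibility.
score2 <- data.frame(
  x   = c("TOM", "SUSAN", "TIM", "EMMA", "MARK"),
  y   = c(8.4, 7.4, 9.1, 8.3, 6.8),
  sex = factor(c("M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Split by sex, sort each piece by y, and stack the pieces again.
by_sorted <- do.call(rbind, by(score2, score2$sex, function(d) d[order(d$y), ]))
by_sorted$x
# "SUSAN" "EMMA" "MARK" "TOM" "TIM"

# A single order() call yields the rows in the same sequence.
ord_sorted <- score2[order(score2$sex, score2$y), ]
identical(by_sorted$x, ord_sorted$x)
# TRUE
```

The main visible difference is the row names: the stacked result carries prefixed names such as F.5, as in the output above, while the order version keeps the original row numbers.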

I believe the asker wanted to know how to sort when the factor has many levels, say 20:

"I know how to do that if the factor has 2 or 3 levels. But what if it had many levels, say 20? Should I write a for loop?"

I have a case with an ordered factor of 9 levels, each with its own count.

  stage_name               count
  <ord>                    <int>
1 Closed Lost                957
2 Closed Won                1413
3 Evaluation                1773
4 Meeting Scheduled         4104
5 Nurture                   1222
6 Opportunity Disqualified   805
7 Order Submitted           1673
8 Qualifying                5138
9 Quoted                    4976

In this case you can see that it is displayed using alphabetical order of stage_name, but stage_name is actually an ordered factor that has a very different order.

This code gives the factor a very different level order:

# Make categoricals ----
check_stage$stage_name = ordered(check_stage$stage_name, levels=c(
    'Opportunity Disqualified', 
    'Qualifying',
    'Evaluation',
    'Meeting Scheduled',
    'Quoted',
    'Order Submitted',
    'Closed Won',
    'Closed Lost',
    'Nurture'))

Now we can simply use the factor as the sort key. arrange is a dplyr function, but you might need forcats too; I have both libraries installed:

check_stage <- check_stage %>% 
  arrange(factor(stage_name))

This now gives the output in the factor order as desired:

check_stage

# A tibble: 9 × 2
  stage_name               count
  <ord>                    <int>
1 Opportunity Disqualified   805
2 Qualifying                5138
3 Evaluation                1773
4 Meeting Scheduled         4104
5 Quoted                    4976
6 Order Submitted           1673
7 Closed Won                1413
8 Closed Lost                957
9 Nurture                   1222
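
The same idea can be sketched in base R without dplyr. The small data frame stages below is a hypothetical three-row excerpt (stage names and counts copied from the table above), and the level order chosen is only for illustration:

```r
# Hypothetical three-stage excerpt of the table above.
stages <- data.frame(
  stage_name = c("Closed Won", "Evaluation", "Qualifying"),
  count      = c(1413L, 1773L, 5138L),
  stringsAsFactors = FALSE
)

# As plain character data, these three names already sort alphabetically.
stages[order(stages$stage_name), "stage_name"]
# "Closed Won" "Evaluation" "Qualifying"

# As an ordered factor, sorting follows the declared level order instead.
stages$stage_name <- ordered(
  stages$stage_name,
  levels = c("Qualifying", "Evaluation", "Closed Won")
)
sorted <- stages[order(stages$stage_name), ]
as.character(sorted$stage_name)
# "Qualifying" "Evaluation" "Closed Won"
```

Because order works on the factor's integer codes, this scales to any number of levels once the levels are declared in the desired sequence.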
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow