Question

I have a data frame (named 'mdf') which includes two columns. The basic information is below:

> head(mdf); tail(mdf)
  Country Rank
1     ABW  161
2     AFG  105
3     AGO   60
4     ALB  125
5     ARE   32
6     ARG   26
    Country Rank
184     WSM  181
185     YEM   90
186     ZAF   28
187     ZAR  112
188     ZMB  104
189     ZWE  134
> str(mdf)
'data.frame':   189 obs. of  2 variables:
 $ Country: Factor w/ 229 levels "","ABW","ADO",..: 2 4 5 6 7 8 9 11 12 13 ...
 $ Rank   : Factor w/ 195 levels "",".. Not available.  ",..: 72 10 149 32 118 111 41 84 26 112 ...

I want to sort it by the 'Rank' variable, but the result is:

> mdf[order(mdf$Rank),]
    Country Rank
178     USA    1
78      IND   10
153     SLV  100
170     TTO  101
43      CYP  102
54      EST  103
188     ZMB  104
2       AFG  105
175     UGA  106
130     NPL  107
73      HND  108
60      GAB  109
31      CAN   11
67      GNQ  110

As you can see, this is not what I need (i.e. increasing numeric order). How can I get that? Thanks!

Solution

To get the answer you are looking for, use:

mdf[order(as.numeric(as.character(mdf$Rank))),]
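As a minimal sketch of that line in action, here is a made-up five-row stand-in for mdf (the name mdf_small and the choice of rows are illustrative assumptions, not the original data):

```r
# Hypothetical stand-in for mdf: a few rows, with Rank stored as a factor
mdf_small <- data.frame(
  Country = c("USA", "IND", "SLV", "CAN", "ARG"),
  Rank    = factor(c("1", "10", "100", "11", "26"))
)

# Ordering the factor directly follows its level order (alphabetical here)
mdf_small[order(mdf_small$Rank), ]

# Converting label -> character -> numeric sorts by the actual rank values
sorted <- mdf_small[order(as.numeric(as.character(mdf_small$Rank))), ]
sorted
```

The first call should reproduce the 1, 10, 100, 11 pattern from the question, while the second puts the rows in increasing numeric order.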

The reason your original code doesn't work is that your Rank variable is a factor, so order() sorts it by the order of the factor's levels rather than by the numbers the labels represent. For example, if you had a data frame such that:

DF
#    x
# 1  2
# 2 22
# 3 11
# 4  1

If you order the data:

DF[order(DF$x),]

the rows follow the order of the factor levels, which are:

levels(DF$x)
#   [1] "1"  "2"  "11" "22"
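The answer does not show how DF was built. One construction that matches the levels printed above is a factor created from numeric values; this is an assumption. For contrast, the sketch also builds the same values from strings, which gives the alphabetical level order seen in the question's mdf.

```r
# Assumed construction: a factor built from numeric values
DF <- data.frame(x = factor(c(2, 22, 11, 1)))
levels(DF$x)  # levels come out in numeric order

# drop = FALSE keeps the result a one-column data frame
ordered_DF <- DF[order(DF$x), , drop = FALSE]
ordered_DF

# The same values as strings: levels are now sorted as text
from_text <- factor(c("2", "22", "11", "1"))
levels(from_text)
as.character(from_text[order(from_text)])
```

In both cases order() simply walks the levels in sequence; only the level order differs.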

We can reorder the levels, for example:

DF$x <- factor(DF$x, levels = c("2", "22", "11", "1"))

Now,

levels(DF$x)
# [1] "2"  "22" "11" "1" 

So ordering with the new factor levels we get different results:

DF[order(DF$x),]

As for why as.numeric alone doesn't work: each factor level has an associated integer code, and that code is what as.numeric returns. If you want the number spelled out by the level's label, you must first convert to character and then to numeric, hence as.numeric(as.character(x)).

For example, calling as.numeric(DF$x) gives the integer values for each level, but not the actual label for each level:

# [1] 2 4 3 1
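To make the difference concrete, the sketch below compares the two conversions side by side, again assuming DF was built from numeric values. The variable names codes, values and values_alt are illustrative.

```r
# Assumed construction of the example data frame
DF <- data.frame(x = factor(c(2, 22, 11, 1)))

codes  <- as.numeric(DF$x)                # internal integer codes of the levels
values <- as.numeric(as.character(DF$x))  # the numbers the labels spell out

# Alternative: convert each level once, then index by the factor's codes
values_alt <- as.numeric(levels(DF$x))[DF$x]

codes
values
values_alt
```

The levels-based form gives the same result as as.numeric(as.character(x)) and converts each distinct level only once, which can matter on long vectors with few levels.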

One way to avoid this in the future, if you are loading your data frame from a .csv file, is to use read.csv(..., stringsAsFactors=FALSE). I also like the fread function in data.table, which uses safer default types.
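A rough sketch of the read.csv route follows. The CSV text is made up, and the ".. Not available." placeholder is an assumption based on the factor level visible in the str(mdf) output. Declaring it in na.strings lets R read Rank as a number instead of text.

```r
# Made-up CSV text standing in for the real file
csv_text <- "Country,Rank
USA,1
IND,10
XYZ,.. Not available.
CAN,11"

mdf2 <- read.csv(
  text = csv_text,
  stringsAsFactors = FALSE,                 # keep strings as character, not factor
  strip.white = TRUE,                       # trim padding around unquoted fields
  na.strings = c("", ".. Not available.")   # placeholder text becomes NA
)

str(mdf2)                 # Rank should now be an integer column
mdf2[order(mdf2$Rank), ]  # NA ranks sort last by default
```

With a real file you would pass the file path instead of text =, and adjust na.strings to whatever placeholder the file actually uses.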

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow