Why does gdata:::reorder.factor behave differently from stats:::reorder.default for integers and doubles?

StackOverflow https://stackoverflow.com/questions/20317343

07-08-2022

Question

This is a follow-up to "Reordering factor gives different results, depending on which packages are loaded", with another, related question.

@Andrie's answer there is the correct one, and, following @David Lovell's comment, I am a third soul confused by this.

In my case it happened because I had loaded ROCR, which depends on gplots, which in turn depends on gdata. To illustrate my ignorance, I hadn't even heard of gdata, so I didn't think to search for it.

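I've discovered another quirk, which made the problem even harder to track down in my case, and which is the point of this question: gdata:::reorder.factor treats whole-number values differently from values with a fractional part. (Strictly speaking, both y and z below are double vectors in R; y just happens to hold whole numbers.) To illustrate: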

library(gdata)
x <- factor(letters[1:6])
y <- c(1,4,3,5,6,2)
z <- c(1.1,2.4,1.3,2.5,2.6,1.2)
stats:::reorder.default(x, y, function(X)-X) #edbcfa - correct
stats:::reorder.default(x, z, function(X)-X) #edbcfa
stats:::reorder.default(x, -y)               #edbcfa
stats:::reorder.default(x, -z)               #edbcfa
gdata:::reorder.factor(x, y, function(X)-X)  #edbcfa
gdata:::reorder.factor(x, z, function(X)-X)  #bdeafc - weird
gdata:::reorder.factor(x, -y)                #abcdef - no reordering
gdata:::reorder.factor(x, -z)                #abcdef - no reordering

It's mostly the bdeafc result that I'm interested in. It gets the part before the decimal point right, in that the 2.x values come before the 1.x values, but the part after the decimal point is in ascending rather than descending order: x.1 before x.2 before x.3.

Why is this?


Solution

Why this is happening:

Hm, this seems to be because gdata:::reorder.factor takes an argument named sort whose default value is mixedsort. mixedsort comes from package gtools and is built on gtools' mixedorder function. Loading gtools and running ?mixedorder shows why this is happening:

?mixedorder

Order or Sort strings with embedded numbers so that the numbers are in the correct order:

These functions sort or order character strings containing numbers so that the numbers are numerically sorted rather than sorted by character value. I.e. "Asprin 50mg" will come before "Asprin 100mg". In addition, case of character strings is ignored so that "a", will come before "B" and "C".

Also ?reorder.factor clearly states this:

?gdata:::reorder.factor

If sort is provided (as it is by default): The new factor level names are generated by applying the supplied function to the existing factor level names. With sort=mixedsort the factor levels are sorted so that combined numeric and character strings are sorted in according to character rules on the character sections (including ignoring case), and the numeric rules for the numeric sections. See mixedsort for details.
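To make the idea in these help pages concrete, here is a small base-R sketch that contrasts plain character sorting with ordering on an extracted number. It is not gtools' implementation: the helper natural_key is hypothetical and assumes every string contains exactly one unsigned integer.

```r
drugs <- c("Aspirin 100mg", "Aspirin 50mg")

# Plain character sorting compares digit by digit, so "1" < "5" puts 100mg first
sort(drugs)

# Hypothetical helper: extract the first run of digits and treat it as a number
# (assumes every string contains one)
natural_key <- function(s) as.numeric(regmatches(s, regexpr("[0-9]+", s)))

# Ordering on the numeric key puts 50mg before 100mg, which is what
# mixedorder aims for with embedded numbers
drugs[order(natural_key(drugs))]
```

The same kind of digit-aware splitting, applied to the character form of negative decimals, is what produces the surprising order in the question.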


Solution:

Pass sort=NULL so that the mixedsort default is not used:

gdata:::reorder.factor(x, z, function(X)-X, sort=NULL)
# [1] a b c d e f
# Levels: e d b c f a
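Why does sort=NULL help? Inside the function, sort is just a local argument. When R evaluates a call such as sort(v), it looks for a binding of sort whose value is a function and skips bindings that are not, so a NULL argument is passed over and base::sort is found instead. The toy function below is hypothetical; rev stands in for mixedsort as the default.

```r
# Hypothetical stand-in for gdata:::reorder.factor: the default for `sort`
# is some other function (rev here, playing the role of mixedsort)
order_levels <- function(v, sort = rev) {
  # This call uses whatever function the local `sort` argument holds.
  # If the argument is NULL, function lookup skips it (NULL is not a
  # function) and keeps searching enclosing environments, finding base::sort.
  names(sort(v))
}

vals <- c(a = -1.1, b = -2.4, c = -1.3)

order_levels(vals)               # default rev() just reverses: "c" "b" "a"
order_levels(vals, sort = NULL)  # falls back to base::sort:    "b" "c" "a"
```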

Alternatively, as @BenBolker points out in the comments, you can pass the sort function itself as the sort argument:

gdata:::reorder.factor(x, z, function(X)-X, sort=sort)
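This works because an argument supplied at the call site is evaluated in the caller's environment, so sort=sort hands the function whatever sort the caller sees, normally base::sort. Writing sort=base::sort makes that explicit even if another package has attached a different sort. A sketch, again using a hypothetical stand-in function:

```r
# Hypothetical stand-in whose default sorting function is not base::sort
order_levels <- function(v, sort = rev) names(sort(v))

vals <- c(a = -1.1, e = -2.6, b = -2.4, d = -2.5)

order_levels(vals)  # default rev(): "d" "b" "e" "a"

# `sort = sort` is evaluated in the caller's environment, which normally
# resolves to base::sort; naming the namespace removes any ambiguity
by_caller <- order_levels(vals, sort = sort)        # "e" "d" "b" "a"
by_base   <- order_levels(vals, sort = base::sort)  # "e" "d" "b" "a"
```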

On debugging:

For the future, debugonce is your friend for this sort of thing. By running

debugonce(gdata:::reorder.factor)
gdata:::reorder.factor(x, z, function(X)-X)

and then pressing Enter to step through the function while inspecting the values, you can see that the problem comes from the last few lines that are run:

else if (!missing(FUN)) 
    new.order <- names(sort(tapply(X, x, FUN, ...)))
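A quicker, non-interactive check is to look at the function's argument defaults with formals(), which returns them as unevaluated expressions. The sketch uses a hypothetical stand-in so it runs without gdata; the commented line shows the same check on the real function.

```r
# Hypothetical function with the same kind of default as gdata:::reorder.factor
order_levels <- function(v, sort = rev) names(sort(v))

# formals() exposes the default of `sort` without stepping through the code
formals(order_levels)$sort  # the symbol `rev`

# With gdata installed, the equivalent check on the real function would be:
# formals(gdata:::reorder.factor)$sort
```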

For your data,

> X
# [1] 1.1 2.4 1.3 2.5 2.6 1.2

> x
# [1] a b c d e f
# Levels: a b c d e f

And tapply(...) gives:

> tapply(X, x, FUN, ...)
#    a    b    c    d    e    f 
# -1.1 -2.4 -1.3 -2.5 -2.6 -1.2 

Here, sort should give the base R result:

> base::sort(tapply(X, x, FUN, ...))
#    e    d    b    c    f    a 
# -2.6 -2.5 -2.4 -1.3 -1.2 -1.1 

But it gives:

#    b    d    e    a    f    c 
# -2.4 -2.5 -2.6 -1.1 -1.2 -1.3 

This is because the sort being called is not base::sort, as you can see by typing sort at the debugger prompt:

> sort # from within the function call (using debugonce)
# function (x) 
# x[mixedorder(x)]
# <environment: namespace:gtools>

mixedorder is a function from package gtools. Because new.order is taken from the names of the sorted vector, a wrong sort produces a wrong level order. In short, the sort being called is gtools' mixedsort, not base::sort.

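It's easy to verify this by installing gtools and running the following (output shown for the gtools version of the time; newer releases may handle numeric input differently):

library(gtools)
gtools::mixedorder(c(-2.4, -2.5, -2.6))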
# [1] 1 2 3

order(c(-2.4, -2.5, -2.6))
# [1] 3 2 1
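Why would mixedorder put -2.4 before -2.5? The observed orders are consistent with that version of mixedorder converting the values to character and splitting each one so that the sign and integer part ("-2") form one numeric chunk, while the digits after the "." form a separate, unsigned numeric chunk. The sketch below is a hypothetical, simplified re-creation of that behaviour in base R, not gtools' actual code; chunked_order is a made-up name, and it assumes values print as a signed integer part optionally followed by "." and digits.

```r
# Hypothetical, simplified re-creation of the chunking described above
chunked_order <- function(v) {
  s <- as.character(v)
  # Integer chunk keeps the sign: "-2.4" -> -2
  int_part <- as.numeric(sub("\\..*$", "", s))
  # Fractional digits become a separate, unsigned number: "-2.4" -> 4
  has_frac <- grepl(".", s, fixed = TRUE)
  frac_part <- ifelse(has_frac, as.numeric(sub("^[^.]*\\.", "", s)), 0)
  # Compare integer chunks first, then the fractional chunks
  order(int_part, frac_part)
}

chunked_order(c(-2.4, -2.5, -2.6))  # 1 2 3, like the mixedorder output above

# On the tapply() result from the question it reproduces "bdeafc" ...
scores <- c(a = -1.1, b = -2.4, c = -1.3, d = -2.5, e = -2.6, f = -1.2)
names(scores)[chunked_order(scores)]

# ... while whole numbers have no fractional chunk and sort correctly ("edbcfa")
whole <- c(a = -1, b = -4, c = -3, d = -5, e = -6, f = -2)
names(whole)[chunked_order(whole)]
```

This also suggests why only the values with a fractional part (z) came out "weird" in the question, while the whole-number y gave the expected order.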

Therefore, you'll have to pass sort=NULL (or sort=sort) to make sure this doesn't happen.
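If you would rather not depend on which reorder method happens to be dispatched, one option is to build the level order yourself with base R only. This sketch repeats the tapply()/sort() step seen in the debugger, with base::sort named explicitly; it is an alternative to calling reorder, not what gdata does internally.

```r
x <- factor(letters[1:6])
z <- c(1.1, 2.4, 1.3, 2.5, 2.6, 1.2)

# Summarise z per level (negated, as in the question), then sort with
# base::sort explicitly so no masked or default `sort` can interfere
new_levels <- names(base::sort(tapply(z, x, function(v) -v)))

# Re-level the factor: the values stay the same, only the level order changes
x2 <- factor(x, levels = new_levels)
levels(x2)  # "e" "d" "b" "c" "f" "a"
```

This gives the same level order as the stats:::reorder.default calls in the question.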

Licensed under: CC-BY-SA with attribution