Question

I have a data.table with keys x and y and I want to sort by a column z within each key.

> DT
    x y          z
 1: a a  0.5526312
 2: a a  0.6339102
 3: a a -0.7490821
 4: a a -0.6850176
 5: a a  1.7943156
 6: a b  0.9271090
 7: a b  1.3936642
 8: a b  1.4815404
 9: a b -0.7850981
10: a b -1.0487700
11: b c  1.5184297
12: b c -0.4640705
13: b c -0.6513462
14: b c -0.5568319
15: b c  1.5422990
16: b d  0.8810654
17: b d -0.1895812
18: b d -2.6263378
19: b d  0.7371594
20: b d  1.4122076

My first attempt was DT[order(z), .SD, by = list(x, y)], but this does not keep the keyed columns sorted. I know I can do this in two steps:

DT <- DT[order(x, y, z)]
setkeyv(DT, c('x', 'y'))

However, this does not seem like good practice, because it relies on z staying sorted within each group when the table is re-sorted on the key columns. I also do not want to set z as a key, because it is not meant to be used as one later on. Is there a more elegant way to achieve this?

Solution

Why not:

setkey(DT, x, y, z)
setkey(DT, x, y)

The first setkey call sorts by all three columns; the second resets the key to x and y, which drops z from the key. As for your concern about the sort being maintained, it is documented (2nd paragraph of Details section of data.table documentation):

The sort is stable; i.e., the order of ties (if any) is preserved.

This means that when you sort by x and y after you have already sorted by x, y, and z, the order of z within any set of x-y values is left undisturbed, because those rows all tie with respect to the x-y values.
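
The stability argument above can be checked on a small table. This is a minimal sketch, not part of the original answer: the seed, the shuffled sample data, and the helper name sorted_within are assumptions made for illustration.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
DT <- data.table(
  x = rep(c("a", "b"), each = 10),
  y = rep(c("a", "b", "c", "d"), each = 5),
  z = rnorm(20)
)
DT <- DT[sample(nrow(DT))]  # shuffle the rows so there is something to sort

setkey(DT, x, y, z)  # sort by x, then y, then z
setkey(DT, x, y)     # reset the key to x and y; rows tied on (x, y) keep their order

# For each (x, y) group, check that z is in non-decreasing order
sorted_within <- DT[, list(ok = !is.unsorted(z)), by = list(x, y)]
print(sorted_within)
```

If the sort is stable as documented, every ok value should be TRUE, while key(DT) reports only x and y.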

OTHER TIPS

You could set the key including z and then unset it. It remains sorted by z within x and y but it is no longer a key.

setkey(DT, x, y, z)
setkey(DT, x, y)
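
To see that z really is dropped from the key while the row order is kept, the key can be inspected before and after the second call. A small sketch with made-up data (the four rows and the names key_before and key_after are assumptions for illustration):

```r
library(data.table)

# Tiny hypothetical table with two (x, y) groups, deliberately out of order
DT <- data.table(
  x = c("b", "a", "b", "a"),
  y = c("d", "a", "d", "a"),
  z = c(2, 4, 1, 3)
)

setkey(DT, x, y, z)
key_before <- key(DT)  # "x" "y" "z"

setkey(DT, x, y)
key_after <- key(DT)   # "x" "y"

print(DT)  # rows remain ordered by z within each (x, y) group
```
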
Licensed under: CC-BY-SA with attribution