Question

Is it possible to store the order of rows in a data.table while preserving its keys?

Let's say I have the following dummy table:

library(data.table)
dt <- data.table(id=letters[1:6],
                 group=sample(c("red", "blue"), 6, replace=TRUE),
                 value.1=rnorm(6),
                 value.2=runif(6))
setkey(dt, id)
dt
   id group    value.1    value.2
1:  a  blue  1.4557851 0.73249612
2:  b   red -0.6443284 0.49924102
3:  c  blue -1.5531374 0.72977197
4:  d   red -1.5977095 0.08033604
5:  e  blue  1.8050975 0.43553048
6:  f   red -0.4816474 0.23658045

I would like to store this table so that its rows are ordered by group and then by value.1, both in decreasing order, i.e.:

> dt[order(group, value.1, decreasing=T),]
   id group    value.1    value.2
1:  f   red -0.4816474 0.23658045
2:  b   red -0.6443284 0.49924102
3:  d   red -1.5977095 0.08033604
4:  e  blue  1.8050975 0.43553048
5:  a  blue  1.4557851 0.73249612
6:  c  blue -1.5531374 0.72977197

Obviously I can save this as a new variable, but I also want to keep the id column as my primary key.
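For comparison, data.table's setorder() can produce this order in place (a `-` prefix requests decreasing order, also for character columns). The sketch below reuses the values printed above instead of random data, and shows why reordering alone does not solve the problem: as far as I understand data.table's behaviour, once the rows are no longer sorted by id, setorder() removes the key.

```r
library(data.table)

# Values copied from the table printed in the question, so the result is reproducible
dt <- data.table(id      = letters[1:6],
                 group   = c("blue", "red", "blue", "red", "blue", "red"),
                 value.1 = c(1.4557851, -0.6443284, -1.5531374,
                             -1.5977095, 1.8050975, -0.4816474),
                 value.2 = c(0.73249612, 0.49924102, 0.72977197,
                             0.08033604, 0.43553048, 0.23658045))
setkey(dt, id)
print(key(dt))   # [1] "id"

# Reorder by reference: group descending, then value.1 descending
setorder(dt, -group, -value.1)
print(dt$id)     # [1] "f" "b" "d" "e" "a" "c"

# The rows are no longer sorted by id, so the key has been dropped
print(key(dt))   # NULL
```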

Arun's answer to "What is the purpose of setting a key in data.table?" suggests that this can be achieved with clever use of setkey, since setkey physically reorders the data.table by its key columns (although there is no option to sort a key in decreasing order):

> setkey(dt, group, value.1, id)
> dt
   id group    value.1    value.2
1:  c  blue -1.5531374 0.72977197
2:  a  blue  1.4557851 0.73249612
3:  e  blue  1.8050975 0.43553048
4:  d   red -1.5977095 0.08033604
5:  b   red -0.6443284 0.49924102
6:  f   red -0.4816474 0.23658045

However, I lose the ability to use id as my primary key, because group is now the first key column, so dt["a"] tries to match "a" against group:

> dt["a"]
   group id value.1 value.2
1:     a NA      NA      NA
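One possible workaround, not part of the original question: the on= argument (data.table 1.9.6 or later, if I recall correctly) performs an ad hoc join on any column, so id can still be used for lookups while group leads the key. A plain dt[id == "a"] subset also works and may be sped up by data.table's automatic indexing. The sketch below assumes the question's key order and reuses the printed values; neither lookup changes the row order or the key of dt.

```r
library(data.table)

# Values copied from the table printed in the question
dt <- data.table(id      = letters[1:6],
                 group   = c("blue", "red", "blue", "red", "blue", "red"),
                 value.1 = c(1.4557851, -0.6443284, -1.5531374,
                             -1.5977095, 1.8050975, -0.4816474),
                 value.2 = c(0.73249612, 0.49924102, 0.72977197,
                             0.08033604, 0.43553048, 0.23658045))
# Key as in the question: group comes first, so dt["a"] joins on group
setkey(dt, group, value.1, id)

# Ad hoc join on id; the key and physical row order of dt are untouched
by_join <- dt["a", on = "id"]
print(by_join)

# Plain logical subset on id
by_subset <- dt[id == "a"]
print(by_subset)

# The key is still group, value.1, id
print(key(dt))
```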

Solution 3

Building on @eddi's answer, I've created a hackish solution: I store a call to order, as a string of R code, in an attribute of the data.table, and an overridden print.data.table parses and evaluates that call before printing:

set_order <- function(dt, cols, decreasing=FALSE) {
  # Store a call to order(), as a string, in an additional attribute.
  # setattr() sets it by reference, avoiding the copy that attr<- makes
  setattr(dt, "order", paste0("order(", paste(cols, collapse=", "),
                              ", decreasing=", decreasing, ")"))
  invisible(dt)
}

print.data.table = function(x, ...) {
  if (!is.null(attr(x, "order"))) {
    # Use the stored ordering to print the data.table
    data.table:::print.data.table(x[eval(parse(text=attr(x, "order")))], ...)
  } else {
    data.table:::print.data.table(x, ...)
  }
}

Giving me the behaviour I want:

dt <- set_order(dt, c("group", "value.1"), decreasing=T)
dt
#    id group    value.1    value.2
# 1:  f   red -0.4816474 0.23658045
# 2:  b   red -0.6443284 0.49924102
# 3:  d   red -1.5977095 0.08033604
# 4:  e  blue  1.8050975 0.43553048
# 5:  a  blue  1.4557851 0.73249612
# 6:  c  blue -1.5531374 0.72977197

tables()
#      NAME NROW MB COLS                     KEY
# [1,] dt      6 1  id,group,value.1,value.2 id 
# Total: 1MB
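A variant of the same idea that avoids eval(parse()) might store the column names and sort directions instead of a string of code, and sort a copy with setorderv() at print time. This is only a sketch: set_print_order() and the print_order attribute are hypothetical names, setattr() is used so the attribute is added by reference without copying dt, and, as in the solution above, it relies on a print.data.table defined in the global environment taking precedence over data.table's own method.

```r
library(data.table)

# Hypothetical helper: record which columns to sort by when printing
set_print_order <- function(dt, cols, decreasing = FALSE) {
  # setattr() adds the attribute by reference, so dt is not copied and keeps its key
  setattr(dt, "print_order", list(cols = cols, decreasing = decreasing))
  invisible(dt)
}

print.data.table <- function(x, ...) {
  spec <- attr(x, "print_order")
  if (is.null(spec)) {
    data.table:::print.data.table(x, ...)
  } else {
    # Sort a copy only; order = 1 means increasing, -1 means decreasing
    sorted <- setorderv(copy(x), spec$cols,
                        order = ifelse(spec$decreasing, -1L, 1L))
    data.table:::print.data.table(sorted, ...)
  }
  invisible(x)
}

# Values copied from the table printed in the question
dt <- data.table(id      = letters[1:6],
                 group   = c("blue", "red", "blue", "red", "blue", "red"),
                 value.1 = c(1.4557851, -0.6443284, -1.5531374,
                             -1.5977095, 1.8050975, -0.4816474),
                 value.2 = c(0.73249612, 0.49924102, 0.72977197,
                             0.08033604, 0.43553048, 0.23658045))
setkey(dt, id)

set_print_order(dt, c("group", "value.1"), decreasing = TRUE)
print(dt)       # rows shown in the order f, b, d, e, a, c
print(key(dt))  # [1] "id"
```

Sorting a copy() keeps the stored table's row order and key intact; the cost is one extra copy each time the table is printed.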

Other answers

Sounds like you simply want to modify print.data.table:

print.data.table = function(x, ...) {
  # put whatever condition identifies your tables here
  if ("group" %in% names(x) && "value.1" %in% names(x)) {
    data.table:::print.data.table(x[order(group, value.1, decreasing = T)], ...)
  } else {
    data.table:::print.data.table(x, ...)
  }
}

set.seed(2)
dt = data.table(id=letters[1:6], 
               group=sample(c("red", "blue"), replace=TRUE), 
               value.1=rnorm(6), 
               value.2=runif(6))
setkey(dt, id)
dt
#   id group     value.1    value.2
#1:  a   red  0.18484918 0.40528218
#2:  e   red  0.13242028 0.44480923
#3:  c   red -1.13037567 0.97639849
#4:  b  blue  1.58784533 0.85354845
#5:  f  blue  0.70795473 0.07497942
#6:  d  blue -0.08025176 0.22582546

dt["c"]
#   id group   value.1   value.2
#1:  c   red -1.130376 0.9763985

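I think you can still search by id only, even with the key set to group, value.1, id, as follows:

dt[CJ(unique(group), unique(value.1), "a"), nomatch=0]
   group   value.1 id   value.2
1:  blue 0.4928595  a 0.3311728

From what I gathered, unique(column_name) is the way to include all values of a key column; CJ() then crosses those values so every combination is tried, whereas J() would only pair them up element by element.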

I am not sure if this helps.
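To illustrate the lookup above: with the key set to group, value.1, id, CJ() builds every combination of the supplied vectors (12 here: 2 groups, 6 values, 1 id), and joining that cross product with nomatch = 0L keeps only the combinations that actually exist in dt, i.e. the rows whose id is "a". This sketch reuses the question's printed values and reflects my reading of data.table's join semantics rather than anything confirmed in this thread.

```r
library(data.table)

# Values copied from the table printed in the question
dt <- data.table(id      = letters[1:6],
                 group   = c("blue", "red", "blue", "red", "blue", "red"),
                 value.1 = c(1.4557851, -0.6443284, -1.5531374,
                             -1.5977095, 1.8050975, -0.4816474),
                 value.2 = c(0.73249612, 0.49924102, 0.72977197,
                             0.08033604, 0.43553048, 0.23658045))
setkey(dt, group, value.1, id)

# Cross product of every group, every value.1 and the id we want
combos <- CJ(unique(dt$group), unique(dt$value.1), "a")
print(nrow(combos))  # [1] 12

# Join on the key (group, value.1, id); nomatch = 0L drops combinations not in dt
res <- dt[CJ(unique(group), unique(value.1), "a"), nomatch = 0L]
print(res)           # the single row with id "a"
```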

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow