Question

If I have a column name that needs backticks because it contains a comma, setkey throws an error telling me not to use a comma. The error points me to ?setkey, but none of the examples there cover this case. The only workaround I can find is to rename the column, call setkey, and then rename it back.

Example code:

> library(data.table)
> DT = data.table(`X, in $` = rnorm(10))
> DT
        X, in $
 1: -1.28475886
 2:  0.97789059
 3: -0.05023914
 4: -0.38133978
 5: -0.24949607
 6:  0.99213156
 7: -0.29310512
 8:  0.02840372
 9:  0.25294231
10: -0.88955013
> setkey(DT, `X, in $`)
Error in setkeyv(x, cols, verbose = verbose) : 
  Don't use comma inside quotes. Please see the examples in help('setkey')
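
The rename workaround mentioned above can be sketched as follows. This is only an illustration: the temporary name `X_tmp` is a made-up placeholder, and the sketch relies on setnames carrying the key over to the new column name when a key column is renamed.

```r
library(data.table)

DT <- data.table(`X, in $` = rnorm(10))

original    <- "X, in $"
placeholder <- "X_tmp"   # hypothetical temporary name without a comma

setnames(DT, original, placeholder)   # 1. rename to a comma-free name
setkeyv(DT, placeholder)              # 2. set the key on the temporary name
setnames(DT, placeholder, original)   # 3. rename back; the key follows the rename

key(DT)   # should report the original column name
```

All three steps modify DT by reference, so no copy of the table is made.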

Edit: showing a more likely example

In my experience, the most likely way to run into this is after using reshape2's dcast to turn character column values (which come from an external source, e.g. a database) into column names.

As long as you don't need the "join" behaviour of the key and just want the table sorted, you can work around this by reordering a copy of the table with order(), or by using a data.frame instead. For example:

library(ggplot2)
library(reshape2)

DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
DF = dcast(DT, Product~Office)
DT = data.table(DF)
setkey(DT, 'Cambridge, UK')
DT = DT[order(DF$`Cambridge, UK`),]
DT

produces:

> library(ggplot2)
> library(reshape2)
> 
> DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
> DF = dcast(DT, Product~Office)
Using Sales as value column: use value.var to override.
> DT = data.table(DF)
> setkey(DT, 'Cambridge, UK')
Error in setkeyv(x, cols, verbose = verbose) : 
  Don't use comma inside quotes. Please see the examples in help('setkey')
> DT = DT[order(DF$`Cambridge, UK`),]
> DT
    Product Cambridge, UK Cambridge, US      London    New York
 1:      12  0.0009257347  1.7183751269 0.818101229 0.002499808
 2:       1  0.0010855828  0.0889560105 0.083778108 1.451149328
 3:       2  0.0139649148  0.7385617360 0.221688602 4.771307440
 4:       5  0.0520875574  0.3389613574 0.934932759 0.127634044
 5:      10  0.0837778446  0.0598955035 0.015930174 0.715849795
 6:       9  0.0856246191  1.1303900183 1.555058058 0.367063297
 7:       6  0.1608235273  0.7147643550 0.004588596 2.995598768
 8:       8  0.4797866129  0.1783997616 0.016459971 0.497328990
 9:       4  0.5282546636  1.7011670679 0.016126768 0.024388172
10:       7  0.5655147714  0.1106522938 0.045130643 0.442473457
11:       3  0.8315246051  0.1399159784 5.792956446 1.632060601
12:      11  3.9958208033  0.0005297928 0.003282897 1.635506818

Solution

UPDATE (eddi): As of version 1.8.11 this bug has been fixed and arbitrary column names will work with setkey.
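
A minimal sketch of what this looks like on a fixed version, assuming data.table 1.8.11 or later is installed. It passes the column name to setkeyv as a character string; the example table mirrors the one used further down in this answer.

```r
library(data.table)

keyed <- data.table(`b,ah` = c(2L, 3:1), var = letters[1:4])

# On a fixed version the comma in the name is no longer rejected
setkeyv(keyed, "b,ah")

key(keyed)      # name of the key column
keyed[.(2L)]    # keyed lookup: rows where `b,ah` equals 2
```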


I found a hack: (1) sort the table, then (2) set the "sorted" attribute with setattr.

Example:

mydt <- data.table(`b,ah`=c(2L,3:1),var=letters[1:4])

mydt <- mydt[order(`b,ah`)]
setattr(mydt,'sorted','b,ah')

Now, to verify that it behaves well...

key(mydt)
# [1] "b,ah"
mydt[.(2)]
#    b,ah var
# 1:    2   a
# 2:    2   c
mydt[,.N,by=`b,ah`]
#    b,ah N
# 1:    1 1
# 2:    2 2
# 3:    3 1

Comments. I didn't use the OP's example because keying on a column of arbitrary floating-point numbers seems weird (to me).

Who knows what negative side effects this could have? Anyway, I wouldn't use it, and I agree it would be nice to have commas supported. Maybe there could be a setkeyn for setting the key by column number, if supporting commas would make too much of a mess in setkey/setkeyv?
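
To illustrate the idea, here is a rough sketch of what such a helper might look like. setkeyn is hypothetical and not part of data.table; this version simply looks up the names by position and hands them to setkeyv, so it only helps on versions where setkeyv itself accepts commas in names (1.8.11 or later, per the update above).

```r
library(data.table)

# Hypothetical helper, NOT part of data.table: set the key by column position
setkeyn <- function(x, n) {
  stopifnot(is.data.table(x), is.numeric(n), all(n >= 1), all(n <= ncol(x)))
  setkeyv(x, names(x)[n])   # translate positions to names, then key by name
}

bynum <- data.table(`b,ah` = c(2L, 3:1), var = letters[1:4])
setkeyn(bynum, 1L)
key(bynum)   # name of the first column
```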

Licensed under: CC-BY-SA with attribution