Question

How do I choose the right keys for data.table objects?

Are the considerations similar to those for RDBMSs? My first step was to look for documentation about indexes and keys in RDBMSs, and Google turned up a helpful Stack Overflow question about Oracle.

Do the considerations from that answer apply to data.table objects, perhaps with the exception of those relating to UPDATE, INSERT or DELETE statements? I'm guessing that data.table objects won't really be used in that way.

I'm trying to get my head around this stuff by using the documentation and examples, but I haven't seen any discussion on key selection.

PS: Thanks to @crayola for pointing me toward the data.table package in the first place!

Solution

I am not sure this is a very helpful answer, but since you mention me in the question I'll say what I think anyway. But remember that I am a bit of a data.table newbie myself.

I personally only use keys when there is a clear benefit, e.g. when merging data.tables, or when it seems clear that a key will speed things up (e.g. when subsetting repeatedly on the same variable). But to my knowledge, there is sometimes no real need to define keys at all; the package is already faster than data.frame without keys.
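To make those two cases concrete, here is a minimal sketch. The table, the column names (`id`, `day`, `value`) and the `lookup` table are all made up for illustration, so treat it as one possible way of doing it rather than a recommendation.

```r
library(data.table)

# Hypothetical example data: one row per observation.
set.seed(1)
n <- 1e4
dt <- data.table(
  id    = sample(c("a", "b", "c"), n, replace = TRUE),
  day   = sample(1:365, n, replace = TRUE),
  value = rnorm(n)
)

# No key needed: an ordinary logical subset works fine.
scan_rows <- dt[id == "b"]

# Case 1: repeated subsetting on the same variable.
# setkey() sorts dt by reference and marks id as its key,
# so lookups on id can use binary search.
setkey(dt, id)
key_rows <- dt[.("b")]

# Case 2: merging. Both tables are keyed on id, so lookup[dt]
# joins them on that key, returning one row per row of dt.
lookup <- data.table(
  id    = c("a", "b", "c"),
  label = c("alpha", "beta", "gamma"),
  key   = "id"
)
joined <- lookup[dt]
```

If you only subset on `id` once or twice, the unkeyed `dt[id == "b"]` form is probably all you need; the key starts to pay off when the same lookup or join is repeated many times.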

Licensed under: CC-BY-SA with attribution