Question

Dropping an element from a list via conventional means (for example ll["name"] <- NULL) causes the entire list to be copied. Normally this is not noticeable, until of course the data sets become large.

I have a list with a dozen elements, each between 0.25 and 2 GB in size. Dropping three elements from this list takes about ten minutes (on a relatively fast machine).

Is there a way to drop elements from a list in-place?


I have tried the following:

TEST <- list(A=1:20,  B=1:5)

TEST[["B"]] <- NULL
TEST["B"] <- NULL
TEST <- TEST[c(TRUE, FALSE)]
data.table::set(TEST, "B", value=NULL) # ERROR

Output with memory info:

cat("\n\n\nATTEMPT 1\n")
TEST <- list(A=1:20,  B=1:5)
.Internal(inspect(TEST))
TEST[["B"]] <- NULL
.Internal(inspect(TEST))

cat("\n\n\nATTEMPT 2\n")
TEST <- list(A=1:20,  B=1:5)
.Internal(inspect(TEST))
TEST["B"] <- NULL
.Internal(inspect(TEST))

cat("\n\n\nATTEMPT 3\n")
TEST <- list(A=1:20,  B=1:5)
.Internal(inspect(TEST))
TEST <- TEST[c(TRUE, FALSE)]
.Internal(inspect(TEST))

Solution

I don't know how you could make a vector shorter without copying it. The next best thing would be to set the element to NA or NULL, which leaves the length of the list unchanged.

According to ?Extract, you have to write TEST[i] <- list(NULL) to set an element to NULL. In my tests, i had to be an integer or logical vector.

> TEST <- list(A=1:20,  B=1:5); .Internal(inspect(TEST))
@27d2c60 19 VECSXP g0c2 [NAM(1),ATT] (len=2, tl=0)
  @27dd9e0 13 INTSXP g0c6 [] (len=20, tl=0) 1,2,3,4,5,...
  @2805c98 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
ATTRIB:
  @1f38be8 02 LISTSXP g0c0 [] 
    TAG: @d3f478 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @2807430 16 STRSXP g0c2 [] (len=2, tl=0)
      @dc2628 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "A"
      @dc25f8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "B"
> TEST[2] <- list(NULL); .Internal(inspect(TEST)); TEST
@27d2c60 19 VECSXP g0c2 [MARK,NAM(1),ATT] (len=2, tl=0)
  @27dd9e0 13 INTSXP g0c6 [MARK] (len=20, tl=0) 1,2,3,4,5,...
  @d3fb78 00 NILSXP g1c0 [MARK,NAM(2)] 
ATTRIB:
  @1f38be8 02 LISTSXP g0c0 [MARK] 
    TAG: @d3f478 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @2807430 16 STRSXP g0c2 [MARK] (len=2, tl=0)
      @dc2628 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "A"
      @dc25f8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "B"
$A
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

$B
NULL
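
To make the difference between the two assignments concrete, here is a minimal sketch on the same toy list. The variable names len_after_blank and len_after_drop are only for illustration. It shows that list(NULL) empties the slot but keeps it, while a bare NULL removes the slot; it does not measure copying.

```r
TEST <- list(A = 1:20, B = 1:5)

# Assigning list(NULL) keeps the slot and stores NULL in it.
TEST[2] <- list(NULL)
len_after_blank <- length(TEST)        # the list still has two slots
b_is_null <- is.null(TEST[["B"]])      # the "B" slot now holds NULL

# Assigning a bare NULL removes the slot, so the list gets shorter.
TEST[2] <- NULL
len_after_drop <- length(TEST)
remaining <- names(TEST)
```

The large object that used to sit in the blanked slot is no longer referenced by the list, so the garbage collector can reclaim it even though the name "B" is still present.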

OTHER TIPS

As @JoshO'Brien suggested in his comment, it is much more efficient to use environments instead of lists to store large objects in memory. In my experience, environments confer significant time and memory advantages for large-object storage:

Element lookup time.

Have you noticed that it can be quite slow (a few seconds) to access an object at the end of your list? I think that's because a list keeps no index of its names: to find an element by name, R has to search through the names one by one.

Accessing a variable in an environment, on the other hand, is effectively instantaneous: R only has to look the name up among the variable names stored in the environment. This is noticeable when your list elements are large!

In place modification.

When modifying (or removing) variables in an environment, only the individual object is copied. When you modify a list, the whole list is copied in the process.
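
A small sketch of what this means in practice, assuming two hypothetical helper functions (drop_from_env and drop_from_list are made up for this example). An environment is passed by reference, so a removal inside a function is visible to the caller; a list is passed by value, so the caller's list is left untouched unless the result is reassigned.

```r
# Hypothetical helpers, defined only for this illustration.
drop_from_env <- function(e, name) {
  rm(list = name, envir = e)   # removes the binding from the caller's environment
  invisible(NULL)
}
drop_from_list <- function(l, name) {
  l[[name]] <- NULL            # modifies the function's local copy only
  l
}

e <- new.env()
e$A <- 1:20
e$B <- 1:5
l <- list(A = 1:20, B = 1:5)

drop_from_env(e, "B")
shorter <- drop_from_list(l, "B")

env_has_B  <- exists("B", envir = e, inherits = FALSE)  # FALSE: changed in place
list_has_B <- "B" %in% names(l)                         # TRUE: original list unchanged
```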

Working with environments

  1. Defining a new environment: TEST <- new.env()
  2. Casting to an environment: TEST <- as.environment(TEST)
  3. Element deletion: rm(A, envir=TEST)
  4. Element creation: TEST$A <- 1:20
  5. Element access: TEST$A
  6. Listing objects stored: ls(pos=TEST) (This is the equivalent of names(TEST))
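
Put together, the steps above might look like the following sketch. The element names and the variable ENV are arbitrary; rm(list = ...) is used so that several elements can be dropped by name in one call.

```r
TEST <- list(A = 1:20, B = 1:5, C = letters)

# Convert the list to an environment (the elements become variables).
ENV <- as.environment(TEST)

# Drop several elements by name; nothing else in ENV is touched.
rm(list = c("B", "C"), envir = ENV)

# Create a new element and read an existing one.
ENV$D <- 1:3
first_A <- ENV$A[1]

# ls() returns the stored names in sorted order.
stored <- ls(ENV)

# Convert back to a list if needed; the element order is not guaranteed.
back <- as.list(ENV)
```
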
Licensed under: CC-BY-SA with attribution