Question

I'm surprised to discover that a matrix I've created from a large raster apparently takes up a whopping 35,000 times more memory than its parent. This code demonstrates:

> # comparison with R's built-in volcano data
> object.size(volcano)
42656 bytes
> object.size(as.matrix(volcano))
42656 bytes
> # comparison with my data
> class(region_utm)
[1] "RasterLayer"
attr(,"package")
[1] "raster"
> dim(region_utm)
[1] 7297 7297    1
> object.size(region_utm)
12128 bytes
> region_mat = as.matrix(region_utm)
> dim(region_mat)
[1] 7297 7297
> object.size(region_mat)
425969872 bytes

object.size(region_utm) is certainly giving a wild underestimate, as 12,128 bytes is insufficient to contain 53m values, even factored, particularly as 87% (46m) are unique values (according to length(unique(region_utm))). I'm not sure then how to get a realistic memory estimate.

However, plotting the raster is significantly quicker than working with the matrix. I've always thought of matrices as roughly equivalent to rasters without the spatial metadata slots, so I must be missing an important difference between these data structures. I'd be grateful for clarification on what could explain this memory disparity; I use matrix workflows a fair bit and need to understand their limitations.
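As a sanity check, the matrix size is exactly what 8-byte doubles predict:

```r
# every cell of a numeric matrix is an 8-byte double
cells <- 7297 * 7297   # 53,246,209 cells
cells * 8              # 425,969,672 bytes of data
# object.size(region_mat) reported 425,969,872 bytes; the extra
# ~200 bytes are R's object header and the dim attribute
```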


Edit: anticipating a request for an str() report:

> str(region_mat)
 num [1:7297, 1:7297] NA NA NA NA NA NA NA NA NA NA ...
> summary(as.vector(region_mat))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   -1.7    21.3   118.1   135.5   236.9  1020.0 1266438 
> str(region_utm)
Formal class 'RasterLayer' [package "raster"] with 12 slots
  ..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
  .. .. ..@ name        : chr "/private/var/folders/kh/vlbqbp3n29lcvp491zbrnpl80000gn/T/R_raster_robinedwards/raster_tmp_2014-02-09_164243_1484_10601.grd"
  .. .. ..@ datanotation: chr "FLT8S"
  .. .. ..@ byteorder   : Named chr "little"
  .. .. .. ..- attr(*, "names")= chr "value"
  .. .. ..@ nodatavalue : num -1.7e+308
  .. .. ..@ NAchanged   : logi FALSE
  .. .. ..@ nbands      : int 1
  .. .. ..@ bandorder   : Named chr "BIL"
  .. .. .. ..- attr(*, "names")= chr "value"
  .. .. ..@ offset      : int 0
  .. .. ..@ toptobottom : logi TRUE
  .. .. ..@ blockrows   : int 0
  .. .. ..@ blockcols   : int 0
  .. .. ..@ driver      : chr "raster"
  .. .. ..@ open        : logi FALSE
  ..@ data    :Formal class '.SingleLayerData' [package "raster"] with 13 slots
  .. .. ..@ values    : logi(0) 
  .. .. ..@ offset    : num 0
  .. .. ..@ gain      : num 1
  .. .. ..@ inmemory  : logi FALSE
  .. .. ..@ fromdisk  : logi TRUE
  .. .. ..@ isfactor  : logi FALSE
  .. .. ..@ attributes: list()
  .. .. ..@ haveminmax: logi TRUE
  .. .. ..@ min       : num -1.73
  .. .. ..@ max       : num 1020
  .. .. ..@ band      : int 1
  .. .. ..@ unit      : chr ""
  .. .. ..@ names     : chr "layer"
  ..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
  .. .. ..@ type      : chr(0) 
  .. .. ..@ values    : logi(0) 
  .. .. ..@ color     : logi(0) 
  .. .. ..@ names     : logi(0) 
  .. .. ..@ colortable: logi(0) 
  ..@ title   : chr(0) 
  ..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
  .. .. ..@ xmin: num 180386
  .. .. ..@ xmax: num 394918
  .. .. ..@ ymin: num 1879673
  .. .. ..@ ymax: num 2103691
  ..@ rotated : logi FALSE
  ..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
  .. .. ..@ geotrans: num(0) 
  .. .. ..@ transfun:function ()  
  ..@ ncols   : int 7297
  ..@ nrows   : int 7297
  ..@ crs     :Formal class 'CRS' [package "sp"] with 1 slots
  .. .. ..@ projargs: chr "+proj=utm +zone=16 ellps=WGS84 +ellps=WGS84"
  ..@ history : list()
  ..@ z       : list()

Solution

In fact, I'm almost certain that object.size doesn't do what you want on S4 objects. Take a peek at the code for cgwtools::lssize. Here's the relevant part of what I wrote, with a lot of help from other SO contributors:

# recursively apply object.size to every slot of an S4 object
fb4 <- function(x) {
    if (isS4(x)) {
        slots <- setNames(slotNames(x), slotNames(x))
        lapply(lapply(slots, slot, object = x), fb4)
    } else {
        object.size(if (is.list(x)) unlist(x) else x)
    }
}

# total size in bytes, summed over all slots and subslots
sum(unlist(fb4(region_utm)))

That's a recursive dive into S4 slots and subslots. Note, though, that even a correct per-slot tally will be small for region_utm: your str() output shows @data@inmemory = FALSE, @data@fromdisk = TRUE and @data@values = logi(0), i.e. the RasterLayer is just a proxy for the .grd file on disk and holds no cell values at all. The values are only read into RAM when something like as.matrix() or plot() asks for them, which is why the matrix weighs in at ~426 MB while the raster object is a few kB of metadata.
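As a minimal, self-contained sketch of what the recursion reports (the Demo class and its slot names are invented purely for illustration, and fb4 is repeated from above so this runs standalone):

```r
library(methods)

# fb4 repeated so the sketch is self-contained
fb4 <- function(x) {
    if (isS4(x)) {
        slots <- setNames(slotNames(x), slotNames(x))
        lapply(lapply(slots, slot, object = x), fb4)
    } else {
        object.size(if (is.list(x)) unlist(x) else x)
    }
}

# a toy S4 class with two slots (hypothetical, for demonstration only)
setClass("Demo", representation(num = "numeric", txt = "character"))
d <- new("Demo", num = rnorm(1e4), txt = letters)

str(fb4(d))          # slot-by-slot size report
sum(unlist(fb4(d)))  # total bytes across both slots (~80 kB here)
```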

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow