Question

I've been trying to speed up some R code. I've removed all loops, am using vectors, and have streamlined just about everything. I've timed each iteration of my code, and the iterations get slower as the run goes on.

    ### The beginning iterations
       user  system elapsed
       0.03    0.00    0.03
       user  system elapsed
       0.03    0.00    0.04
       user  system elapsed
       0.03    0.00    0.03
       user  system elapsed
       0.04    0.00    0.05

    ### The ending iterations
       user  system elapsed
       3.06    0.08    3.14
       user  system elapsed
       3.10    0.05    3.15
       user  system elapsed
       3.08    0.06    3.15
       user  system elapsed
       3.30    0.06    3.37

I have 598 iterations, and right now the whole run takes about 10 minutes. I'd like to speed things up. Here's how my code looks; you'll need the RColorBrewer and fields packages. Here's my data. Yes, I know it's big; make sure you download the zip file.

    library(fields)        # provides image.plot()
    library(RColorBrewer)  # provides brewer.pal()

    StreamFlux <- function(data, NoR, NTS){
      ### Read in data to display points ###
      WLX = c(8,19,29,20,13,20,21)
      WLY = c(25,28,25,21,17,14,12)
      WLY = 34 - WLY
      WLX = WLX / 44
      WLY = WLY / 33
      timedata = NULL
      mf <- function(i){
        b = (NoR+8) * (i-1) + 8

        ### I read in data one section at a time to avoid headers
        mydata = read.table(data, skip=b, nrows=NoR, header=FALSE)
        rows = 34-mydata[,2]
        cols = 45-mydata[,3]
        flows = mydata[,7]
        rows = as.numeric(rows)
        cols = as.numeric(cols)
        rm(mydata)

        ### Create Flux matrix
        flow_mat <- matrix(0,44,33)

        ### Populate matrix ###
        flow_mat[(rows - 1) * 44 + (45-cols)] <- flows+flow_mat[(rows - 1) * 44 + (45-cols)]
        flow_mat[flow_mat == 0] <- NA
        rm(flows)
        rm(rows)
        rm(cols)
        timestep = i

        ### Specifying jpeg info ###
        jpeg(paste("Steamflow", timestep, ".jpg", sep = ''),
             width = 640, height=441, quality=75, bg="grey")
        image.plot(flow_mat, zlim=c(-1,1),
                   col=brewer.pal(11, "RdBu"), yaxt="n",
                   xaxt="n", main=paste("Stress Period ",
                   timestep, sep = ""))
        points(WLX,WLY)
        dev.off()
        rm(flow_mat)
      }
      ST <- function(x){
        functiontime = system.time(mf(x))
        print(functiontime)
      }
      lapply(1:NTS, ST)
    }
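The "Populate matrix" line relies on R's column-major linear indexing: in a matrix with nr rows, element [r, k] sits at position (k - 1) * nr + r. In StreamFlux, rows plays the column role and 45 - cols the row role. The sketch below, using small hypothetical positions and values rather than the real data, checks that equivalence against the more readable two-column matrix index via cbind(). It also shows one caveat worth checking against the data: if the same cell appears twice within a block, the vectorized assignment keeps only the last value instead of summing. rowsum() is one way to aggregate first, if that is the intended behaviour.

```r
# Column-major linear indexing, as in StreamFlux's "Populate matrix" step:
# element [r, k] of an nr-row matrix lives at position (k - 1) * nr + r.
nr <- 44; nc <- 33
r <- c(3L, 10L, 44L)        # hypothetical row positions (45 - cols in StreamFlux)
k <- c(1L, 5L, 33L)         # hypothetical column positions (rows in StreamFlux)
vals <- c(0.5, -0.2, 0.7)   # hypothetical flow values

by_linear <- matrix(0, nr, nc)
by_linear[(k - 1) * nr + r] <- vals

by_pairs <- matrix(0, nr, nc)
by_pairs[cbind(r, k)] <- vals   # same cells, addressed as (row, column) pairs

identical(by_linear, by_pairs)  # TRUE

# Caveat: a repeated index in one assignment is overwritten, not summed.
v <- numeric(3)
idx <- c(2L, 2L)
v[idx] <- v[idx] + c(1, 1)
v  # 0 1 0

# If cells can repeat within a block, aggregate first, e.g. with rowsum():
s <- rowsum(c(1, 1), idx)       # one row per distinct index, values summed
v2 <- numeric(3)
v2[as.integer(rownames(s))] <- s[, 1]
v2  # 0 2 0
```

Whether duplicates can occur depends on the contents of stream_out.txt, which this sketch does not assume either way.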

This is how to run the function:

    ### To run all timesteps ###
    StreamFlux("stream_out.txt", 687, 598)
    ### To run the first 100 timesteps ###
    StreamFlux("stream_out.txt", 687, 100)
    ### The first 200 timesteps ###
    StreamFlux("stream_out.txt", 687, 200)

To time the whole run, remove print(functiontime) so it doesn't print at every timestep, then run:

    > system.time(StreamFlux("stream_out.txt",687,100))
       user  system elapsed
      28.22    1.06   32.67
    > system.time(StreamFlux("stream_out.txt",687,200))
       user  system elapsed
     102.61    2.98  106.20

Doubling the number of timesteps more than triples the elapsed time.

What I'm looking for is any way to speed up this code, and ideally an explanation of why it slows down. Should I just run it in parts? That seems like a silly solution. I've read about dlply from the plyr package. It seems to have worked here, but would it help in my case? What about parallel processing? I think I can figure that out, but is it worth the trouble in this case?


Solution

I will follow @PaulHiemstra's suggestion and post my comment as an answer. Who can resist Internet points? ;)

From a quick glance at your code, I agree with the second point in @joran's comment: your loop/function is probably slowing down because it reads in your data over and over. More specifically, this part of the code probably needs to be fixed:

    read.table(data, skip=b, nrows=NoR, header=FALSE)

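In particular, I think the skip=b argument is the culprit. Each call opens the file again and has to read through and discard the first b lines before it reaches the block it wants, and b = (NoR+8)*(i-1)+8 grows with every timestep. With NoR = 687, the last timestep alone skips (687 + 8) * 597 + 8 = 414,923 lines, and the 598 calls together read and discard about 124 million lines, so the total work grows roughly quadratically with the number of timesteps. You should read in all the data at the beginning, if possible, and then retrieve the necessary parts from memory for the calculations.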
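A minimal sketch of the "read once, slice from memory" approach, assuming (as the skip formula in StreamFlux implies) that every timestep block is exactly 8 header lines followed by NoR data rows. read_all_blocks is a hypothetical helper, not something from the original post: it reads the file with a single readLines() call, computes the line numbers of every data row, parses them all with one read.table(text = ...) call, and splits the result into one data frame per timestep. The synthetic two-block file at the end is only there to show the indexing works. If the real file contains blank lines inside data blocks, read.table's nrows counting would differ from this line arithmetic, so it is worth checking a block or two against the original output.

```r
# Hypothetical helper (not from the original post): read the whole file once
# and return a list with one data frame per timestep.
# Assumption: block i is 8 header lines followed by NoR data rows, which is
# what the skip formula b = (NoR + 8) * (i - 1) + 8 in StreamFlux implies.
read_all_blocks <- function(file, NoR, NTS) {
  lines <- readLines(file)                           # single pass over the file
  starts <- (NoR + 8) * (seq_len(NTS) - 1) + 8       # lines before each block's data
  # Line numbers of every data row: block i occupies starts[i] + 1 .. starts[i] + NoR
  data_idx <- as.vector(outer(seq_len(NoR), starts, "+"))
  stopifnot(max(data_idx) <= length(lines))          # file must hold all NTS blocks
  flux <- read.table(text = lines[data_idx], header = FALSE)  # parse all rows at once
  flux$timestep <- rep(seq_len(NTS), each = NoR)
  split(flux, flux$timestep)                         # list indexed by timestep
}

# Small synthetic check: two blocks of 3 rows, each preceded by 8 header lines.
make_block <- function(i) {
  c(rep("header line", 8),
    sprintf("%d %d %d 0 0 0 %d", i, 1:3, 1:3, 10L * i + 1:3))
}
tmp <- tempfile(fileext = ".txt")
writeLines(c(make_block(1L), make_block(2L)), tmp)
blocks <- read_all_blocks(tmp, NoR = 3, NTS = 2)
blocks[[2]]$V7  # 21 22 23
```

With this in place, blocks <- read_all_blocks(data, NoR, NTS) would be computed once before lapply(), and the read.table(data, skip=b, ...) line inside mf would become mydata = blocks[[i]]. The full file is a few hundred thousand lines, which should fit in memory on most machines, and each iteration then costs roughly the same regardless of i.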

Licensed under: CC-BY-SA with attribution