Question

What are the advantages of placing data in a new environment (created with new.env()) in R, e.g. speed?

For data such as time series, is a new environment analogous to a database?

My question stems from downloading asset prices in R, where it was suggested to place them in a new environment. Why is this? Thank you:

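# quantmod provides getSymbols() (and attaches TTR); FinancialInstrument provides
# currency() and stock(). join() and extract.table.from.webpage() are not part of
# these packages; they appear to come from the Systematic Investor Toolbox (SIT),
# which has to be loaded separately.
library(quantmod)
library(FinancialInstrument)

url <- 'http://www.nasdaq.com/markets/indices/nasdaq-100.aspx'
txt <- join(readLines(url))

# extract tables from this page
temp <- extract.table.from.webpage(txt, 'Symbol', hasHeader = TRUE)
temp[, 2]

# Symbols: rows 2 to 101 of the second column
symbols <- c(temp[, 2])[2:101]

currency("USD")
stock(symbols, currency = "USD", multiplier = 1)

# create new environment to store symbols
symEnv <- new.env()

# getSymbols and assign the symbols to the symEnv environment
getSymbols(symbols, from = '2002-09-01', to = '2013-10-17', env = symEnv)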
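Because symEnv is an ordinary environment, the downloaded series can afterwards be listed, fetched and processed by name, which is what makes it feel like a small in-memory key-value store (unlike a database, it only lives in the current R session). The sketch below uses only base R; the environment name prices_env, the symbol names AAA and BBB and their prices are hypothetical stand-ins for what getSymbols() would store.

```r
# Hypothetical stand-in for the environment filled by getSymbols():
# two made-up closing-price series stored under ticker-like names.
prices_env <- new.env()
assign("AAA", c(10, 11, 12.5), envir = prices_env)
prices_env$BBB <- c(20, 19, 21)

# List the stored objects, like listing the keys of a key-value store
# (ls() returns the names sorted).
syms <- ls(prices_env)

# Fetch one object by name, or several at once as a named list.
aaa <- get("AAA", envir = prices_env)
both <- mget(syms, envir = prices_env)

# Apply a function to every object in the environment; eapply() returns
# a named list whose element order is not guaranteed, so reorder by name.
last_prices <- eapply(prices_env, function(x) x[length(x)])

print(syms)
print(unlist(last_prices)[syms])
```

With the real data, calls such as ls(symEnv) or mget(symbols, envir = symEnv) should return the downloaded objects in the same way.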

Solution

There are advantages to this if your data is large and you have to modify it by passing it through functions. When you pass data frames or vectors to functions that modify them, R makes a copy of the data before changing it (copy-on-modify). You then return the modified data from the function and overwrite the old object to complete the modification step.
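The copy-and-overwrite pattern just described looks roughly like this; set_first is a hypothetical helper name used only for illustration. The function works on its own copy of the data frame, so the caller has to reassign the returned value to keep the change.

```r
# Hypothetical helper: changes the first cell and returns the modified copy.
set_first <- function(d, value) {
  d[1, 1] <- value  # modifies the function's local copy, not the caller's object
  d                 # return the modified copy
}

k <- data.frame(a = 1:3)
k2 <- set_first(k, 10)

print(k$a)   # original is unchanged: 1 2 3
print(k2$a)  # the returned copy holds the change

# To "modify" k, overwrite it with the returned value.
k <- set_first(k, 10)
print(k$a)
```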

If your data is large, copying it for each function call may add an undesirable amount of overhead. Environments provide a way around this, because functions handle them differently: an environment is not copied when it is passed to a function, so if the function modifies its contents, R operates directly on the caller's environment. By putting your data in an environment and passing the environment to the function, instead of passing the data directly, you can avoid copying the large dataset.

# here I create a data.frame inside an environment and pass the environment
# to a function that modifies the data.
e <- new.env()
e$k <- data.frame(a=1:3)
f <- function(e) {e$k[1,1] <- 10}
f(e)
# you can see that the original data was changed.
e$k
   a
1 10
2  2
3  3

# Alternatively, if I pass just the data.frame, the manipulation does not affect the
# original data.
k <- data.frame(a=1:3)
f2 <- function(k) {k[1,1] <- 10}
f2(k)
k
  a
1 1
2 2
3 3
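The same reference behaviour applies to plain assignment: binding an environment to a second name does not copy it, so a change made through one name is visible through the other. The base-R sketch below shows that, plus a hypothetical helper add_returns (not from any package) that updates every series stored in an environment purely by side effect, the kind of bulk update that keeping all symbols in one environment such as symEnv makes convenient. The price values are made up.

```r
e1 <- new.env()
e1$x <- 1:5
e2 <- e1            # no copy: e1 and e2 refer to the same environment
e2$x <- e2$x * 2L   # change made through e2 ...
print(e1$x)         # ... is visible through e1
print(identical(e1, e2))

# Hypothetical helper: for every series in `env`, store its simple returns
# under "<name>_ret". It returns nothing useful; it works by side effect.
add_returns <- function(env) {
  for (nm in ls(env)) {  # ls() is evaluated once, before new objects are added
    x <- get(nm, envir = env)
    assign(paste0(nm, "_ret"), diff(x) / x[-length(x)], envir = env)
  }
  invisible(NULL)
}

prices <- new.env()
prices$AAA <- c(100, 110, 99)  # made-up prices
add_returns(prices)
print(ls(prices))
print(prices$AAA_ret)
```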

OTHER TIPS

Let's compare two cases. With a new environment:

e <- new.env()
e$k <- data.frame(a=1:1000000)
f <- function(e) {e$k[1,1] <- 10}
system.time({
    for(i in 1:1000) f(e)
})
  user  system elapsed 
  5.32    6.35   11.67 

head(e$k)

Without a new environment:

k <- data.frame(a=1:1000000)
# here the argument is the data frame itself; the modified copy is returned
f <- function(d) {d[1,1] <- 10; d}
system.time({
    for(i in 1:1000) k <- f(k)
}) 
  user  system elapsed 
  5.07    6.82   11.89

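Not much of a difference. This suggests that e$k[1,1] <- 10 still copies the modified data inside the environment, so in this case the environment by itself does not avoid that copy.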

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow