Question

I am trying to load a dataset into R using the data() function. It works fine when I use the dataset name (e.g. data(Titanic) or data("Titanic")). What doesn't work for me is loading a dataset using a variable instead of its name. For example:

# This works fine:
> data(Titanic)

# This works fine as well:
> data("Titanic")

# This doesn't work:
> myvar <- Titanic
> data(myvar)
**Warning message:
In data(myvar) : data set ‘myvar’ not found**

Why is R looking for a dataset named "myvar" since it is not quoted? And since this is the default behavior, isn't there a way to load a dataset stored in a variable?

For the record, what I am trying to do is to create a function that uses the "arules" package and mines association rules using Apriori. Thus, I need to pass the dataset as a parameter to that function.

myfun <- function(mydataset) {
    data(mydataset)    # doesn't work (data set 'mydataset' not found)
    rules <- apriori(mydataset)
}

edit - output of sessionInfo():

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] arules_1.0-14   Matrix_1.0-12   lattice_0.20-15 RPostgreSQL_0.4 DBI_0.2-7      

loaded via a namespace (and not attached):
[1] grid_3.0.0  tools_3.0.0

And the actual errors I am getting (using, for example, a sample dataset "xyz"):

xyz <- data.frame(c(1,2,3))
data(list=xyz)
Warning messages:
1: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
3: In if (name %in% names(rds)) { :
  the condition has length > 1 and only the first element will be used
4: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
5: In if (name %in% names(rds)) { :
  the condition has length > 1 and only the first element will be used
6: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used

...

...

32: In data(list = xyz) :
  c("data set ‘1’ not found", "data set ‘2’ not found", "data set ‘3’ not found")
Was it helpful?

Solution 4

I am answering my own question, but I have found the solution at last. Quoting R help:

"Data sets are searched for in all the currently loaded packages, then in the ‘data’ directory (if any) of the current working directory."

Thus, all one has to do is write the dataset in a file and place it into a directory named "data" and located into the working directory.

> write.table(mydataset,file="dataset.csv",sep=",",quote=TRUE,row.names=FALSE)  # I intend to create a csv file, so I use 'sep=","' to separate the entries by a comma, 'quote=TRUE' to quote all the entries, and 'row.names=F to prevent the creation of an extra column containing the row names (which is the default behavior of write.table() )

# Now place the dataset into a "data" directory (either via R or via the operating system, doesn't make any difference):
> dir.create("data")  # create the directory
> file.rename(from="dataset.csv",to="data/dataset.csv")  # move the file

# Now we can finally load the dataset:
> data("mydataset")  # data(mydataset) works as well, but quoted is preferable - less risk of conflict with another object coincidentally named "mydataset" as well

OTHER TIPS

Use the list argument. See ?data.

data(list=myvar)

You'll also need myvar to be a character string.

myvar <- "Titanic"

Note that myvar <- Titanic only worked (I think) because of the lazy loading of the Titanic data set. Most datasets in packages are loaded this way, but for other kinds of data sets, you'd still need the data command.

Use the variable as character. Otherwise you will be processing the contents of "Titanic" rather than its name. You may also need to use get in order to convert the character value to an object name.

myvar <- 'Titanic'

myfun <- function(mydataset) {
    data(list=mydataset)   
    str(get(mydataset))
}

myfun(myvar)

If the package has been loaded, you can use the get() function to assign the data set to a local variable:

data_object = get(myvar, asNamespace('<package_name>'))

or simply:

data_object = get(myvar)

Assign_Name <- read.csv(file.choose())

This line of code opens your local machine, just select the data-set you want to load it R environment

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top