Why does subset() not work with a vector whose name is identical to a column name?
I came across a confusing "feature" of the subset function (using a vector with the same name as a column for subsetting does not work):
data(iris)
Species <- unique(iris$Species)
i <- 2
Species[i]
subset(iris, subset = Species == Species[i])
sp <- unique(iris$Species)
sp[i]
subset(iris, subset = Species == sp[i])
Could someone explain what happens here and why?
subset() first looks inside the data frame for any object you mention, so in your first example Species[i] returns 'setosa' (the same as iris$Species[i]). Only when an object you mention cannot be found inside the data frame does R look in the parent frames, where it finds the correct object (this is how i is found).
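To make the lookup order concrete, here is a small sketch based on the code from the question. The names masked and wanted are only illustrative; everything else comes from the question itself.

```r
data(iris)
Species <- unique(iris$Species)  # global vector that shares its name with a column
i <- 2

# Inside subset(), Species is found in the data frame first, so this compares
# the column with its own second element (iris$Species[2], i.e. setosa).
masked <- subset(iris, subset = Species == Species[i])
unique(as.character(masked$Species))  # "setosa"

# A name that does not collide with a column is looked up outside the data frame.
sp <- unique(iris$Species)
wanted <- subset(iris, subset = Species == sp[i])
unique(as.character(wanted$Species))  # "versicolor"
```

Renaming the vector, as in the second call, is probably the simplest way to avoid the clash.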
So it does work, just not the way you expected. This is documented in the help file:
Note that subset will be evaluated in the data frame, so columns can be referred to (by name) as variables in the expression (see the examples).
How does this come about?
The reason is the following lines of code in subset.data.frame:
e <- substitute(subset)
r <- eval(e, x, parent.frame())
e is, in your example, Species == Species[i]
x is, in your example, iris
parent.frame() returns, in your example, the global environment.
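The two lines above can be tried out in a stripped-down function. This is only a sketch: my_subset is a hypothetical stand-in written for illustration, not the real subset.data.frame, which also handles the select argument and checks its input.

```r
# Hypothetical, simplified imitation of the row-selection part of subset().
my_subset <- function(x, subset) {
  e <- substitute(subset)          # capture the expression without evaluating it
  r <- eval(e, x, parent.frame())  # look in x first, then in the caller's environment
  x[r & !is.na(r), , drop = FALSE]
}

data(iris)
Species <- unique(iris$Species)
i <- 2

res <- my_subset(iris, Species == Species[i])
nrow(res)                          # 50
unique(as.character(res$Species))  # "setosa"
```

It shows the same behaviour as in the question: Species is taken from the data frame, while i comes from the calling environment.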
The second argument of the call to eval(), x, is called envir. It is the environment (or list or data frame, ...) where the expression is evaluated. In your case, R evaluates Species == Species[i] inside x, which is your data frame.
The third argument, parent.frame(), is the enclosure (the enclos argument of eval()). This is the environment that encloses the data frame you specified as envir, and it is the place where R will look when variables aren't found in the data frame.
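The roles of envir and enclos can also be checked by calling eval() directly. The following is a sketch, assuming it is run at top level so that environment() is the global environment; here, in_df and in_env are illustrative names.

```r
data(iris)
Species <- unique(iris$Species)
i <- 2

here <- environment()               # at top level: the global environment
e <- quote(Species == Species[i])   # the expression that substitute() would capture

# envir = iris: Species is found among the columns; i is not a column,
# so it is looked up in the enclosure.
in_df <- eval(e, envir = iris, enclos = here)
length(in_df)  # 150
sum(in_df)     # 50

# Evaluated in the environment only: Species is now the three-element vector.
in_env <- eval(e, envir = here)
in_env         # FALSE  TRUE FALSE
```

The same expression therefore gives different results depending on where Species is found first.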
See also ?