It's not only returning odd rows!
tail(subset(iris, Species == c("setosa", "virginica"), select = -Species))
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 140 6.9 3.1 5.4 2.1
## 142 6.9 3.1 5.1 2.3
## 144 6.8 3.2 5.9 2.3
## 146 6.7 3.0 5.2 2.3
## 148 6.5 3.0 5.2 2.0
## 150 5.9 3.0 5.1 1.8
This is due to R's recycling. Look at the output of iris$Species == c("setosa", "virginica"). It alternates between testing Species == "setosa" on odd rows and Species == "virginica" on even rows. Since the data happens to have an even number of rows, c("setosa", "virginica") recycles with no remainder, so R gives no warning and assumes that you wanted to recycle.
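The alternation described above can be seen on a tiny vector. This is a minimal sketch; the vector x and the names eq and eq_explicit are made up for illustration and are not part of the iris data.

```r
# Hypothetical four-element vector standing in for iris$Species
x <- c("setosa", "setosa", "virginica", "virginica")

# == recycles the shorter right-hand side along x, so element 1 is
# compared with "setosa", element 2 with "virginica", and so on
eq <- x == c("setosa", "virginica")

# The same comparison with the recycling written out explicitly
eq_explicit <- x == rep(c("setosa", "virginica"), length.out = length(x))

print(eq)                       # TRUE FALSE FALSE TRUE
print(identical(eq, eq_explicit))  # TRUE
```

Only the elements that happen to line up with the recycled value come back TRUE, which is why half of the wanted rows go missing.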
If we add another row, so that the number of rows is odd and the recycling leaves a remainder, we get a warning message:
iris <- rbind(iris, tail(iris, 1))
foo <- subset(iris, Species == c("setosa", "virginica"))
## Warning messages:
## 1: In is.na(e1) | is.na(e2) :
## longer object length is not a multiple of shorter object length
## 2: In `==.default`(Species, c("setosa", "virginica")) :
## longer object length is not a multiple of shorter object length
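You want to use %in%, which tests each value of Species for membership in the set instead of comparing element by element: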
datanew <- subset(iris, Species %in% c("setosa", "virginica"), select = -Species)
head(datanew)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.1 3.5 1.4 0.2
## 2 4.9 3.0 1.4 0.2
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
## 6 5.4 3.9 1.7 0.4
tail(datanew)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 145 6.7 3.3 5.7 2.5
## 146 6.7 3.0 5.2 2.3
## 147 6.3 2.5 5.0 1.9
## 148 6.5 3.0 5.2 2.0
## 149 6.2 3.4 5.4 2.3
## 150 5.9 3.0 5.1 1.8
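As a quick check of the difference, here is a sketch that counts the rows each approach keeps. It reads datasets::iris directly so it is not affected by the extra row added with rbind earlier; the names d, by_in, by_or and by_eq are just illustrative.

```r
# Untouched copy of the data from the datasets package
d <- datasets::iris

# Membership test: keeps every setosa and every virginica row
by_in <- subset(d, Species %in% c("setosa", "virginica"))

# Equivalent spelled-out form with two comparisons joined by |
by_or <- subset(d, Species == "setosa" | Species == "virginica")

# Recycled comparison: keeps only the rows that line up with the recycled value
by_eq <- subset(d, Species == c("setosa", "virginica"))

print(nrow(by_in))              # 100
print(nrow(by_eq))              # 50
print(identical(by_in, by_or))  # TRUE
```

The %in% form scales better than chaining | once you have more than two values to match.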