R Error : some group is too small for 'qda'

Question 1

tl;dr: my guess is that your predictor variables were accidentally converted to factors or character vectors. This can easily happen if you have some minor glitch in your data set, such as a spurious character in one row.

Here's a way to make up a data set that looks like yours:

set.seed(101)
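## factor() is explicit here because R >= 4.0.0 no longer converts strings to factors by default
mytest <- data.frame(type=factor(rep(c("monocot","dicot"),each=100)),
                     mono_score=runif(100,0,100),
                     dicot_score=runif(100,0,100))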

Some useful diagnostics:

str(mytest)
## 'data.frame':    200 obs. of  3 variables:
##  $ type       : Factor w/ 2 levels "dicot","monocot": 2 2 2 2 2 2 2 ...
##  $ mono_score : num  37.22 4.38 70.97 65.77 24.99 ...
##  $ dicot_score: num  12.5 2.33 39.19 85.96 71.83 ...
summary(mytest)
##       type       mono_score      dicot_score     
##  dicot  :100   Min.   : 1.019   Min.   : 0.8594  
##  monocot:100   1st Qu.:24.741   1st Qu.:26.7358  
##                Median :57.578   Median :50.6275  
##                Mean   :52.502   Mean   :52.2376  
##                3rd Qu.:77.783   3rd Qu.:78.2199  
##                Max.   :99.341   Max.   :99.9288  
## 
with(mytest,table(type))
## type
##   dicot monocot 
##     100     100

Importantly, the first two (str() and summary()) show us what type each variable is. Update: it turns out the third test (the table of counts per level) is actually the important one in this case, since the problem was a spurious extra level: the droplevels() function should take care of this problem ...

This made-up example seems to work fine, so there must be something you're not showing us about your data set ...

library(MASS)
qda(type~mono_score+dicot_score,data=mytest)

Here's a guess. If your score variables were actually factors rather than numeric, then qda would automatically create dummy variables from them. That makes the model matrix much wider (101 columns in this example), so each group has fewer observations than there are predictor columns, which provokes the error you're seeing ...

bad <- transform(mytest,mono_score=factor(mono_score))
qda(type~mono_score+dicot_score,data=bad)
## Error in qda.default(x, grouping, ...) : 
##    some group is too small for 'qda'
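
If this is what happened, a quick check of the column classes should reveal it. The following is only a sketch on a tiny made-up data frame (the values and the name bad2 are illustrative, not from the original data): it lists each column's class and converts a factor back to numeric.

```r
# Hypothetical data: a numeric column that ended up as a factor
bad2 <- data.frame(mono_score = factor(c("37.22", "4.38", "70.97")),
                   dicot_score = c(12.5, 2.33, 39.19))

sapply(bad2, class)   # shows which columns are not numeric

# Go through character first: as.numeric() applied directly to a factor
# returns the internal integer codes, not the original values
bad2$mono_score <- as.numeric(as.character(bad2$mono_score))
str(bad2)
```

Any entry that cannot be parsed as a number becomes NA (with a warning), which is a convenient way to locate the offending row.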

Question 2

I had this error as well, so I'll explain what went wrong on my side for anyone stumbling upon this in the future.

The variable you want to predict may be a factor. Every level of that factor must have enough observations; if any level has too few, you will get this error.

In my case, I had removed all the rows for one level, but the level itself was still present in the factor.

To remove the unused level, re-create the factor:

df$var %<>% factor

NB: %<>% requires magrittr; the base R equivalent is df$var <- factor(df$var).

However, even when I did this, it still failed. When I debugged this further, it turned out that subsetting a data frame keeps all the original factor levels, so you have to re-create the factor again after each subset.
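
A small sketch of that behaviour, using a made-up data frame (df, var and x here are illustrative names and values) and base R only:

```r
# Hypothetical data frame with a three-level factor
df <- data.frame(var = factor(c("a", "a", "b", "b", "c")),
                 x = c(1.2, 3.4, 2.2, 5.1, 0.7))

sub <- df[df$var != "c", ]   # the row for "c" is gone ...
levels(sub$var)              # ... but "c" is still listed as a level

sub <- droplevels(sub)       # drops unused levels in every factor column
levels(sub$var)
```

Calling droplevels() on the subset has the same effect as sub$var <- factor(sub$var), but it handles all factor columns at once.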

Question 3

Your grouping variable has 3 levels, including 'other', which has no cases. Since the number of predictor variables (2, i.e. mono_score and dicot_score) is larger than the number of cases in one of the group levels (100, 100 and 0 for dicot, monocot and other, respectively), the analysis cannot be performed. One way to get rid of unnecessary group levels is to redefine the grouping variable as a factor after converting it to character:

test.data$type <- as.factor(as.character(test.data$type))

Another option is to define the levels of the grouping variable explicitly:

test.data$type <- factor(test.data$type, levels = c("dicot", "monocot"))
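
To see the empty level trigger the error and then disappear, here is a minimal sketch. The data are made up (test.data is rebuilt with random scores and the 'other' level is added by hand), and droplevels() is used as a third, base R way of removing the unused level.

```r
library(MASS)

set.seed(101)
# Hypothetical stand-in for the real data: two populated groups
# plus an 'other' level that has no rows
test.data <- data.frame(
  type = factor(rep(c("monocot", "dicot"), each = 100),
                levels = c("dicot", "monocot", "other")),
  mono_score = runif(200, 0, 100),
  dicot_score = runif(200, 0, 100))

table(test.data$type)   # 'other' appears with a count of 0

# Capture the error message instead of stopping the script
msg <- tryCatch(qda(type ~ mono_score + dicot_score, data = test.data),
                error = function(e) conditionMessage(e))
msg

# After dropping the unused level, the fit goes through
test.data$type <- droplevels(test.data$type)
fit <- qda(type ~ mono_score + dicot_score, data = test.data)
```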

If your dataset were very unbalanced and had, for example, only 2 cases of 'other', it would probably make sense to exclude them from the analysis.

This message can still appear if the number of predictor variables is larger than the number of cases in some group level. Since you have 100 cases in each remaining group level (dicot, monocot) and only two predictor variables (mono_score, dicot_score), this should no longer be a problem.
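
This condition can be checked before calling qda. The sketch below rests on two assumptions: that all predictors are numeric, so the number of predictor columns equals the number of predictor variables, and that the threshold is p + 1 cases per group, which is how I read the check in MASS::qda.default. The data are made up, and check.data is an illustrative name.

```r
set.seed(101)
# Hypothetical data with the same shape as the example: two groups of 100
check.data <- data.frame(
  type = factor(rep(c("monocot", "dicot"), each = 100)),
  mono_score = runif(200, 0, 100),
  dicot_score = runif(200, 0, 100))

predictors <- c("mono_score", "dicot_score")
p <- length(predictors)            # number of (numeric) predictor columns

counts <- table(check.data$type)   # cases per group level, including empty ones
too_small <- names(counts)[counts < p + 1]
too_small                          # character(0) here: no group is too small
```

Any level named in too_small would need more cases, or would need to be dropped, before the model can be fitted.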