Question

Bit of an R novice here, so it might be a very simple problem.

I've got a dataset with GENDER (being a binary variable) and a whole lot of numerical variables. I wanted to write a simple function that checks for equality of variance and then performs the appropriate t-test.

So my first attempt was this:

genderttest<-function(x){                                             # x = outcome variable

  attach(Dataset)
  on.exit(detach(Dataset))  

  VARIANCE<-var.test(Dataset[GENDER=="Male",x], Dataset[GENDER=="Female",x])

  if(VARIANCE$p.value<0.05){
    t.test(x~GENDER)
  }else{
    t.test(x~GENDER, var.equal=TRUE)
  }
}

The same code works well outside of a function (replacing the x with the column name, of course), but inside the function it gives me an error saying that variable lengths differ.

So I thought it might be handling the NA cases strangely and I should clean up the dataset first and then perform the tests:

genderttest<-function(x){                                               # x = outcome variable
  Dataset2v<-subset(Dataset,select=c("GENDER",x))
  Dataset_complete<-na.omit(Dataset2v)

  attach(Dataset_complete)
  on.exit(detach(Dataset_complete))  

  VARIANCE<-var.test(Dataset_complete[GENDER=="Male",x], Dataset_complete[GENDER=="Female",x])

  if(VARIANCE$p.value<0.05){
    t.test(x~GENDER)
  }else{
    t.test(x~GENDER, var.equal=TRUE)
  }
}

But this gives me the same error.

I'd appreciate it if anyone could point out my (probably stupid) mistake.

Solution

I believe the problem is that the formula in t.test(x~GENDER) is not evaluated the way you expect: R looks up the symbol x itself rather than the column whose name is stored in x. Inside your function, x is just a character string such as "score", so t.test() tries to model that single string against GENDER instead of the column of Dataset you meant.
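A minimal reproduction may make this easier to see. The toy data frame and the column name score are assumptions made up for illustration, and data=Dataset is used in place of attach() to keep the example self-contained:

```r
# Toy data; Dataset, GENDER and score are illustrative names only.
set.seed(1)
Dataset <- data.frame(
  GENDER = rep(c("Male", "Female"), each = 10),
  score  = rnorm(20)
)

broken <- function(x) {
  # The formula refers to the symbol x, which here holds the string "score",
  # not to the 20-element column Dataset$score.
  t.test(x ~ GENDER, data = Dataset)
}

# Capture the error message instead of stopping the script.
err <- tryCatch(broken("score"), error = function(e) conditionMessage(e))
print(err)
```

Calling t.test(score ~ GENDER, data = Dataset) directly, with the column name written out, does not hit this problem, which matches the observation that the code works outside the function.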

A solution that should work is to build the formula from the string stored in x and pass it to t.test() through do.call(). For the unequal-variance branch:

do.call('t.test', args=list(formula=as.formula(paste0(x,'~GENDER')), data=Dataset))

and for the equal-variance branch:

do.call('t.test', args=list(formula=as.formula(paste0(x,'~GENDER')), var.equal=TRUE, data=Dataset))

Both calls put the value of x into the formula rather than the literal symbol x (i.e., score ~ GENDER instead of x ~ GENDER).

The reason for the particular error you saw is that GENDER has one element per row of Dataset, while the x that the formula sees is a single character string of length 1.
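Putting the pieces together, the whole function could look roughly like the sketch below. It is not the only way to write it: the data argument and the alpha parameter are additions for illustration, and it uses the formula method of var.test() so that attach() is not needed at all.

```r
# Sketch only: the data and alpha parameters are illustrative additions,
# not part of the original function.
genderttest <- function(x, data, alpha = 0.05) {
  # Build the formula from the column name held in x, e.g. score ~ GENDER.
  f <- as.formula(paste0(x, " ~ GENDER"))

  # F test for equality of variances between the two groups.
  variance <- var.test(f, data = data)

  # Welch test if the variances differ significantly, pooled test otherwise.
  t.test(f, data = data, var.equal = variance$p.value >= alpha)
}

# Usage with made-up data.
set.seed(42)
Dataset <- data.frame(
  GENDER = rep(c("Male", "Female"), each = 15),
  score  = rnorm(30)
)
result <- genderttest("score", Dataset)
print(result)
```

With the default na.action setting, the formula interface drops rows with missing values in the variables used, so the separate na.omit() step from the second attempt should not be needed.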

Licensed under: CC-BY-SA with attribution