Frage

I'd like to use R to generate two categorical variables (such as eye color and hair color, for instance) where I can specify the degree to which these two variables are associated. It doesn't really matter to me which levels of eye color would be associated with which levels of hair color, but just being able to specify an overall association, such as by specifying the odds ratio, is a requirement. Also, I know there are ways to do this for two normally distributed continuous variables using, for example, the mvtnorm package, so I could take that route and then choose cut points to make the variables categorical after the fact, but I don't want to do it that way if I can avoid it. Any help would be greatly appreciated!

Edit: apologies for not being clearer from the start, but what I'm really asking I suppose is whether or not there's a function anybody knows of in some R package that will do this in one or two lines.

War es hilfreich?

Lösung

If you can specify the odds ratios (and you also need to specify the baseline odds), you just convert them to probabilities and use runif().

Edit (I misunderstood the question): Take a look at the bindata package.


If you like, here is a function I wrote that you can use to generate such data without the package. It is rather clunky; it's intended to be self-explanatory rather than elegant or fast.

odds.to.probs <- function(odds){
  probs <- odds / (odds+1)
  return(probs)
}

get.correlated.binary.data <- function(N, odds.x.eq.0, odds.y.eq.0.x.eq.0, 
                                       odds.ratio){
  odds.y.eq.0.x.eq.1 <- odds.y.eq.0.x.eq.0*odds.ratio
  prob.x.eq.0        <- odds.to.probs(odds.x.eq.0)
  prob.y.eq.0.x.eq.0 <- odds.to.probs(odds.y.eq.0.x.eq.0)
  prob.y.eq.0.x.eq.1 <- odds.to.probs(odds.y.eq.0.x.eq.1)

  x <- ifelse(runif(N)<=prob.x.eq.0, 0, 1)
  y <- rep(NA, N)
  y <- ifelse(x==0, ifelse(runif(sum(x))<=prob.y.eq.0.x.eq.0,       0, 1), y)
  y <- ifelse(x==1, ifelse(runif( (N-sum(x)) )<=prob.y.eq.0.x.eq.1, 0, 1), y)

  dat <- data.frame(x=x, y=y)
  return(dat)
}

> set.seed(9)
> dat <- get.correlated.binary.data(30, 3, 1.5, -.03)
> table(dat)
   y
x    0  1
  0 10 13
  1  0  7
Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top