Domanda

When generating random numbers in R using rnorm (or runif etc.), they seldom have the exact mean and SD as the distribution they are sampled from. Is there any simple one-or-two-liner that does this for me? As a preliminary solution, I've created this function but it seems like something that should be native to R or some package.

# Draw sample from normal distribution with guaranteed fixed mean and sd
rnorm_fixed = function(n, mu=0, sigma=1) {
  x = rnorm(n)  # from standard normal distribution
  x = sigma * x / sd(x)  # scale to desired SD
  x = x - mean(x) + mu  # center around desired mean
  return(x)
}

To illustrate:

x = rnorm(n=20, mean=5, sd=10)
mean(x)  # is e.g. 6.813...
sd(x)  # is e.g. 10.222...

x = rnorm_fixed(n=20, mean=5, sd=10)
mean(x)  # is 5
sd(x)  # is 10

The reason I want this is that I adjust my analysis on simulated data before applying it to real data. This is nice because with simulated data I know the exact properties (means, SDs etc.) and I avoid p-value inflation because I'm doing inferential statistics. I am asking if there exist anything simple like e.g.

rnorm(n=20, mean=5, sd=10, fixed=TRUE)
È stato utile?

Soluzione

Since you asked for a one-liner:

rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
r <- rnorm2(100,4,1)
mean(r)  ## 4
sd(r)    ## 1

Altri suggerimenti

This is an improvement of the function suggested in a previous answer so that it complies with the OP's need of having a "fixed" argument.

And still in one line ;-)

rnorm. <- function(n=10, mean=0, sd=1, fixed=TRUE) { switch(fixed+1, rnorm(n, mean, sd), as.numeric(mean+sd*scale(rnorm(n)))) }
rnorm.() %>% {c(mean(.), sd(.))}
#### [1] 0 1
rnorm.(,,,F) %>% {c(mean(.), sd(.))}
#### [1] 0.1871827 0.8124567

I chose to enter default values for every argument and add a as.numeric step to get rid of the attributes generated by the scale function.

The mvrnorm() function in the MASS package can do this.

library(MASS)
#empirical=T forces mean and sd to be exact
x <- mvrnorm(n=20, mu=5, Sigma=10^2, empirical=T)
mean(x)
sd(x)
#empirical=F does not impose this constraint
x <- mvrnorm(n=20, mu=5, Sigma=10^2, empirical=F
mean(x)
sd(x)
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top