Question

In R, how can I generate N numbers that have a mean of X and a median of Y (or at least close to those values)?

Or perhaps more generally, is there an algorithm for this?

Solution

There is an infinite number of solutions.

Approximate algorithm:

  1. Generate n/2 numbers below the median
  2. Generate n/2 numbers above the median
  3. Add your desired median and check
  4. Add one number with enough weight to satisfy your mean, which you can solve for

Example assuming you want a median of zero and a mean of twenty:

R> set.seed(42)
R> lo <- rnorm(10, -10); hi <- rnorm(10, 10)
R> median(c(lo,0,hi))
[1] 0                         # this meets our first criterion
R> 22*20 - sum(c(lo,0,hi))    # (n+1)*desiredMean - currentSum
[1] 436.162                   # so if we insert this, we get the right mean
R> mean(c(lo,0,hi,22*20 - sum(c(lo,0,hi))))
[1] 20                        # so we meet criterion two
R> 

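This works because desiredMean times (n+1) has to be equal to sum(currentSet) + x, where n = 21 is the current count, so we solve for x and get the expression above. Note that inserting x leaves an even count of 22, so the median of the final set moves to the midpoint between 0 and the smallest value in hi. To keep the median at exactly 0, generate one fewer value above the median so that x takes its place; the same formula then applies with n = 20.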
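The steps above can be wrapped in a small function. This is only a sketch under stated assumptions: the name gen_mean_median and its spread argument are made up for illustration, n is assumed odd, and the target mean is assumed to lie above the target median. It generates one fewer value above the median than below it, so the solved-for correction value fills the last upper slot and the median stays exact.

```r
# Hypothetical helper, not part of the original answer.
# Returns n values (n odd) whose median is exactly `med` and whose mean is `mu`.
# Assumes mu > med, so the single correction value lands above the median.
gen_mean_median <- function(n, mu, med, spread = 1) {
  stopifnot(n %% 2 == 1, n >= 3, mu > med)
  k  <- (n - 1) / 2
  lo <- med - abs(rnorm(k, sd = spread))      # k values below the median
  hi <- med + abs(rnorm(k - 1, sd = spread))  # k - 1 values above the median
  x  <- n * mu - sum(lo, med, hi)             # solve n * mu = currentSum + x
  # Guard: if x fell below med, the median would no longer be `med`.
  if (x < med) stop("correction value fell below the median; try again or increase mu")
  c(lo, med, hi, x)
}

set.seed(42)
s <- gen_mean_median(21, mu = 20, med = 0)
median(s)
mean(s)
```

As in the example above, the price is one extreme value: almost all of the distance between the median and the mean is carried by x.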

OTHER TIPS

For a set of data that looks fairly 'normal', you can use the correction factor method as outlined by @Dirk-Eddelbuettel but with your custom values used to generate a set of data around your mean:

# Note: in this snippet X is the centre of symmetry (the median)
# and Y is the value the mean is forced to.
X = 25
Y = 25.5
N = 100
set.sd = 5 # if you want to set the standard deviation of the set.

set <- rnorm(N, Y, set.sd) # generate a set around the mean
set.left <- set[set < X] # take only the left half
set <- c(set.left, X + (X - set.left)) # ... and make a copy on the right.

# redefine the set, adding in the correction number and an extra number on the opposite side to the correction:
# (the result has 2 * length(set.left) + 2 values, which is usually not exactly N)
extra <- X + ((set.sd / 2) * sign(X - Y))
set <- c(set,
     extra,
     ((length(set) + 2) * Y) - sum(set, extra)
     )
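To check what the mirroring approach produces, it can be wrapped in a function and inspected. The sketch below is a restatement of the snippet above, not a new method; the name mirror_sample and its argument names are assumptions chosen to make the roles explicit (med is the centre of symmetry, mu is the value the mean is forced to).

```r
# Hypothetical wrapper around the mirroring approach; names are illustrative.
mirror_sample <- function(N, mu, med, sd = 5) {
  draws <- rnorm(N, mu, sd)
  left  <- draws[draws < med]               # keep the draws left of the median
  base  <- c(left, med + (med - left))      # mirror them: symmetric about med
  extra <- med + (sd / 2) * sign(med - mu)  # one value on the side away from the mean
  fix   <- (length(base) + 2) * mu - sum(base, extra)  # forces the mean to mu
  c(base, extra, fix)
}

set.seed(42)
s <- mirror_sample(100, mu = 25.5, med = 25)
c(n = length(s), mean = mean(s), median = median(s))
```

The mean matches mu by construction. The median matches med as long as the extra value and the correction value fall on opposite sides of the middle pair, which is the typical case but is not guaranteed for every draw.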

Take strong heed of the first answer's first sentence. Unless you know what underlying distribution you want, you can't do it. Once you know that distribution, there are R functions for many standard distributions, such as runif, rnorm and rchisq. You can create an arbitrary distribution with the sample function.
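As a rough illustration of the sample route, the sketch below draws from a small discrete distribution. The values and probabilities are made up for the example; with these weights the theoretical mean is 2.6 and the theoretical median is 2.

```r
# Illustrative values and weights (assumptions, not from the original post).
vals  <- c(1, 2, 3, 10)
probs <- c(0.4, 0.3, 0.2, 0.1)

set.seed(1)
# sample() draws with replacement according to the given probabilities.
draws <- sample(vals, size = 1000, replace = TRUE, prob = probs)
mean(draws)
median(draws)
```

Choosing the support and the weights is where the mean and median targets get encoded; sample only does the drawing.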

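If you are okay with the restriction X > Y > 0 (mean above median), then you can fit a lognormal distribution. The lognormal conveniently has closed forms for both mean and median: the median is exp(meanlog) and the mean is exp(meanlog + sdlog^2/2), so meanlog = log(Y) and sdlog = sqrt(2*log(X/Y)).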

rmm <- function(n, X, Y) rlnorm(n, log(Y), sqrt(2*log(X/Y)))

E.g.:

z <- rmm(10000, 3, 1)
mean(z)
# [1] 2.866567
median(z)
# [1] 0.9963516
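
A slightly more defensive version might validate the inputs first, since sqrt(2*log(X/Y)) is only defined when the mean X is at least the median Y and both are positive. This is a sketch; the name rmm_checked is hypothetical.

```r
# Hypothetical variant of rmm with input checks.
# X is the target mean, Y the target median; the lognormal needs X > Y > 0.
rmm_checked <- function(n, X, Y) {
  stopifnot(is.numeric(X), is.numeric(Y), Y > 0, X > Y)
  rlnorm(n, meanlog = log(Y), sdlog = sqrt(2 * log(X / Y)))
}

set.seed(42)
z <- rmm_checked(100000, 3, 1)
mean(z)
median(z)
```

The targets hold for the distribution, not for each sample. The sample median settles quickly, while the sample mean of a heavy-tailed lognormal converges more slowly, so expect some gap at moderate n.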
Licensed under: CC-BY-SA with attribution