R quirk: Normalize the content of a vector by binned values of another vector

Question

scale is your friend here: it normalises a vector to mean 0 and sd 1, and since sd = 1, the variance is 1 as well.

> mean(scale(1:10))
[1] 0
> sd(scale(1:10))
[1] 1
> var(scale(1:10))
     [,1]
[1,]    1
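
For reference, what scale does to a single vector is subtract the mean and divide by the standard deviation. Here is a minimal sketch of that by hand (the names x, scaled and manual are only for illustration). It also shows why var printed a 1x1 matrix above: scale returns a one-column matrix rather than a plain vector.

```r
x <- 1:10

# scale() returns a one-column matrix, with the centre and scale kept as attributes
scaled <- scale(x)

# the same standardisation done by hand, giving a plain numeric vector
manual <- (x - mean(x)) / sd(x)

# as.numeric() drops the matrix shape and attributes so the two can be compared
all.equal(as.numeric(scaled), manual)

# var() on a plain vector returns a single number rather than a 1x1 matrix
var(manual)
```

If you want a plain vector back from scale, wrapping it in as.numeric (or as.vector) is usually enough.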

Try some example data:

set.seed(42)
dat <- data.frame(freq=sample(1:100), scores=rnorm(100, mean=4, sd=2))
dat$bins <- cut(dat$freq, breaks=c(0, 1:10*10), include.lowest=TRUE)

Now use ave to scale the scores within each of the bins:

dat$scaled <- with(dat,ave(scores,bins,FUN=scale))
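
ave splits scores by bins, applies FUN to each group, and puts the results back in the original row order. If you would rather not depend on scale returning a matrix, an explicit function works just as well. The sketch below assumes a helper called zscore and a column called scaled2 (neither is part of base R or the original example), and rebuilds the example data so it runs on its own.

```r
set.seed(42)
dat <- data.frame(freq = sample(1:100), scores = rnorm(100, mean = 4, sd = 2))
dat$bins <- cut(dat$freq, breaks = c(0, 1:10 * 10), include.lowest = TRUE)

# hypothetical helper: standardise a numeric vector to mean 0, sd 1
zscore <- function(x) (x - mean(x)) / sd(x)

# ave() applies zscore within each bin and keeps the original row order
dat$scaled2 <- with(dat, ave(scores, bins, FUN = zscore))

# compare against the scale()-based version
all.equal(dat$scaled2, with(dat, ave(scores, bins, FUN = scale)))
```

Note that sd is NA for a bin with a single observation, so either version needs bins with at least two values.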

You can check the results with aggregate or similar.

The mean is 0 (to within rounding error) in each bin:

> aggregate(scaled ~ bins, data=dat, FUN=function(x) round(mean(x), 2) )
       bins scaled
1    [0,10]      0
2   (10,20]      0
3   (20,30]      0
4   (30,40]      0
5   (40,50]      0
6   (50,60]      0
7   (60,70]      0
8   (70,80]      0
9   (80,90]      0
10 (90,100]      0

The sd is 1 in each bin:

> aggregate(scaled ~ bins, data=dat, FUN=sd)
       bins scaled
1    [0,10]      1
2   (10,20]      1
3   (20,30]      1
4   (30,40]      1
5   (40,50]      1
6   (50,60]      1
7   (60,70]      1
8   (70,80]      1
9   (80,90]      1
10 (90,100]      1