Question

I have the following data set, in which each numerosity (count) refers to an income interval, and the intervals have different widths:

Income              Numerosity
from 6000 to 7500       704790
from 7500 to 10000     1294784
from 10000 to 12000    1051902
from 12000 to 15000    1585132
from 15000 to 20000     704012
from 20000 to 25000     206901
from 25000 to 30000     156661

I'd like to obtain an (approximate) data set on an evenly spaced income grid, as follows:

Income  Numerosity
6000           ...
7000           ... 
8000           ...
...            ...
30000          ...

To this end, I tried the following. First I ran sample(6000:7500, 704790, replace=TRUE) for each row (using that row's bounds and count) and concatenated the results into a vector rpop of generated observations. Then I applied the density function, trying different values of the bw parameter to smooth the distribution:

d=density(rpop,bw=2000,from=6000,to=30000,n=25)

d$x gives the required income levels, while the numerosities are proportional to d$y.
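
For reference, the procedure described above can be written out as a short script. This is only a sketch: the names lower, upper, count and est, the fixed seed, and the rescaling of d$y so that the values sum to the total count are assumptions added for illustration.

```r
# Interval bounds and counts taken from the table above
lower <- c(6000, 7500, 10000, 12000, 15000, 20000, 25000)
upper <- c(7500, 10000, 12000, 15000, 20000, 25000, 30000)
count <- c(704790, 1294784, 1051902, 1585132, 704012, 206901, 156661)

set.seed(1)  # assumption: fixed seed, only for reproducibility

# Draw integer incomes uniformly within each interval, one draw per counted unit
rpop <- unlist(Map(function(lo, hi, n) sample(lo:hi, n, replace = TRUE),
                   lower, upper, count))

# Kernel density estimate evaluated at 25 equally spaced income levels
d <- density(rpop, bw = 2000, from = 6000, to = 30000, n = 25)

# Assumption: rescale the density values so they sum to the total count
est <- data.frame(Income = d$x, Numerosity = d$y / sum(d$y) * sum(count))
```

The result depends strongly on bw: a large bandwidth flattens the peaks and pushes some mass outside the 6000 to 30000 range.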

However, I wonder if there are better (more direct or elegant) ways to obtain the same result.

Solution

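The approx function performs linear interpolation and is meant for this kind of task. In the example below, each count is attached to the lower bound of its interval. Since no value is given at 30000 and approx returns NA outside the range of the input points by default, the incomes above 25000 come back as NA.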

Example:

> d <- read.table(header=TRUE, text="Income     Numerosity
+ 6000       704790
+ 7500       1294784
+ 10000      1051902
+ 12000      1585132
+ 15000      704012
+ 20000      206901
+ 25000      156661")

> res <- approx(d$Income, d$Numerosity, seq(from=6000, to=30000, length.out=25))
> res
$x
 [1]  6000  7000  8000  9000 10000 11000 12000 13000 14000 15000 16000 17000
[13] 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000
[25] 30000

$y
 [1]  704790.0 1098119.3 1246207.6 1149054.8 1051902.0 1318517.0 1585132.0
 [8] 1291425.3  997718.7  704012.0  604589.8  505167.6  405745.4  306323.2
[15]  206901.0  196853.0  186805.0  176757.0  166709.0  156661.0        NA
[22]        NA        NA        NA        NA
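
Because the counts refer to intervals of different widths, another option is to interpolate the cumulative counts at the interval bounds and then take differences. The following is a minimal sketch of that idea, not part of the original answer; it assumes incomes are spread evenly within each interval, and the names breaks, cum, grid, cum_grid and res2 are hypothetical.

```r
# Interval bounds (including the upper bound 30000) and counts from the question
breaks <- c(6000, 7500, 10000, 12000, 15000, 20000, 25000, 30000)
count  <- c(704790, 1294784, 1051902, 1585132, 704012, 206901, 156661)

# Cumulative count at each interval bound (0 at the lowest bound)
cum <- c(0, cumsum(count))

# Linear interpolation of the cumulative counts on a 1000-wide grid;
# this corresponds to assuming incomes are uniform within each interval
grid     <- seq(6000, 30000, by = 1000)
cum_grid <- approx(breaks, cum, xout = grid)$y

# Differences give the estimated count in each bin from grid[i] to grid[i + 1]
res2 <- data.frame(from = head(grid, -1),
                   to = grid[-1],
                   Numerosity = diff(cum_grid))
```

With this approach the estimated counts add up to the original total and every bin up to 30000 gets a value, at the cost of reporting counts per 1000-wide bin rather than at single income levels.
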
Licensed under: CC-BY-SA with attribution