Question

I know how to find the quantiles of an empirical distribution.

set.seed(1)
x = rnorm(100)
q = quantile(x, probs = seq(0, 1, .01))

Is there a function that returns the quantile bin that a given value from the training set falls into? In this example

R) x[1]
[1] -0.6264538107
R) q
             0%              1%              2%              3%              4%              5%              6%              7%              8% 
-2.214699887177 -1.991605177777 -1.808646490230 -1.532008555284 -1.472864960560 -1.381744198182 -1.282620249360 -1.255240516814 -1.226934277726 
             9%             10%             11%             12%             13%             14%             15%             16%             17% 
-1.137935552774 -1.052657473293 -0.946201701058 -0.847444894718 -0.822439213796 -0.754080533415 -0.714945447616 -0.707887360796 -0.691941403160 
            18%             19%             20%             21%             22%             23%             24%             25%             26% 
-0.637668149828 -0.622231094280 -0.613869230709 -0.594247090071 -0.576841631266 -0.569725969545 -0.548795719430 -0.494242549079 -0.474635485293 
            27%             28%             29%             30%             31%             32%             33%             34%             35% 
-0.451421239288 -0.422917810077 -0.400294290491 -0.375342019640 -0.324556644843 -0.304569351961 -0.270133020491 -0.194728544774 -0.158850338047 
            36%             37%             38%             39%             40%             41%             42%             43%             44% 
-0.142600696093 -0.135100488041 -0.120975401008 -0.106515536418 -0.076703128964 -0.057434448974 -0.054780994140 -0.048748324589 -0.041745189497 
            45%             46%             47%             48%             49%             50%             51%             52%             53% 
-0.026562645934 -0.006850631144  0.015360659421  0.052098524774  0.074455390351  0.113909160789  0.168144431357  0.186114832362  0.225596350406 
            54%             55%             56%             57%             58%             59%             60%             61%             62% 
 0.278298615355  0.308573926852  0.331022515551  0.336463178904  0.350973845124  0.366811069726  0.377079930574  0.388518545252  0.392983041115 
            63%             64%             65%             66%             67%             68%             69%             70%             71% 
 0.405445081905  0.438666028932  0.479681362135  0.510968662152  0.557264863548  0.562081050166  0.571598761948  0.581217342523  0.593914332477 
            72%             73%             74%             75%             76%             77%             78%             79%             80% 
 0.598644634069  0.613183189979  0.638003287679  0.691545365689  0.697743441191  0.708979192306  0.743791934661  0.764300755430  0.771253599759 
            81%             82%             83%             84%             85%             86%             87%             88%             89% 
 0.789562430661  0.832000770742  0.887545566130  0.922954785861  0.961725754674  1.068269412135  1.103263092985  1.129187521849  1.162347897592 
            90%             91%             92%             93%             94%             95%             96%             97%             98% 
 1.181065077514  1.221440863082  1.364627083543  1.435300882891  1.468328439976  1.515533782755  1.587171348445  1.606834375029  1.984244133943 
            99%            100% 
 2.174901731264  2.401617760505 

it would be quantile 18 (or 19, depending on how you count).


Solution

I'd use findInterval():

findInterval(x,q)
#   [1]  19  52  13  97  56  14  66  78  70  32  95  62  20   1  88  44  46  85
#  [19]  82  71  84  81  50   2  74  42  36   5  26  64  92  40  61  43   6  29
#  [37]  30  41  87  79  35  34  76  67  18  17  59  80  39  83  63  21  58  10
#  [55]  93  98  31  11  69  38 101  45  75  48  15  53   3  94  51  99  65  16
#  [73]  73  12   8  55  28  47  49  22  24  37  90   4  72  57  86  33  60  54
#  [91]  25  91  89  77  96  68   7  23   9  27
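One thing to watch in the output above: the maximum of x sits exactly on the last break, so it gets index 101 (element 61) rather than 100. A minimal sketch, assuming you want every value mapped into one of the 100 bins, is to pass rightmost.closed = TRUE so that the last interval is treated as closed:

```r
set.seed(1)
x <- rnorm(100)
q <- quantile(x, probs = seq(0, 1, 0.01))

# Default: intervals are [q[i], q[i+1]), so max(x) lands past the last break
bins_default <- findInterval(x, q)

# rightmost.closed = TRUE treats the last interval as closed: [q[100], q[101]]
bins <- findInterval(x, q, rightmost.closed = TRUE)

range(bins_default)  # 1 101
range(bins)          # 1 100
```

Only the index of the maximum changes; all other values keep the bin they had before.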

OTHER TIPS

How about:

as.numeric(cut(x,q))
##   [1]  19  52  13  97  56  14  66  78  70  32  95  62  20  NA  88  44  46  85
##  [19]  82  71  84  81  50   2  74  42  36   5  26  64  92  40  61  43   6  29
##  [37]  30  41  87  79  35  34  76  67  18  17  59  80  39  83  63  21  58  10
##  [55]  93  98  31  11  69  38 100  45  75  48  15  53   3  94  51  99  65  16
##  [73]  73  12   8  55  28  47  49  22  24  37  90   4  72  57  86  33  60  54
##  [91]  25  91  89  77  96  68   7  23   9  27

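The minimum value here is recorded as NA because cut() builds intervals that are open on the left by default, so a value equal to the lowest break falls outside the first bin. To include it, set include.lowest = TRUE (the default is FALSE).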
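For completeness, a sketch of the same call with include.lowest = TRUE, using the same x and q as in the question. With this setting the minimum should land in bin 1 instead of NA:

```r
set.seed(1)
x <- rnorm(100)
q <- quantile(x, probs = seq(0, 1, 0.01))

# include.lowest = TRUE closes the first interval on the left: [q[1], q[2]]
bins <- as.numeric(cut(x, q, include.lowest = TRUE))

anyNA(bins)         # FALSE
bins[which.min(x)]  # 1
```

Note that cut() returns a factor, so as.numeric() here extracts the integer level codes, which are the bin indices.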

Licensed under: CC-BY-SA with attribution