R: weird y-axis in frequency/density plot (ggplot2)

https://stackoverflow.com/questions/16735947

30-05-2022
Question

I have data from two samples and I want to plot a frequency distribution plot in R. I have the reference done in Excel:

[Image: what I want to get in R (obtained with Excel)]

I loaded the data into R (HistSerp). It has 136 observations of 2 variables.

summary(HistSerp)
       V1              V2        
 Min.   :0.000   Min.   :0.0000  
 1st Qu.:0.000   1st Qu.:0.3752  
 Median :0.000   Median :1.2845  
 Mean   :0.055   Mean   :1.2144  
 3rd Qu.:0.082   3rd Qu.:1.9952  
 Max.   :1.082   Max.   :2.9800  

class(HistSerp$V1)
"numeric"
class(HistSerp$V2)
"numeric"

If I run HistSerp.m <- melt(HistSerp) and then ggplot(HistSerp.m) + geom_freqpoly(aes(x = value, y = ..density.., colour = variable)), the plot looks like this:

[Image: the resulting plot]

I don't know why the y-axis spans those values, and I'm not sure whether it's only a y-axis labeling problem, because the plot itself looks different too. I've also tried geom_density(), hist(HistSerp$V1, freq=FALSE), etc., but I get the same result as before instead of what I expect. I guess there's something wrong with my data, but I can't figure out what it is. Any help will be appreciated.

Thanks

P.S. Should I copy the data (136x2)?

Update: The data. Sorry if there's a better way to copy it...

0.144   2.024
0.082   2.548
0.082   1.943
0.000   2.599
0.000   2.233
0.000   2.342
0.082   1.655
0.082   2.200
0.000   2.261
0.000   2.408
0.000   2.127
0.000   2.053
0.000   1.929
0.000   1.413
0.000   2.400
0.000   2.777
0.000   2.685
0.000   1.436
0.000   1.573
0.000   2.504
0.000   1.533
0.000   1.434
0.000   1.421
0.000   2.534
0.082   1.728
0.000   1.984
0.082   1.287
0.000   2.324
0.164   2.405
0.279   1.989
0.082   2.729
0.144   2.046
0.226   2.496
0.000   2.980
0.000   2.634
0.000   1.792
0.000   1.571
0.000   0.612
0.000   0.884
0.000   0.449
0.000   2.318
0.082   0.449
0.000   0.449
0.000   0.563
0.082   0.919
0.000   0.617
0.082   1.297
0.144   0.719
0.000   1.897
0.000   1.338
0.000   0.337
0.000   1.555
0.000   0.273
0.291   0.656
0.000   0.273
0.082   0.388
0.082   1.911
0.082   0.852
0.000   1.580
0.000   1.450
0.000   1.209
0.000   2.049
0.082   2.694
0.082   1.089
0.246   2.643
0.000   2.393
0.000   1.702
0.000   2.595
0.000   1.432
0.000   2.094
0.000   1.526
0.082   1.775
0.000   0.273
0.000   1.405
0.000   2.014
0.000   0.543
0.000   0.586
0.000   1.224
0.000   0.719
0.164   0.201
0.000   0.388
0.082   0.232
0.000   0.116
0.000   0.116
0.082   1.395
0.000   0.116
0.000   0.232
0.082   0.844
0.000   1.153
0.082   0.000
0.667   0.000
0.000   1.535
0.000   2.687
0.000   0.922
0.226   0.337
0.197   0.999
1.082   1.373
0.082   0.396
0.082   0.116
0.000   1.667
0.000   0.731
0.000   0.544
0.082   2.072
0.000   2.262
0.164   2.111
0.082   1.675
0.000   0.116
0.000   0.232
0.082   0.116
0.000   1.004
0.000   0.116
0.164   0.116
0.082   0.699
0.000   0.000
0.000   0.273
0.082   0.000
0.000   0.388
0.082   0.000
0.000   0.116
0.000   0.273
0.000   0.000
0.000   0.649
0.164   0.000
0.082   0.000
0.082   0.000
0.000   0.000
0.082   0.000
0.144   1.282
0.000   1.772
0.000   0.116
0.082   0.000
0.000   1.416
0.000   0.563
0.082   0.510
0.000   0.316
0.164   1.124

Answer

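There's nothing wrong with your data. ..density.. is the count divided by (number of observations x bin width), so it is scaled to make the area under the curve equal 1, and it can go well above 1 when the bins are narrow. You have a few options: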

geom_freqpoly(aes(y = ..count.. / sum(..count..)))

which is probably what you want. Then there's:

geom_freqpoly(aes(y = ..ndensity..))

which is the density, but rescaled so that its maximum is 1 (i.e. it will always range from 0 to 1). And finally, the related:

geom_freqpoly(aes(y = ..ncount..))

which is similar, but for the counts. You can read about the options at ?stat_bin.
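
To make the difference between these scalings concrete, here is a small sketch using only the first ten rows of the data posted in the question. The names v1, v2, breaks and long are mine, and the 0.1 bin width is just an assumption for illustration. The base R part shows why the density goes above 1; the ggplot2 part (only run if ggplot2 is installed) uses after_stat(), which as far as I know replaces the ..name.. notation in ggplot2 3.3.0 and later, together with the density and width variables computed by stat_bin.

```r
# First ten rows of the data posted in the question (illustrative subset only).
v1 <- c(0.144, 0.082, 0.082, 0.000, 0.000, 0.000, 0.082, 0.082, 0.000, 0.000)
v2 <- c(2.024, 2.548, 1.943, 2.599, 2.233, 2.342, 1.655, 2.200, 2.261, 2.408)

# Assumed bin edges with a width of 0.1; pick whatever suits your data.
breaks <- c(0, 0.1, 0.2)

# Bin V1 without drawing anything.
h <- hist(v1, breaks = breaks, plot = FALSE)
widths <- diff(h$breaks)

# Density is count / (n * bin width), so narrow bins push it above 1.
print(h$density)

# The area under the histogram (density * width, summed) is what equals 1.
area <- sum(h$density * widths)

# Proportion of observations per bin: always between 0 and 1.
prop <- h$counts / sum(h$counts)
print(prop)

# Same idea in ggplot2: density * width is the per-group proportion.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  long <- data.frame(
    variable = rep(c("V1", "V2"), each = length(v1)),
    value = c(v1, v2)
  )
  p <- ggplot(long, aes(x = value, colour = variable)) +
    geom_freqpoly(aes(y = after_stat(density * width)), binwidth = 0.1) +
    ylab("proportion")
  print(p)
}
```

I used density * width rather than count / sum(count) here because, as far as I can tell, sum(count) is taken over the whole layer rather than per colour group, so with two variables of equal size each line would add up to about 0.5 instead of 1. If that is what you want, the first option above is fine as it is.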

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow