Computing density() in R for grouped frequency data

https://stackoverflow.com/questions/22648634

21-06-2023

Question

This should be a pretty straightforward question, but I can't find the answer anywhere (in part because I'm not sure what to search for).

In R, it's easy to compute the density of:

c(1, 2, 2, 2, 3, 5, 5, 7, 8, 10, 10, 10)

You just do:

density(c(1, 2, 2, 2, 3, 5, 5, 7, 8, 10, 10, 10))

The problem is that an "ungrouped" vector like this would be far too large for R (or for the query engine that builds the dataset) to handle. So I need to use GROUP BY and COUNT(*) in the initial query to compress the results, which also means that expanding the counts again with rep() doesn't help. Given a data frame of counts like the one below, how do I compute the density (for a KDE plot)?

Value Count
1     1
2     3
3     1
5     2
7     1
8     1
10    3

And just to be clear, I really do need a density plot, not a histogram.

Solution

Just use the weights argument of density(), normalizing the counts so that the weights sum to 1:

density(d$Value, weights = d$Count / sum(d$Count))

(edited to account for first comment)
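One thing to watch for: the default bandwidth is chosen from the distinct values alone and does not take the weights into account, so the weighted call above will generally not reproduce the density of the fully expanded vector unless a bandwidth is passed explicitly. The following is a minimal sketch of one way to handle that, not a definitive recipe. It assumes the grouped data frame is named d as in the answer, and it assumes a Silverman-style bandwidth built from the count-weighted standard deviation only (the names n, w, mu, sd_w and bw are illustrative, and the rule deliberately skips the IQR term that bw.nrd0() also considers). The rep() call at the end is only a sanity check that is feasible because this example is tiny.

```r
# Grouped data from the question: one row per distinct value.
d <- data.frame(Value = c(1, 2, 3, 5, 7, 8, 10),
                Count = c(1, 3, 1, 2, 1, 1, 3))

n <- sum(d$Count)   # total number of underlying observations
w <- d$Count / n    # weights passed to density() must sum to 1

# Assumption: Silverman-style bandwidth from the count-weighted standard
# deviation. This uses only the sd term of bw.nrd0(), not its IQR term.
mu   <- sum(w * d$Value)
sd_w <- sqrt(sum(d$Count * (d$Value - mu)^2) / (n - 1))
bw   <- 0.9 * sd_w * n^(-1/5)

# Weighted KDE computed directly from the grouped data.
dens_grouped <- density(d$Value, weights = w, bw = bw)

# Sanity check on this toy example only: expand the counts and compare.
x_full    <- rep(d$Value, d$Count)
dens_full <- density(x_full, bw = bw)
max_diff  <- max(abs(dens_grouped$y - dens_full$y))
```

With the same explicit bw, the two curves should agree up to floating-point error, and plot(dens_grouped) then draws the KDE without ever materializing the expanded vector.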

Licensed under: CC-BY-SA with attribution
