Ranked Frequency Distributions from Nominal Variables in R

https://stackoverflow.com/questions/15490663

24-03-2022

Question

I have searched through the website but have been unable to find a solution to my problem. I have a sample dataset as follows:

id,l1
1,3
2,5
3,6
1,5
2,4
3,6

id is a nominal variable that identifies a unique user, and l1 is a count variable.

What I want is to find out the distribution of l1 by user. So, looking at my given dataset, id=1 has total l1 = 8; id = 2 has total l1 = 9 and id=3 has total l1 = 12.

I am trying to find out the distribution of l1 according to id but I am stuck. I cannot figure out how to group the relevant columns together and then find the distribution or at least construct a histogram. I can construct a histogram with one variable but I cannot construct a ranked frequency distribution by a nominal variable.

Answer

The base R approach is to use tapply.

If your data.frame is called aa:

sumById <- with(aa, tapply(l1, id, sum))

barplot(sumById)
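Since the question asks for a ranked distribution, here is a minimal sketch of the same tapply approach with the totals sorted before plotting. It assumes the sample data from the question is loaded into a data.frame named aa; the name ranked and the axis labels are illustrative choices, not part of the original answer.

```r
# Sample data from the question (assumed to be stored in a data.frame named aa)
aa <- data.frame(id = c(1, 2, 3, 1, 2, 3),
                 l1 = c(3, 5, 6, 5, 4, 6))

# Total l1 per id
sumById <- with(aa, tapply(l1, id, sum))

# Rank the totals from largest to smallest; names (the ids) are kept
ranked <- sumById[order(sumById, decreasing = TRUE)]

# Bar plot of the ranked totals
barplot(ranked, xlab = "id", ylab = "total l1")
```

Ordering with order() rather than plotting sumById directly only changes the order of the bars; the totals themselves are the same as in the answer above.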


If you want to plot the results without explicitly pre-summarizing, you can use ggplot2 and stat_summary:

library(ggplot2)
# fun.y was deprecated in ggplot2 3.3.0; use fun instead
ggplot(aa, aes(x = id, y = l1)) + stat_summary(fun = 'sum', geom = 'bar')
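To get the bars in ranked order with ggplot2 as well, one option is to reorder the id factor by its summed l1 before plotting. The sketch below assumes ggplot2 3.3.0 or later is installed and that the data is in a data.frame named aa; the column name id_ranked is a hypothetical helper introduced here for illustration.

```r
library(ggplot2)

# Sample data from the question (assumed to be stored in a data.frame named aa)
aa <- data.frame(id = c(1, 2, 3, 1, 2, 3),
                 l1 = c(3, 5, 6, 5, 4, 6))

# Treat id as nominal and order its levels by decreasing total l1
# (reorder sorts ascending, so the values are negated)
aa$id_ranked <- reorder(factor(aa$id), -aa$l1, FUN = sum)

# stat_summary sums l1 within each id at plot time
p <- ggplot(aa, aes(x = id_ranked, y = l1)) +
  stat_summary(fun = sum, geom = "bar") +
  labs(x = "id", y = "total l1")
print(p)
```

Converting id to a factor also keeps ggplot2 from treating the user ids as a continuous axis.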


Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow