Calculate unique combinations of values in dataframe, and summary values
11-10-2019
Question
I would like to work with unique combinations of var1 and var2 in my dataframe:
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))
As has been noted, unique(foo) results in this (row names are carried over from the original rows):
   var1 var2
1     1    1
3     2    1
5     2    2
7     3    1
9     3    2
12    4    2
14    4    3
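For reference, unique() on a data frame keeps the first occurrence of each distinct row, which is why the printed row names come from the original rows (1, 3, 5, ...). A minimal base R sketch of the equivalent duplicated() filter; the name u is just illustrative:

```r
# Recreate the example data frame from the question
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))

# duplicated() flags every row that repeats an earlier row,
# so negating it keeps the first occurrence of each combination
u <- foo[!duplicated(foo), ]

# Same rows, in the same order, as unique(foo)
identical(u, unique(foo))

# Row names are those of the original rows
rownames(u)  # "1" "3" "5" "7" "9" "12" "14"
```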
Based on the unique combinations, how do I get:
n, the number of occurrences of each var1 value, and
svar, the sum of each var1 value's var2 values?
The output could look like this:
  var1 n svar
1    1 1    1
2    2 2    3
3    3 2    3
4    4 2    5
Solution
unique(foo) should give you what you are after here.
UPDATE 2014: use dplyr instead of plyr.
I recommend looking into the plyr library for other aggregation-type tasks, or into base R equivalents such as tapply(), aggregate() et al.
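For comparison, a sketch of the base R route with aggregate(), assuming the goal is the per-var1 count and sum over the unique combinations described in the question. The names n_df, svar_df and result are illustrative only:

```r
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))

# Keep one row per distinct (var1, var2) combination
u <- unique(foo)

# Count the combinations for each var1 value
n_df <- aggregate(var2 ~ var1, data = u, FUN = length)
names(n_df)[2] <- "n"

# Sum var2 over the combinations for each var1 value
svar_df <- aggregate(var2 ~ var1, data = u, FUN = sum)
names(svar_df)[2] <- "svar"

# Join the two summaries on var1
result <- merge(n_df, svar_df, by = "var1")
result
```

Two separate aggregate() calls keep each summary a plain column; a single call with a function returning c(n, svar) also works but produces a matrix column.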
While redundant for this exercise, here's how you would use plyr:
library(plyr)
ddply(foo, .(var1), unique)
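ddply() follows a split-apply-combine pattern. The same idea in base R, as a sketch (this is not how plyr is implemented internally; pieces and out are illustrative names):

```r
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))

# Split: one data frame per var1 value
pieces <- split(foo, foo$var1)

# Apply: drop duplicate rows within each piece
pieces <- lapply(pieces, unique)

# Combine: stack the pieces and reset the row names
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```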
Note that you can replace unique with any number of functions, for example to find the mean and sd of var2:
ddply(foo, .(var1), summarise, mean = mean(var2), sd = sd(var2))
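The same mean/sd summary can be sketched in base R with tapply(), which applies a function to var2 within each var1 group and names the results by the sorted group values. The names m, s and summary_df are illustrative:

```r
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))

# Per-group mean and standard deviation of var2
m <- tapply(foo$var2, foo$var1, mean)
s <- tapply(foo$var2, foo$var1, sd)

# Build a data frame; var1 comes from the names tapply() assigns
summary_df <- data.frame(var1 = as.numeric(names(m)),
                         mean = as.vector(m),
                         sd   = as.vector(s))
summary_df
```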
Response to edit
Now you have a more legitimate use for plyr. Taking what we learned from above:
x <- unique(foo)
combined with plyr:
ddply(x, .(var1), summarise, n = length(var2), sum = sum(var2))
This should give you what you are after.
OTHER TIPS
If I understand your question correctly, try:
unique(foo)
After the question was edited:
To avoid repeating @Chase's answer, here is a very simple, if not especially elegant, solution:
foo$var12 <- paste(foo$var1, foo$var2, sep='|') # the two variables combined to one
table(foo$var12) # and showing its frequencies
The output is, of course, a table:
1|1 2|1 2|2 3|1 3|2 4|2 4|3
2 2 2 2 3 2 2
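If you would rather keep var1 and var2 as separate columns instead of pasting them together, a cross-tabulation converted back to a data frame gives the same counts. A sketch (tab and counts are illustrative names; note that var1 and var2 come back as factors):

```r
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))

# Cross-tabulate var1 against var2; combinations that never occur get a count of 0
tab <- table(foo$var1, foo$var2, dnn = c("var1", "var2"))

# Long format: one row per (var1, var2) pair plus a Freq column
counts <- as.data.frame(tab)

# Drop the combinations that do not appear in foo
counts <- counts[counts$Freq > 0, ]
counts
```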
The answers are different from what you state, but I trust my code more than I trust your answer, and I cannot bring myself to commit the sin of naming a variable "sum":
newfoo <- data.frame(
var1=unique(foo$var1),
n = with(foo, tapply(var2, var1, length) ),
svar = with(foo, tapply(var2, var1, sum) ) )
newfoo
# var1 n svar
#1 1 2 2
#2 2 4 6
#3 3 5 8
#4 4 4 10
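One caveat with var1 = unique(foo$var1): unique() returns values in order of first appearance, while tapply() returns groups in sorted order. They line up here only because foo is already sorted by var1. A sketch that takes var1 from tapply()'s own names instead, checked against a hypothetical shuffled copy of foo:

```r
foo <- data.frame(var1 = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4,4),
                  var2 = c(1,1,1,1,2,2,1,1,2,2,2,2,2,3,3))

# Hypothetical reordering of the rows, to show the result does not depend on row order
foo_shuffled <- foo[c(15, 1, 8, 3, 12, 5, 10, 2, 14, 7, 4, 11, 6, 13, 9), ]

n    <- with(foo_shuffled, tapply(var2, var1, length))
svar <- with(foo_shuffled, tapply(var2, var1, sum))

# Labels come from tapply()'s names, so they stay aligned with the values
newfoo_safe <- data.frame(var1 = as.numeric(names(n)),
                          n    = as.vector(n),
                          svar = as.vector(svar))
newfoo_safe
```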
EDIT: (I hadn't at first figured out what Chase was trying to tell me.)
newfoo <- data.frame(
var1=unique(unique(foo)$var1),
n = with(unique(foo), tapply(var2, var1, length) ),
svar = with(unique(foo), tapply(var2, var1, sum) ) )
> newfoo
var1 n svar
1 1 1 1
2 2 2 3
3 3 2 3
4 4 2 5