Question

I have a data frame like this:

df <- data.frame(cluster = c('1','1','2','3','3','3'), class = c('A','B','C','B','B','C'))

For each cluster (1, 2, 3), I would like to get the class that appears most often. In case of a tie, it would be great to get some indication of it, for example the combination of the tied classes (or just NA if that is not possible). For my example, I would like a result like this:

 cluster  class.max
   1        'A B' (or NA)
   2         'C'
   3         'B'

Maybe I should use aggregate(), but I don't know how.

Solution

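rank() handles ties through its ties.method argument. With ties.method="min", every class tied for the highest count gets rank 1, so all of them are kept and pasted together: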

aggregate(class ~ cluster, df, function(x) {
  counts <- table(x)
  # ties.method = "min" gives every class tied for the top count rank 1
  paste(names(counts)[rank(-counts, ties.method = "min") == 1], collapse = " ")
})
  cluster class
1       1   A B
2       2     C
3       3     B
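
The question also mentions NA as an acceptable result for ties. A minimal sketch of that variant follows, assuming the same df as in the question. The helper name top_class_or_na is hypothetical (it is not part of the original answer), and it compares counts against max() instead of using rank(). The first lines only illustrate how rank() treats the tie in cluster 1.

```r
df <- data.frame(cluster = c('1','1','2','3','3','3'),
                 class   = c('A','B','C','B','B','C'))

# Illustration: in cluster 1, A and B each appear once.
counts <- table(c('A', 'B'))
# Negating the counts makes the largest count rank first;
# ties.method = "min" assigns the tied classes the same (lowest) rank.
ranks <- rank(-counts, ties.method = "min")

# Hypothetical helper: the single most frequent class, or NA on a tie.
top_class_or_na <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) == 1) top else NA_character_
}

result <- aggregate(class ~ cluster, df, top_class_or_na)
```

With this data, cluster 1 should come back as NA, while clusters 2 and 3 keep their single most frequent class.
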
Licensed under: CC-BY-SA with attribution