Select the rows with the maximum value of a variable within each group in R

https://stackoverflow.com/questions/2822156

26-09-2019

Question

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2))

a.3[r,]

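This returns the index within each list element, not the index into the whole data.frame.

I'm trying to return the row with the maximum value of b.2 for each subgroup of a.2. How can I do this efficiently?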

Solution

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

Jonathan Chang's answer gets you what you explicitly asked for, but I'm guessing you want the actual rows from the data frame.

sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]
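
To see why the comparison works, here is a minimal sketch on a small hand-made data frame (the name `d` and its values are made up for illustration, not taken from the question). `ave` returns a vector as long as its input, with each element replaced by the result of `FUN` for that row's group, so comparing it with the original column flags the rows that sit at their group's maximum. Note that tied maxima are all selected.

```r
# Small hypothetical data set: group g, value v
d <- data.frame(g = c(1, 1, 2, 2, 2), v = c(5, 9, 7, 7, 3))

# ave() returns one value per row: the maximum of v within that row's group
grp_max <- ave(d$v, d$g, FUN = max)

# Rows whose value equals their group's maximum (ties are all kept)
sel <- grp_max == d$v
res <- d[sel, ]
print(res)
```

If only one row per group is wanted, the ties would have to be broken separately, for example with the sort-and-`duplicated` approach described further down.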

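Other answers

The ddply and ave approaches are both fairly resource-intensive, I think. ave fails by running out of memory on my current problem (67,608 rows, with four columns defining the unique keys). tapply is a handy choice, but what I generally need to do is select the whole rows that have the something-est value of some column for each unique key (usually defined by more than one column). The best solution I've found is to sort and then use the negation of duplicated to select only the first row for each unique key. For the simple example here: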

a <- sample(1:10,100,replace=T)
b <- sample(1:100,100,replace=T)
f <- data.frame(a, b)

sorted <- f[order(f$a, -f$b),]
highs <- sorted[!duplicated(sorted$a),]

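I think the performance gain over ave or ddply, at least, is substantial. It is slightly more complicated for multi-column keys, but order will handle a whole bunch of columns to sort on and duplicated works on data frames, so it's possible to keep using this approach.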
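
For the multi-column case, a rough sketch of the same idea might look like the following. The data frame `d` and the column names `k1`, `k2` and `v` are hypothetical, chosen only to illustrate a two-column key.

```r
# Hypothetical data: the key is the pair (k1, k2); v is the value to maximise
d <- data.frame(
  k1 = c("a", "a", "a", "b", "b"),
  k2 = c(1, 1, 2, 1, 1),
  v  = c(3, 8, 4, 6, 2)
)

# Sort by both key columns, then by v in decreasing order within each key
sorted <- d[order(d$k1, d$k2, -d$v), ]

# duplicated() on a data frame compares whole rows of the selected columns,
# so the first row of each (k1, k2) combination is the one with the largest v
highs <- sorted[!duplicated(sorted[, c("k1", "k2")]), ]
print(highs)
```

Unlike the ave comparison, this keeps exactly one row per key even when the maximum is tied.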

library(plyr)
ddply(a.3, "a.2", subset, b.2 == max(b.2))

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
m<-split(a.3,a.2)
u<-function(x){
    a<-rownames(x)
    b<-which.max(x[,2])
    as.numeric(a[b])
    }
r<-sapply(m,FUN=function(x) u(x))

a.3[r,]

This works, although it's a bit cumbersome... but it lets me grab the rows with the groupwise maximum. Any other ideas?

> a.2<-sample(1:10,100,replace=T)
> b.2<-sample(1:100,100,replace=T)
> tapply(b.2, a.2, max)
 1  2  3  4  5  6  7  8  9 10 
99 92 96 97 98 99 94 98 98 96

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

Using aggregate, you can get the maximum for each group in one line:

aggregate(a.3, by = list(a.3$a.2), FUN = max)

This produces the following output:

   Group.1 a.2 b.2
1        1   1  96
2        2   2  82
...
8        8   8  85
9        9   9  93
10      10  10  97
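
One thing worth keeping in mind: aggregate applies FUN to each column separately, so with more than one value column the result can combine values from different original rows. A small sketch of the difference, and of one possible way to get the actual rows back with merge, assuming a made-up data frame `d` with an extra column `w`:

```r
# Hypothetical data with an extra column to show the difference
d <- data.frame(g = c(1, 1, 2, 2), v = c(5, 9, 7, 3), w = c(40, 10, 20, 30))

# Column-wise maxima per group: v and w can come from different rows
agg <- aggregate(d[, c("v", "w")], by = list(g = d$g), FUN = max)
print(agg)

# To get the actual rows, compute the per-group maximum of v only
# and join it back onto the original data
maxes <- aggregate(v ~ g, data = d, FUN = max)
rows <- merge(d, maxes, by = c("g", "v"))
print(rows)
```

Here `agg` pairs each group's largest v with its largest w, while `rows` contains only rows that really exist in `d`.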

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow