Subset a data.frame by a list and apply a function on each part, by row

https://stackoverflow.com/questions/2351204

23-09-2019
Question

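This may look like a typical plyr problem, but I have something different in mind. Here is the function I would like to optimize (by getting rid of the for loop).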

# dummy data
set.seed(1985)
lst <- list(a=1:10, b=11:15, c=16:20)
m <- matrix(round(runif(200, 1, 7)), 10)
m <- as.data.frame(m)


dfsub <- function(dt, lst, fun) {
    # check whether dt is `data.frame`
    stopifnot (is.data.frame(dt))
    # check if vectors in lst are "whole" / integer
    # vector elements should be column indexes
    is.wholenumber <- function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol
    # fail if any non-integers in list
    idx <- rapply(lst, is.wholenumber)
    stopifnot(idx)
    # check for list length
    stopifnot(ncol(dt) == length(idx))
    # subset the data
    subs <- list()
    for (i in 1:length(lst)) {
            # apply function on each part, by row
            subs[[i]] <- apply(dt[ , lst[[i]]], 1, fun)
    }
    # preserve names
    names(subs) <- names(lst)
    # convert to data.frame
    subs <- as.data.frame(subs)
    # guess what =)
    return(subs)
}

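And now a short demo... Actually, let me explain what I primarily intended to do. I wanted to subset a data.frame by vectors gathered in a list object. Since this is part of the code of a function that accompanies data manipulation in psychological research, you can think of m as the results of a personality questionnaire (10 subjects, 20 variables). The vectors in the list hold the column indexes that define the questionnaire subscales (e.g. personality traits). Each subscale is defined by several items (data.frame columns). If we presuppose that the score on each subscale is nothing more than the sum (or some other function) of the row values (the results on that part of the questionnaire for each subject), you could run: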

> dfsub(m, lst, sum)
    a  b  c
1  46 20 24
2  41 24 21
3  41 13 12
4  37 14 18
5  57 18 25
6  27 18 18
7  28 17 20
8  31 18 23
9  38 14 15
10 41 14 22

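I took a glance at this function and I must admit that this little loop does not spoil the code at all... but if there is an easier or more efficient way of doing this, please let me know!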

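Solution

I would take a different approach and keep everything as data frames, so that you can use merge and ddply. I think you will find this approach a little more general, and it makes it easier to check that each step is performed correctly.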

# Convert everything to long data frames
m$id <- 1:nrow(m)

library(reshape)  # melt()
library(plyr)     # ddply() and summarise()
obs <- melt(m, id = "id")
obs$variable <- as.numeric(gsub("V", "", obs$variable))

varinfo <- melt(lst)
names(varinfo) <- c("variable", "scale")

# Merge and summarise
obs <- merge(obs, varinfo, by = "variable")

ddply(obs, c("id", "scale"), summarise, 
  mean = mean(value), 
  sum = sum(value))
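
For readers who do not have plyr and reshape installed, the same long-format idea can be sketched in base R. This is only an illustration, assuming the dummy data from the question; `aggregate` and `xtabs` stand in for `ddply`, and the names `long_sums` and `wide_sums` are introduced here and are not part of the original answer.

```r
# Dummy data from the question
set.seed(1985)
lst <- list(a = 1:10, b = 11:15, c = 16:20)
m <- as.data.frame(matrix(round(runif(200, 1, 7)), 10))

# Long format: one row per (subject, item) pair
obs <- data.frame(
  id       = rep(seq_len(nrow(m)), times = ncol(m)),
  variable = rep(seq_len(ncol(m)), each = nrow(m)),
  value    = unlist(m, use.names = FALSE)
)

# Lookup table mapping each column index to its scale name
varinfo <- data.frame(
  variable = unlist(lst, use.names = FALSE),
  scale    = rep(names(lst), times = lengths(lst))
)

# Merge and summarise: one sum per subject and scale
obs <- merge(obs, varinfo, by = "variable")
long_sums <- aggregate(value ~ id + scale, data = obs, FUN = sum)

# Back to the wide layout: rows are subjects, columns are scales
wide_sums <- xtabs(value ~ id + scale, data = long_sums)
```

The wide table should hold the same numbers that `dfsub(m, lst, sum)` returns, which gives a simple way to cross-check the two approaches.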

Other tips

After loading the plyr package, replace

subs <- list()
    for (i in 1:length(lst)) {
            # apply function on each part, by row
            subs[[i]] <- apply(dt[ , lst[[i]]], 1, fun)
    }

with

subs <- llply(lst,function(x) apply(dt[,x],1,fun))
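
If plyr is not available, a similar replacement can be sketched with base R's `lapply`. This is a minimal illustration rather than the answerer's code; `dt`, `lst` and `fun` are set up here with the dummy data from the question, and `drop = FALSE` is an addition that keeps a scale with a single column from collapsing into a vector.

```r
# Dummy data from the question
set.seed(1985)
lst <- list(a = 1:10, b = 11:15, c = 16:20)
dt <- as.data.frame(matrix(round(runif(200, 1, 7)), 10))
fun <- sum

# lapply keeps the names of lst, so no separate names<- step is needed;
# drop = FALSE keeps a data.frame even when a scale has only one column
subs <- lapply(lst, function(cols) apply(dt[, cols, drop = FALSE], 1, fun))
subs <- as.data.frame(subs)
```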

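@Hadley, I have accepted your response, since it is quite straightforward and easy for bookkeeping (besides the fact that it is a more general solution). However, here is my not-so-long script that does the job and requires only the base package (which hardly matters, since I install plyr and reshape right after installing R). Now, here is the source: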

dfsub <- function(dt, lst, fun) {
        # check whether dt is `data.frame`
        stopifnot (is.data.frame(dt))
        # convert data.frame factors to numeric
        dt <- as.data.frame(lapply(dt, as.numeric))
        # check if vectors in lst are "whole" / integer
        # vector elements should be column indexes
        is.wholenumber <- function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol
        # fail if any non-integers in list
        idx <- rapply(lst, is.wholenumber)
        stopifnot(idx)
        # check for list length
        stopifnot(ncol(dt) == length(idx))
        # subset the data
        subs <- list()
        for (i in 1:length(lst)) {
                # apply function on each part, by row
                subs[[i]] <- apply(dt[ , lst[[i]]], 1, fun)
        }
        names(subs) <- names(lst)
        # convert to data.frame
        subs <- as.data.frame(subs)
        # guess what =)
        return(subs)
}
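
One caveat about the factor conversion step in the function above: `as.numeric()` applied to a factor returns the internal level codes, not the labels. The sketch below illustrates the difference on a made-up item; `item` and `to_numeric` are hypothetical names used only for this example, and whether the codes or the labels are wanted depends on how the questionnaire data were read in.

```r
# A factor whose labels look like numbers (hypothetical questionnaire item)
item <- factor(c("5", "7", "1", "7"))

# Level codes versus the values the labels represent
codes  <- as.numeric(item)                # 2 3 1 3
values <- as.numeric(as.character(item))  # 5 7 1 7

# Hypothetical helper: convert factors through their labels
to_numeric <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
}

dt <- data.frame(V1 = item, V2 = c(1, 2, 3, 4))
dt_num <- as.data.frame(lapply(dt, to_numeric))
```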

For your specific example, a one-line solution is sapply(lst,function(x) rowSums(m[,x])) (although you might add a few lines to check for valid input and to put in the column names).
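
A rough sketch of what those few extra lines might look like follows. The helper name `scale_sums` is hypothetical, the particular checks are assumptions about what counts as valid input, and `lapply` is used in place of `sapply` so that the result is a data.frame with the scale names as column names.

```r
# Dummy data from the question
set.seed(1985)
lst <- list(a = 1:10, b = 11:15, c = 16:20)
m <- as.data.frame(matrix(round(runif(200, 1, 7)), 10))

# scale_sums is a hypothetical helper name, not part of any package
scale_sums <- function(dt, lst) {
  stopifnot(is.data.frame(dt), is.list(lst), !is.null(names(lst)))
  cols <- unlist(lst, use.names = FALSE)
  # every index must be a whole number pointing at an existing column
  stopifnot(is.numeric(cols), all(cols == round(cols)),
            all(cols >= 1), all(cols <= ncol(dt)))
  # drop = FALSE keeps single-column scales working with rowSums
  sums <- lapply(lst, function(x) rowSums(dt[, x, drop = FALSE]))
  as.data.frame(sums)
}

res <- scale_sums(m, lst)
```

Calling `scale_sums(m, list(a = 1:30))` stops with an error, because the indexes run past the 20 columns of `m`.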

Do you have other, more general, applications in mind? Or is this possibly a case of YAGNI?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow