Question
I am trying to condense a data.frame that has the same column multiple times. The columns to be condensed hold logical values.
The data.frame looks like this:
mydf <- data.frame (ID = c("1A", "2A", "3A", "1B", "2B", "3B"),
A = c("N1", "N2", "N3", "N4", "N5", "N6"),
AA = c(T, T, F, F, F, F),
BB = c(T, T, F, F, F, F),
AA = c(T, F, T, F, F, F),
CC = c(T, F, T, F, T, F),
DD = c(T, F, T, F, T, T),
AA = c(F, F, F, F, T, F),
EE = c(F, F, T, T, T, F),
AA = c(F, F, F, F, F, F), check.names = FALSE)
I want to condense AA so that the condensed column is TRUE if at least one of the AA columns in a row is TRUE. For example, in row 1A the AA columns hold the sequence TRUE, TRUE, FALSE, FALSE. This means the condensed column, let's call it ZZ, should be TRUE in row 1A but FALSE in row 3B.
The desired output looks like this:
mydfnew <- data.frame (ID = c("1A", "2A", "3A", "1B", "2B", "3B"),
A = c("N1", "N2", "N3", "N4", "N5", "N6"),
AA = c(T, T, T, F, T, F),
BB = c(T, T, F, F, F, F),
CC = c(T, F, T, F, T, F),
DD = c(T, F, T, F, T, T),
EE = c(F, F, T, T, T, F))
The AA columns are replaced by the condensed ZZ column, which is once again called AA. I do not know in advance what the duplicated columns are called, and there are several such "duplicate" columns. I hope this makes sense.
Any help and pointers would be greatly appreciated.
Solution 2
ding ding ding!
# Flag the logical columns, then OR together the columns that share a name
l <- sapply(mydf, is.logical)
cbind(mydf[!l], lapply(split(as.list(mydf[l]), names(mydf)[l]), Reduce, f = `|`))
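For anyone unpacking the one-liner, here is a step-by-step sketch of the same idea on a small made-up data frame. The names `toy`, `is_lgl`, `groups`, `condensed`, and `result` are illustrative only and are not part of the original answer.

```r
# Toy data with a repeated logical column name (hypothetical example data)
toy <- data.frame(ID = c("r1", "r2", "r3"),
                  AA = c(TRUE, FALSE, FALSE),
                  BB = c(FALSE, FALSE, TRUE),
                  AA = c(FALSE, TRUE, FALSE),
                  check.names = FALSE,
                  stringsAsFactors = FALSE)

# 1. Flag the logical columns
is_lgl <- sapply(toy, is.logical)

# 2. Group the logical columns by name. The grouping names are taken from
#    names(toy) directly, because subsetting a data frame can make
#    duplicated names unique (AA, AA.1, ...)
groups <- split(as.list(toy[is_lgl]), names(toy)[is_lgl])

# 3. Within each group, OR the columns together element-wise
condensed <- lapply(groups, Reduce, f = `|`)

# 4. Reattach the non-logical columns
result <- cbind(toy[!is_lgl], condensed)
print(result)
```

Because `split()` sorts the groups by name, the condensed columns come back in alphabetical order rather than in their original positions.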
Other suggestions
A solution for all columns (except the first two):
res <- tapply(names(mydf)[-(1:2)], names(mydf)[-(1:2)], FUN = function(n)
as.logical(rowSums(mydf[names(mydf) %in% n[1]])))
cbind(mydf[1:2], do.call(cbind, res))
ID A AA BB CC DD EE
1 1A N1 TRUE TRUE TRUE TRUE FALSE
2 2A N2 TRUE TRUE FALSE FALSE FALSE
3 3A N3 TRUE FALSE TRUE TRUE TRUE
4 1B N4 FALSE FALSE FALSE FALSE TRUE
5 2B N5 TRUE FALSE TRUE TRUE TRUE
6 3B N6 FALSE FALSE FALSE TRUE FALSE
As a start:
rowSums(mydf[,colnames(mydf) == 'AA']) > 0
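Building on that starting point, a minimal sketch of how the condensed vector could replace the AA columns for this one name. It assumes the question's `mydf`; `is_aa`, `condensed`, and `out` are illustrative names. The condensed column ends up last rather than in its original position.

```r
# The question's data, repeated here so the sketch is self-contained
mydf <- data.frame(ID = c("1A", "2A", "3A", "1B", "2B", "3B"),
                   A = c("N1", "N2", "N3", "N4", "N5", "N6"),
                   AA = c(T, T, F, F, F, F),
                   BB = c(T, T, F, F, F, F),
                   AA = c(T, F, T, F, F, F),
                   CC = c(T, F, T, F, T, F),
                   DD = c(T, F, T, F, T, T),
                   AA = c(F, F, F, F, T, F),
                   EE = c(F, F, T, T, T, F),
                   AA = c(F, F, F, F, F, F), check.names = FALSE)

# Locate every column named AA
is_aa <- colnames(mydf) == "AA"

# TRUE wherever at least one AA column is TRUE in that row
condensed <- unname(rowSums(mydf[, is_aa]) > 0)

# Drop the original AA columns and append the condensed one
out <- cbind(mydf[, !is_aa], AA = condensed)
print(out)
```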
Essentially a variation on @SvenHohenstein's solution:
unq <- unique(names(mydf)[-(1:2)])
res <- setNames(lapply(unq, function(x) rowSums(mydf[names(mydf)==x])>0 ),unq)
cbind(mydf[1:2],res)
# ID A AA BB CC DD EE
#1 1A N1 TRUE TRUE TRUE TRUE FALSE
#2 2A N2 TRUE TRUE FALSE FALSE FALSE
#3 3A N3 TRUE FALSE TRUE TRUE TRUE
#4 1B N4 FALSE FALSE FALSE FALSE TRUE
#5 2B N5 TRUE FALSE TRUE TRUE TRUE
#6 3B N6 FALSE FALSE FALSE TRUE FALSE
I thought this was going to be really straightforward, but it turns out melt doesn't cope well with repeated column names, so this gets a bit finicky:
library(data.table)
library(reshape2)
df.names <- names(mydf)
var.names <- paste0("V", 1:(length(df.names) - 2))
real.names <- df.names[-(1:2)]
names(mydf) <- c(df.names[1:2], var.names)
dt <- data.table(melt(mydf, id.vars=c("ID", "A")))
dt[, variable:=real.names[match(variable, var.names)]]
dcast(
dt[, list(value=any(value)), by=list(ID, A, variable)],
ID + A ~ variable
)
# ID A AA BB CC DD EE
# 1 1A N1 TRUE TRUE TRUE TRUE FALSE
# 2 1B N4 FALSE FALSE FALSE FALSE TRUE
# 3 2A N2 TRUE TRUE FALSE FALSE FALSE
# 4 2B N5 TRUE FALSE TRUE TRUE TRUE
# 5 3A N3 TRUE FALSE TRUE TRUE TRUE
# 6 3B N6 FALSE FALSE FALSE TRUE FALSE
Note that the result set is not in exactly the same order as yours, but it should be easy to reorder if that matters. Also, I think N4 is wrong in your desired output.
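If the original row order does matter, one way to restore it is with `match()`. This is only a sketch: `res` is a hand-written stand-in for the `dcast()` result (trimmed to three columns), and `original_ids` and `res_ordered` are illustrative names.

```r
# Row order of the IDs as they appear in the question's mydf
original_ids <- c("1A", "2A", "3A", "1B", "2B", "3B")

# Stand-in for the dcast() result, which comes back sorted by ID
res <- data.frame(ID = c("1A", "1B", "2A", "2B", "3A", "3B"),
                  A  = c("N1", "N4", "N2", "N5", "N3", "N6"),
                  AA = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# match() gives, for each original ID, its row position in res
res_ordered <- res[match(original_ids, res$ID), ]

# Reset the row names so they run 1..n again
rownames(res_ordered) <- NULL
print(res_ordered)
```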