Question

I am trying to condense a data.frame in which the same column name occurs multiple times. The columns to be condensed hold logical values.

The data.frame looks like this:

mydf <- data.frame (ID = c("1A", "2A", "3A", "1B", "2B", "3B"),
                A = c("N1", "N2", "N3", "N4", "N5", "N6"),
                AA = c(T, T, F, F, F, F),
                BB = c(T, T, F, F, F, F),
                AA = c(T, F, T, F, F, F),
                CC = c(T, F, T, F, T, F),
                DD = c(T, F, T, F, T, T),
                AA = c(F, F, F, F, T, F),
                EE = c(F, F, T, T, T, F),
                AA = c(F, F, F, F, F, F), check.names = FALSE)

I want to condense AA so that the condensed column is TRUE for a row if at least one of the AA columns in that row is TRUE. For example, in row 1A the AA columns have the sequence TRUE, TRUE, FALSE, FALSE, while in row 3B they are all FALSE. This means the condensed column, let's call it ZZ, should be TRUE in row 1A but FALSE in row 3B.

The desired output looks like this:

mydfnew <- data.frame (ID = c("1A", "2A", "3A", "1B", "2B", "3B"),
                A = c("N1", "N2", "N3", "N4", "N5", "N6"),
                AA = c(T, T, T, F, T, F),
                BB = c(T, T, F, F, F, F),
                CC = c(T, F, T, F, T, F),
                DD = c(T, F, T, F, T, T),
                EE = c(F, F, T, T, T, F))

The AA columns are replaced by the condensed ZZ column, which is once again called AA. I do not know in advance what the AA columns are called, and there are multiple such "duplicate" columns. I hope this makes sense.

Any help and pointers would be greatly appreciated.


Solution 2

ding ding ding!

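# which columns are logical? (uses mydf from the question)
l <- sapply(mydf, is.logical)

# group the logical columns by name, OR each group together, then re-attach the rest
cbind(mydf[!l], lapply(split(as.list(mydf[l]), names(mydf)[l]), Reduce, f = `|`))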
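
The one-liner above can be unpacked into separate steps. This is only a sketch on a small made-up data frame (`toy` and its values are hypothetical, not the data from the question), meant to show what each piece contributes:

```r
# Small stand-in for the question's data (hypothetical values).
toy <- data.frame(ID = c("r1", "r2", "r3"),
                  AA = c(TRUE, FALSE, FALSE),
                  BB = c(FALSE, FALSE, TRUE),
                  AA = c(FALSE, TRUE, FALSE),
                  check.names = FALSE)

# 1. Flag the logical columns.
l <- sapply(toy, is.logical)

# 2. Group those columns by their original (repeated) names.
groups <- split(as.list(toy[l]), names(toy)[l])

# 3. Collapse each group with an element-wise OR.
condensed <- lapply(groups, Reduce, f = `|`)

# 4. Put the non-logical columns back in front.
result <- cbind(toy[!l], condensed)
result
```

Because `split()` orders the groups by name, the condensed columns come back sorted alphabetically rather than in their original order.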

Other suggestions

A solution for all columns (except the first two):

res <- tapply(names(mydf)[-(1:2)], names(mydf)[-(1:2)], FUN = function(n)
        as.logical(rowSums(mydf[names(mydf) %in% n[1]]))) 

cbind(mydf[1:2], do.call(cbind, res))


  ID  A    AA    BB    CC    DD    EE
1 1A N1  TRUE  TRUE  TRUE  TRUE FALSE
2 2A N2  TRUE  TRUE FALSE FALSE FALSE
3 3A N3  TRUE FALSE  TRUE  TRUE  TRUE
4 1B N4 FALSE FALSE FALSE FALSE  TRUE
5 2B N5  TRUE FALSE  TRUE  TRUE  TRUE
6 3B N6 FALSE FALSE FALSE  TRUE FALSE

As a start:

rowSums(mydf[,colnames(mydf) == 'AA']) > 0

Essentially a variation on @SvenHohenstein's solution:

unq <- unique(names(mydf)[-(1:2)])
res <- setNames(lapply(unq, function(x) rowSums(mydf[names(mydf)==x])>0 ),unq)
cbind(mydf[1:2],res)

#  ID  A    AA    BB    CC    DD    EE
#1 1A N1  TRUE  TRUE  TRUE  TRUE FALSE
#2 2A N2  TRUE  TRUE FALSE FALSE FALSE
#3 3A N3  TRUE FALSE  TRUE  TRUE  TRUE
#4 1B N4 FALSE FALSE FALSE FALSE  TRUE
#5 2B N5  TRUE FALSE  TRUE  TRUE  TRUE
#6 3B N6 FALSE FALSE FALSE  TRUE FALSE
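
If this is needed in more than one place, the same idea could be wrapped in a small function. The name `condense_dups` and the `id.cols` argument are made up for this sketch, and it assumes every non-ID column is logical:

```r
# Hypothetical helper: condense repeated logical columns with a row-wise OR.
# `id.cols` gives the positions of the columns to leave untouched.
condense_dups <- function(d, id.cols = 1:2) {
  vals <- d[-id.cols]              # the columns to condense
  nms  <- names(d)[-id.cols]       # their original (repeated) names
  unq  <- unique(nms)              # keeps first-appearance order
  res  <- lapply(unq, function(x) unname(rowSums(vals[nms == x]) > 0))
  cbind(d[id.cols], setNames(res, unq))
}

# Usage on a small made-up data frame with a single ID column.
toy <- data.frame(ID = c("r1", "r2", "r3"),
                  AA = c(TRUE, FALSE, FALSE),
                  BB = c(FALSE, FALSE, TRUE),
                  AA = c(FALSE, TRUE, FALSE),
                  check.names = FALSE)
out <- condense_dups(toy, id.cols = 1)
out
```

For the data in the question, `condense_dups(mydf)` with the default `id.cols = 1:2` should correspond to the three lines above.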

I thought this was going to be real straightforward, but it turns out melt doesn't do great when you have repeated column names, so this gets a bit finicky:

library(data.table)
library(reshape2)
df.names <- names(mydf)
var.names <- paste0("V", 1:(length(df.names) - 2))
real.names <- df.names[-(1:2)]
names(mydf) <- c(df.names[1:2], var.names)
dt <- data.table(melt(mydf, id.vars=c("ID", "A")))
dt[, variable:=real.names[match(variable, var.names)]]
dcast(
  dt[, list(value=any(value)), by=list(ID, A, variable)], 
  ID + A ~ variable
)
#   ID  A    AA    BB    CC    DD    EE
# 1 1A N1  TRUE  TRUE  TRUE  TRUE FALSE
# 2 1B N4 FALSE FALSE FALSE FALSE  TRUE
# 3 2A N2  TRUE  TRUE FALSE FALSE FALSE
# 4 2B N5  TRUE FALSE  TRUE  TRUE  TRUE
# 5 3A N3  TRUE FALSE  TRUE  TRUE  TRUE
# 6 3B N6 FALSE FALSE FALSE  TRUE FALSE    

Note that the result set is not in exactly the same row order as yours, but it should be easy to re-order if it matters. Also, I think N4 is wrong in your desired output.
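
One way the re-ordering could be done is with `match()` on the ID column. This is a sketch that assumes the IDs are unique; `orig.ids` and `res` are stand-in names, with `res` holding just the ID and AA columns of the output above:

```r
# Stand-ins: `orig.ids` is the ID column in its original order,
# `res` is a reshaped result whose rows came back sorted by ID.
orig.ids <- c("1A", "2A", "3A", "1B", "2B", "3B")
res <- data.frame(ID = c("1A", "1B", "2A", "2B", "3A", "3B"),
                  AA = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))

# match() gives, for each original ID, its row position in `res`.
res.ordered <- res[match(orig.ids, res$ID), ]
rownames(res.ordered) <- NULL   # reset row names after re-ordering
res.ordered
```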

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow