Question

I am trying to write code that converts an input file like this one

dput(input)
c("A\t8213", "B\tAnytown", "C\tAAA", "D\t19", "E\t19", "F\tAny ID", 
"G\t0", "H\t0", "I\t0", "J\t0", "K\t0", "L\t0", "M\t0", "N\t0.048", 
"O\t0.303", "P\t31", "Q\t0", "R\t-0.114", "S\t0.377", "T\t-5.833"
)

to an output file like this one (once the code works for one file, it will be used in a function that processes hundreds of files):

dput(output)
c("A\tB\tC\tD\tE\tF\tG\tH\tI\tJ\tK\tL\tM\tN\tO\tP\tQ\tR\tS\tT", 
"8213\tAnytown\tAAA\t19\t19\tAny ID\t0\t0\t0\t0\t0\t0\t0\t0.048\t0.303\t31\t0\t-0.114\t0.377\t-5.833", 
"")

I want only the non-NA value from each column, i.e. a single row with no NAs.

This is the code I have written so far (revised with the help of many useful snippets from StackOverflow and the R-help mailing list):

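library(data.table)
# Split each "key<TAB>value" line into columns V1 (key) and V2 (value)
inputtmp <- data.table(read.table(textConnection(input), sep = "\t",
                                  stringsAsFactors = FALSE))
# Give every line its own row id
inputtmp[, id:=1:length(inputtmp[[1]])]
# Spread the keys A..T into columns, one row per id
inputtmp <- dcast.data.table(inputtmp, id~V1, value.var="V2")
varcols <- colnames(inputtmp)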
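For reference, a minimal sketch of the step still missing from the code above, assuming every key column of the `dcast` result holds exactly one non-NA value (true for this input, where each key A..T occurs once). It uses the `dcast()` generic, which dispatches to `dcast.data.table` for a data.table, `seq_len(.N)` in place of `1:length(...)`, and `read.table(text = ...)`, which closes its text connection itself. The names `long`, `wide`, `value_cols` and `collapsed` are illustrative only.

```r
library(data.table)

input <- c("A\t8213", "B\tAnytown", "C\tAAA", "D\t19", "E\t19", "F\tAny ID",
           "G\t0", "H\t0", "I\t0", "J\t0", "K\t0", "L\t0", "M\t0", "N\t0.048",
           "O\t0.303", "P\t31", "Q\t0", "R\t-0.114", "S\t0.377", "T\t-5.833")

# Same reshaping as in the question: one id per line, keys spread to columns
long <- data.table(read.table(text = input, sep = "\t",
                              stringsAsFactors = FALSE))
long[, id := seq_len(.N)]
wide <- dcast(long, id ~ V1, value.var = "V2")

# Each key column holds exactly one non-NA value, so keeping only the
# non-NA entry of every column collapses the table to a single row
value_cols <- setdiff(names(wide), "id")
collapsed <- wide[, lapply(.SD, function(x) x[!is.na(x)]), .SDcols = value_cols]
collapsed
```

If some key could appear twice or not at all in a file, `x[!is.na(x)]` would return a vector of the wrong length, so this only fits the one-record-per-file layout shown here.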

Questions:

1) Is there a better way to "transpose" the rows into columns so that no extra step is needed to remove the NAs?

2) If not, then how can I remove only the NAs from each column?
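On question 1, a possible sketch, assuming each input file holds exactly one record: the NAs appear because `id:=1:length(...)` gives every line its own id, so `dcast` places each value on a separate row. If all lines share a single id, every value lands on the same row and there is nothing to remove afterwards. `long` and `wide` are illustrative names.

```r
library(data.table)

input <- c("A\t8213", "B\tAnytown", "C\tAAA", "D\t19", "E\t19", "F\tAny ID",
           "G\t0", "H\t0", "I\t0", "J\t0", "K\t0", "L\t0", "M\t0", "N\t0.048",
           "O\t0.303", "P\t31", "Q\t0", "R\t-0.114", "S\t0.377", "T\t-5.833")

long <- data.table(read.table(text = input, sep = "\t",
                              stringsAsFactors = FALSE))
# One shared id instead of one id per line: dcast then puts all values
# on the same row, so no NA cells are created in the first place
long[, id := 1L]
wide <- dcast(long, id ~ V1, value.var = "V2")
wide[, id := NULL]  # drop the helper column
wide
```

When looping over many files, the file name could serve as that shared id instead of `1L`, which would also record where each row came from.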

I have tried adapting the code from the following two questions, but nothing has worked in my case.

1) Fastest way to drop rows with missing values?

and

2) Apply over rows of data.table: find rows where a subset of columns are all NA
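A short demonstration, using the question's own reshaping step, of one likely reason row-dropping tools do not fit here: `na.omit()` and `complete.cases()` remove every row that contains an NA. After `dcast(id ~ V1)`, each row holds one value and 19 NAs, so no row survives.

```r
library(data.table)

input <- c("A\t8213", "B\tAnytown", "C\tAAA", "D\t19", "E\t19", "F\tAny ID",
           "G\t0", "H\t0", "I\t0", "J\t0", "K\t0", "L\t0", "M\t0", "N\t0.048",
           "O\t0.303", "P\t31", "Q\t0", "R\t-0.114", "S\t0.377", "T\t-5.833")

long <- data.table(read.table(text = input, sep = "\t",
                              stringsAsFactors = FALSE))
long[, id := seq_len(.N)]
wide <- dcast(long, id ~ V1, value.var = "V2")

nrow(wide)                 # 20: one row per input line
nrow(na.omit(wide))        # 0: every row contains at least one NA
sum(complete.cases(wide))  # 0: no row is complete
```

The NAs here are not missing data to discard but an artifact of the reshape, which is why collapsing each column (or avoiding the per-line id) works where dropping rows does not.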

Thank you.


Solution

Would that work for you?

input <- c("A\t8213", "B\tAnytown", "C\tAAA", "D\t19", "E\t19", "F\tAny ID", 
           "G\t0", "H\t0", "I\t0", "J\t0", "K\t0", "L\t0", "M\t0", "N\t0.048", 
           "O\t0.303", "P\t31", "Q\t0", "R\t-0.114", "S\t0.377", "T\t-5.833")
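inputtmp <- read.table(textConnection(input), sep = "\t", stringsAsFactors = FALSE)
# Use the keys (A..T) as row names so that t() turns them into column names
rownames(inputtmp) <- as.character(inputtmp[, 1])
# Transpose: the first row (V1) repeats the keys, the second (V2) holds the values
inputtmp <- as.data.frame(t(inputtmp))
library(data.table)
# Drop the key row and keep the single row of values
inputtmp <- data.table(inputtmp[-1, ])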
inputtmp
#       A       B   C  D  E      F G H I J K L M     N     O  P Q      R     S      T
# 1: 8213 Anytown AAA 19 19 Any ID 0 0 0 0 0 0 0 0.048 0.303 31 0 -0.114 0.377 -5.833
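Two follow-ups, sketched under assumptions. Because `t()` goes through a character matrix, every column of the result above is character (or factor on R versions before 4.0), and the question mentions running this over hundreds of files. The helper `read_record()` below is a hypothetical name; it assumes every file has the same tab-separated key/value layout as `input`, reuses the answer's `t()` approach, and calls `type.convert()` to restore numeric columns.

```r
library(data.table)

# Hypothetical helper: read one tab-separated "key<TAB>value" file and return
# it as a one-row data.table, using the same t() approach as the answer
read_record <- function(path) {
  tmp <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
  # Keys become row names, hence column names after t()
  rownames(tmp) <- as.character(tmp[, 1])
  # t() goes through a character matrix; keep only the value row
  wide <- as.data.frame(t(tmp), stringsAsFactors = FALSE)[-1, , drop = FALSE]
  # Every column is text at this point; let type.convert() re-guess the types
  # ("8213" -> integer, "0.048" -> double, "Anytown" stays character)
  wide[] <- lapply(wide, type.convert, as.is = TRUE)
  data.table(wide)
}
```

It could then be applied to a set of files with something like `rbindlist(lapply(files, read_record), fill = TRUE)`, where `files` would come from `list.files(..., full.names = TRUE)`; `fill = TRUE` pads keys missing from some files with NA.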
Licensed under: CC-BY-SA with attribution