Question

I need to restructure some data, but unfortunately I am stuck.

I read a number of txt files into R (each has a specific name). All of them are structured the same way (as a data frame), with a "Gene" column and 4 conditions "A1", "A2", "A3" and "A4" and their respective values:

    Gene A1 A2 A3 A4
    Gene1 value1.1 value1.2 value1.3 value1.4
    Gene2 value2.1 value2.2 value2.3 value2.4
    Gene3 value3.1 value3.2 value3.3 value3.4
    ...

But each file that is read into R has a different filename (filename1, filename2, filename3,...).

I want to reorganize data from all these files into one single data file with the following structure:

    id Gene1_A1 Gene1_A2 Gene1_A3 Gene1_A4 Gene2_A1 Gene2_A2 Gene2_A3 Gene2_A4 Gene3_A1 Gene3_A2 Gene3_A3 Gene3_A4 ...
    filename1 value1.1 value1.2 value1.3 value1.4 value2.1 value2.2 value2.3 value2.4 value3.1 value3.2 value3.3 value3.4
    filename2
    filename3
    ...

In other words, the data for Gene2 should follow the data for Gene1, then Gene3, and so on. Each row represents one id (the name of the txt file). Each header in the output table is a concatenation of the gene name (Gene1, Gene2, Gene3, ...) and a condition (A1, A2, A3, A4).

Could anyone give me advice on how to solve this? Many thanks in advance. Kind regards, s


Solution

sample.table.text <- "Gene A1 A2 A3 A4  
Gene1 value1.1 value1.2 value1.3 value1.4
Gene2 value2.1 value2.2 value2.3 value2.4
Gene3 value3.1 value3.2 value3.3 value3.4"

# create some sample files
files <- replicate(2, tempfile())
for (f in files) write(sample.table.text, f)

# read and reshape
dat <- lapply(files, function(fname) {
    x <- read.table(fname, header=TRUE)
    x['id'] <- basename(fname)
    reshape(x, idvar='id', timevar='Gene', direction='wide')
})
# collapse into one data.frame
result <- do.call(rbind, dat)
result
#                id A1.Gene1 A2.Gene1 A3.Gene1 A4.Gene1 A1.Gene2 A2.Gene2 A3.Gene2 A4.Gene2 A1.Gene3 A2.Gene3 A3.Gene3 A4.Gene3
# 1 file848632b4675 value1.1 value1.2 value1.3 value1.4 value2.1 value2.2 value2.3 value2.4 value3.1 value3.2 value3.3 value3.4
# 2 file84864ad4a6c value1.1 value1.2 value1.3 value1.4 value2.1 value2.2 value2.3 value2.4 value3.1 value3.2 value3.3 value3.4

# remove temp files (unlink() accepts a vector of paths)
unlink(files)
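
Note that reshape() names the wide columns condition-first (A1.Gene1), while the question asks for gene-first names joined by an underscore (Gene1_A1). A minimal sketch of one way to rename them afterwards is shown here. It builds a small stand-in data frame by hand (the values and the two-gene, two-condition layout are made up for illustration) and assumes the condition names themselves contain no dot:

```r
# Stand-in for the `result` data frame produced above (hypothetical values)
result <- data.frame(
  id = c("filename1", "filename2"),
  A1.Gene1 = c(1.1, 5.1),
  A2.Gene1 = c(1.2, 5.2),
  A1.Gene2 = c(2.1, 6.1),
  A2.Gene2 = c(2.2, 6.2)
)

# Turn "<condition>.<gene>" into "<gene>_<condition>".
# "id" contains no dot, so the pattern does not match and it is left as-is.
names(result) <- sub("^([^.]+)\\.(.+)$", "\\2_\\1", names(result))
names(result)
```

Because the regular expression splits on the first dot only, gene names that contain dots should still end up intact in the new header.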
Licensed under: CC-BY-SA with attribution