I'm quite new to R, sorry if the programming looks bad.

The goal is to create file names from a common prefix, i.e. given a prefix, loop x times to produce prefix-1, prefix-2, prefix-3, and then read each of those files with read.csv.

I've gotten the code below to work, but it is very inefficient:

name <- vector(mode="character", length=0)
for (i in 1:numruns)name[i] <- paste(prefix, "-", i, ".log", sep="")

if (numruns == 1) {
        raw_data_1 <-read.csv(name[1], header=F, sep="\t", skip=11)
}

if (numruns == 2) {
        raw_data_1 <-read.csv(name[1], header=F, sep="\t", skip=11)
        raw_data_2 <-read.csv(name[2], header=F, sep="\t", skip=11)
}

if (numruns == 3) {
        raw_data_1 <-read.csv(name[1], header=F, sep="\t", skip=11)
        raw_data_2 <-read.csv(name[2], header=F, sep="\t", skip=11)
        raw_data_3 <-read.csv(name[3], header=F, sep="\t", skip=11)     #import files
}

I'm trying to learn how to be more efficient. The above works for my purposes, but I feel like I should be able to wrap it up in the initial loop that produces the names. When I try to modify the original loop I can't get it to work...

for (i in 1:numruns){
        name[i] <- paste(prefix, "-", i, ".log", sep="")
        raw_data <- paste("raw_data_", i, sep="")
        print(raw_data)
        raw_data <- read.csv(name[i], header=F, sep="\t", skip=11)
}

Rather than get raw_data_1,raw_data_2,raw_data_3... I get "raw_data". I'm confused because print(raw_data) actually prints "raw_data_1-3" correctly (but only "raw_data" actually contains any information).

Thanks for any help or critique on my code to make it more efficient.


Solution

You should start using native vectorization early on. It may be confusing at first, but eventually you'll see all its power and beauty. Notice that many base functions are vectorized, so looping over their arguments is often redundant (see the paste0 usage below). Learn more about the apply family; it is an essential tool right from the start (see the lapply call).

Since reading multiple files is a common task, here's the chain I frequently use. We build all file names first according to a known pattern. Then we read them all at once, without any loops whatsoever. Finally, we may want to combine a list of files into a single data frame.

n <- 4
prefix <- 'some_prefix'
file_names <- paste0(prefix, '-', seq_len(n), '.log')
#[1] "some_prefix-1.log" "some_prefix-2.log" "some_prefix-3.log" "some_prefix-4.log"
# a list of data frames, one per file
df_list <- lapply(file_names, function(x) read.csv(x, header=FALSE, sep='\t', skip=11))
# total data frame (if all data frames are compatible);
# cbind joins them side by side, rbind would stack them row-wise instead
df_total <- do.call(cbind, df_list)
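Since the question asks for objects called raw_data_1, raw_data_2 and so on, one option is to keep the list and simply name its elements. The following is a minimal sketch, not part of the original answer: the temporary files and their contents (11 dummy header lines followed by two tab-separated rows) are made up so that the example runs on its own.

```r
# Hypothetical setup: write three small tab-separated log files to a temp
# directory, each with 11 header lines that read.csv() will skip.
prefix <- file.path(tempdir(), "some_prefix")
n <- 3
file_names <- paste0(prefix, "-", seq_len(n), ".log")
for (f in file_names) {
  writeLines(c(rep("header line", 11), "1\t2", "3\t4"), f)
}

# Read every file, then label each list element after its run number.
df_list <- lapply(file_names, read.csv, header = FALSE, sep = "\t", skip = 11)
names(df_list) <- paste0("raw_data_", seq_len(n))

# Individual runs are now reachable by name.
str(df_list$raw_data_2)
```

With named elements, df_list$raw_data_2 or df_list[["raw_data_2"]] behaves much like a separate variable, while the whole set can still be processed in one lapply call.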

Other tips

One way to do this is to put them in a list along the lines of:

raw_data <- vector(mode = "list", length = numruns) #allocate space for list
for (i in seq_len(numruns)) {
        raw_data[[i]] <- read.csv(name[i], header=FALSE, sep="\t", skip=11)
}

You can use lapply to do this in one command instead; it might be worth reading up on for the future.

The reason that your code isn't working is that you're assigning the string "raw_data_1" to raw_data, and then overwriting it with the data from the file. If you really want to go down the route of having lots of variables, have a look at assign() and get().
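For completeness, here is a rough sketch of what the assign() and get() route could look like. The temporary files and their contents are invented purely so the example is runnable, and var_name and second_run are illustrative names that do not appear in the question.

```r
# Hypothetical setup: three small tab-separated log files in a temp
# directory, each with 11 header lines to be skipped.
prefix <- file.path(tempdir(), "assign_demo")
numruns <- 3
name <- paste0(prefix, "-", seq_len(numruns), ".log")
for (f in name) {
  writeLines(c(rep("header line", 11), "1\t2", "3\t4"), f)
}

for (i in seq_len(numruns)) {
  var_name <- paste0("raw_data_", i)  # a string such as "raw_data_1"
  # assign() creates a variable whose name is the value stored in var_name
  assign(var_name, read.csv(name[i], header = FALSE, sep = "\t", skip = 11))
}

# get() looks a variable up by its name given as a string
second_run <- get("raw_data_2")
```

This does produce raw_data_1, raw_data_2 and raw_data_3 as separate variables, but every later step has to rebuild the names with paste0 and get(), which is why a list is usually easier to work with.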

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow