Question

I have the following files in my folder:

Sim_zone_1_TEMP_cell_1_5.ffData
Sim_zone_1_TEMP_cell_1_5.RData
Sim_zone_338_TEMP_cell_338.ffData
Sim_zone_338_TEMP_cell_338.RData

I also have the following vector of cells:

cell <- c(1,5,338)

I want to open the files whose names contain one of my cell numbers. For example, for cells 1 and 5 I want:

ffload("Sim_zone_1_TEMP_cell_1_5")

And for cell 338:

ffload("Sim_zone_338_TEMP_cell_338")

I tried the following code:

for (i in 1:length(cell)) {
  list.files(path = results_wd,
             pattern = paste("TEMP_cell_", cell[i], "_", sep = ""))
}

It works for cell 1 but not for cell 5 (because the file is named cell_1_5, not cell_5). I can't simply use pattern=paste("_",cell[i],"_",sep="") because the same number can also appear after "zone". In this example the variable is TEMP, but it can be something else.

In fact I want two things:

  1. Select the file names where _ cell[i] _ appears anywhere AFTER "cell"
  2. Once I have the name, call ffload("Sim_zone_X_TEMP_cell_X_X"), which means removing the ".ffData" or ".RData" extension from the name

Can someone help me?


Solution

I first get the file names (here I just wrote them into a vector, but you can use list.files()), then strip the extension using gsub(), then loop through cell and match the file names accordingly using grep().

I added a few extra elements to f.names (the test file names) to try to make sure the matching wouldn't fail for some circumstances I thought you would be likely to encounter (e.g., 338 should not match cell_3381).

The basic logic of the matching is to find the number in cell[i] in the file name. A match is a sequence of digits equal to cell[i] (e.g., 338) that 1) comes somewhere after "cell" (not necessarily immediately after it), 2) is immediately preceded by a "_", and 3) is not immediately followed by another digit.
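To see the three conditions in isolation, here is a minimal sketch that applies the same pattern to single names. The helper name matches_cell is hypothetical (it is not part of the code below); it only wraps the pattern in grepl().

```r
# Hypothetical helper: TRUE when `id` appears as a whole "_<id>" token
# somewhere after "cell" in `name` (same pattern as in the loop below).
matches_cell <- function(id, name) {
  pattern <- paste0("(?<=cell).*_", id, "(?![0-9])")
  grepl(pattern, name, perl = TRUE)
}

matches_cell(5,   "Sim_zone_1_TEMP_cell_1_5")     # TRUE:  "_5" follows "cell"
matches_cell(5,   "Sim_zone_1_TEMP_5_cell_1")     # FALSE: the 5 comes before "cell"
matches_cell(338, "Sim_zone_338_TEMP_cell_338")   # TRUE
matches_cell(338, "Sim_zone_338_TEMP_cell_3381")  # FALSE: followed by a digit
matches_cell(338, "Sim_zone_338_TEMP_cell_1338")  # FALSE: not directly after "_"
```

The lookbehind (?<=cell) anchors the search after "cell", so a number that only appears after "zone" is ignored.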

# example file names
f.names <- c("Sim_zone_1_TEMP_cell_1_5.ffData",
             "Sim_zone_1_TEMP_cell_1_5.RData",
             "Sim_zone_338_TEMP_cell_338.ffData",
             "Sim_zone_338_TEMP_cell_338.RData",
             "Sim_zone_1_TEMP_5_cell_1.RData",
             "Sim_zone_338_TEMP_cell_338_1.RData",
             "Sim_zone_338_TEMP_cell_3381.RData",
             "Sim_zone_338_TEMP_cell_1338.RData",
             "Sim_zone_338_TEMP_cell_133811.RData")
# f.names <- list.files(path=results_wd) # to define f.names based on directory contents, use this line instead
f.names.noExt <- gsub("\\.(?:ff|R)Data$", "", f.names, perl=TRUE) # remove extension from file names

cell <- c(1,5,338) # "cell" values through which to cycle

stored.matches <- list() # this will store matching file names (sans extension), each element of the list will contain a vector of names
for(i in seq_along(cell)){ # seq_along() also behaves correctly if cell is empty
    t.cell <- cell[i] # temporary cell value
    t.pattern <- paste("(?<=cell).*_", t.cell, "(?![0-9])", sep="") # temporary pattern based on t.cell
    t.matches <- grep(t.pattern, f.names.noExt, perl=TRUE, value=TRUE) # temporary matches
    stored.matches[[as.character(t.cell)]] <- t.matches # store the matched names in a list 
}

print(stored.matches)
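
For the second part of the question (calling ffload on the matched names), one possible follow-up is sketched below. It assumes the ff package is installed, that results_wd points at the folder holding the files (set to "." here purely for illustration), and that stored.matches looks like the result of the loop above when only the four original files are present. Because each base name matches twice (once per extension) and can match several cells, the names are de-duplicated first.

```r
# Illustrative input: what stored.matches holds for the four original files.
stored.matches <- list(
  "1"   = c("Sim_zone_1_TEMP_cell_1_5", "Sim_zone_1_TEMP_cell_1_5"),
  "5"   = c("Sim_zone_1_TEMP_cell_1_5", "Sim_zone_1_TEMP_cell_1_5"),
  "338" = c("Sim_zone_338_TEMP_cell_338", "Sim_zone_338_TEMP_cell_338")
)

# Each archive should be loaded once, so collapse the list to unique base names.
to.load <- unique(unlist(stored.matches, use.names = FALSE))

results_wd <- "."  # assumption: replace with the real results directory

# Only attempt the load when ff is available and both halves of the archive exist.
if (requireNamespace("ff", quietly = TRUE)) {
  for (nm in to.load) {
    base <- file.path(results_wd, nm)
    if (file.exists(paste0(base, ".ffData")) && file.exists(paste0(base, ".RData"))) {
      ff::ffload(base)  # ffload() takes the file name without extension
    }
  }
}
```

If you need the loaded objects grouped by cell, loop over names(stored.matches) instead and call unique() on each element.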
Licensed under: CC-BY-SA with attribution