Question

General situation: I am trying to name the data frames in a list after the csv files they were read from. I found that gsub with a regex is the way to go. Unfortunately, I can't produce exactly what I need, only something close. I would be very grateful for hints from someone more experienced. Is there perhaps a reasonable R regex cheat sheet?

The files are named like r2_m1_enzyme.csv; the script should use the first 5 characters to name the corresponding data frame r2_m1, and so on…

# generates a list of dataframes, to mimic a lapply(f,read.csv) output:
data <- list(data.frame(c(1,2)),data.frame(c(1,2)),data.frame(c(1,2)),data.frame(c(1,2)))

# this mimics file names obtained by  list.files() function
f <-c("r1_m1_enzyme.csv","r2_m1_enzyme.csv","r1_m2_enzyme.csv","r2_m2_enzyme.csv")

# this should name the data frames according to the csv file they have been derived from
names(data) <- gsub("r*_m*_.*","\\1", f)

But it doesn't work as expected... the data frames are named r2_m1_enzyme.csv instead of the desired r2_m1, although I expected .* to stop it.

If I do:

names(data) <- gsub("r*_.*","\\1", f)

I do get r1, r2, ... but I am missing my second index.

The question: What regular expression would let me obtain the strings "r1_m1", "r2_m1", "r1_m2", ... from strings that are named r*_m*_xyz.csv?

Search history: R regex use * for only one character, Gsub regex replacement, R using parts of filename to name dataframe, R regex cheat sheet,...


Solution

If your names are always five characters long you could use substr:

substr(f, 1, 5)
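
As a quick check of the fixed-width approach, the sketch below applies substr to the file names from the question. The two-digit file name at the end is a made-up example, not from the question; it is only there to show where a fixed width would stop working.

```r
# File names copied from the question
f <- c("r1_m1_enzyme.csv", "r2_m1_enzyme.csv", "r1_m2_enzyme.csv", "r2_m2_enzyme.csv")

# Keep characters 1 through 5 of every file name
prefix_fixed <- substr(f, 1, 5)
print(prefix_fixed)
# "r1_m1" "r2_m1" "r1_m2" "r2_m2"

# Hypothetical file name with a two-digit index: the fixed width cuts it short
prefix_long <- substr("r10_m1_enzyme.csv", 1, 5)
print(prefix_long)
# "r10_m"
```

If the indices can grow beyond one digit, a pattern-based approach is probably the safer choice.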

If you want to use gsub, you have to group your expression (with ( and )), because \\1 refers to the first group and inserts its content, e.g.:

gsub("^(r[0-9]+_m[0-9]+).*", "\\1", f)
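
To put this together with the setup from the question, here is a minimal sketch. It also shows why the original attempt left the names untouched: in a regex, * means "zero or more of the preceding character" rather than "any characters", so "r*_m*_.*" requires an underscore, optional m's, and then another underscore. None of the file names contain that sequence, so gsub finds nothing to replace and returns the input unchanged.

```r
# Setup copied from the question
data <- list(data.frame(c(1, 2)), data.frame(c(1, 2)),
             data.frame(c(1, 2)), data.frame(c(1, 2)))
f <- c("r1_m1_enzyme.csv", "r2_m1_enzyme.csv", "r1_m2_enzyme.csv", "r2_m2_enzyme.csv")

# The original pattern matches none of the file names,
# which is why gsub returned them unchanged
original_matches <- grepl("r*_m*_.*", f)
print(original_matches)
# FALSE FALSE FALSE FALSE

# Grouped pattern: capture "r<digits>_m<digits>" at the start of the name,
# match the rest with .*, and replace the whole match with the captured group
names(data) <- gsub("^(r[0-9]+_m[0-9]+).*", "\\1", f)
print(names(data))
# "r1_m1" "r2_m1" "r1_m2" "r2_m2"
```

Because the pattern is anchored with ^ and can match at most once per string, sub would give the same result as gsub here.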
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow