Hi all. I have a csv file in which I've arranged the DNA sample IDs that I sent out for sequencing in a 96-well plate. This is important to keep track of, because when we get the plate back from the sequencing facility the chromatogram files are named only by well position, e.g. 5-3-13-G-Templates_A01_Primer-G.ab1.
The csv is tab-delimited and looks like this (96 wells: 12 columns [1-12], 8 rows [A-H]):
1 2 3 4 5 6 7 8 9 10 11 12
A01 A02 A03 A04 A05_Grammatophyllum_scriptum_ITS1 A06_Eulophia_euglossa_ITS1 A07_Grammatophyllum_scriptum_17SE A08_Graphorkis_lurida_X502F A09_Cymbidium_kanran_X502F A10_Claderia_viridiflora_X502F A11_Grammatophyllum_scriptum_X502F A12_Eulophia_euglossa_X502F
B01 B02 B03 B04 B05_Grammatophyllum_scriptum_ITS4 B06_Eulophia_euglossa_ITS4 B07_Grammatophyllum_scriptum_1229R B08_Graphorkis_lurida_X1599R B09_Cymbidium_kanran_X1599R B10_Claderia_viridiflora_X1599R B11_Grammatophyllum_scriptum_X1599R B12_Eulophia_euglossa_X1599R
C01 C02 C03 C04 C05_Acriopsis_ridleyi_ITS1 C06_Cyrtopodium_polyphyllum_ITS1 C07_Cyrtopodium_polyphyllum_17SE C08_Graphorkis_scripta_X502F C09_Dipodium_conduplicatum_X502F C10_Dipodium_5431_X502F C11_Cyrtopodium_polyphyllum_X502F C12_Oeceoclades_gracillima_X502F
D01 D02 D03 D04 D05_Acriopsis_ridleyi_641R D06_Cyrtopodium_polyphyllum_ITS4 D07_Cyrtopodium_polyphyllum_1229R D08_Graphorkis_scripta_X1599R D09_Dipodium_conduplicatum_X1599R D10_Dipodium_5431_X1599R D11_Cyrtopodium_polyphyllum_X1599R D12_Oeceoclades_gracillima_X1599R
E01 E02 E03 E04_Dipodium_6052_ITS1 E05_Dipodium_5431_ITS1 E06_Bromheadia_finlaysoniana_ITS1 E07_Dressleria_dilecta_X502F E08_Cyrtopodium_falciobum_X502F E09_Acriopsis_ridleyi_X502F E10_Dipodium_6052_X502F E11_Thecostele_alata_28_X502F E12_Thecostele_alata_32_X502F
F01 F02 F03 F04_Dipodium_6052_ITS4 F05_Dipodium_5431_ITS4 F06_Bromheadia_finlaysoniana_641R F07_Dressleria_dilecta_X1599R F08_Cyrtopodium_falciobum_X1599R F09_Acriopsis_ridleyi_X1599R F10_Dipodium_6052_X1599R F11_Thecostele_alata_28_X1599R F12_Thecostele_alata_32_X1599R
G01 G02 G03 G04_Dipodium_6055_ITS1 G05_Dipodium_conduplicatum_ITS1 G06_Claderia_viridiflora_ITS1 G07_Ansellia_africana_X502F G08_Grammangis_ellisii_X502F G09_Bromheadia_finlaysoniana_X502F G10_Dipodium_6055_X502F G11_Grammatophyllum_stapeliiflorum_X502F G12
H01 H02 H03 H04_Dipodium_6055_ITS4 H05_Dipodium_conduplicatum_ITS4 H06_Claderia_viridiflora_641R H07_Ansellia_africana_X1599R H08_Grammangis_ellisii_X1599R H09_Bromheadia_finlaysoniana_X1599R H10_Dipodium_6055_X1599R H11_Grammatophyllum_stapeliiflorum_X1599R H12
Rather than renaming 96 files by hand every time a plate comes back, I'd like to use this file, which I already prepare in advance to guide me in loading the plate (so I don't put the wrong DNA in the wrong well). The idea is to identify each position by its prefix (e.g. A06 ... H06) and match it to the file names in a directory, since they share the same well location. The script would iterate over the entire csv file and rename all of the files, so that 5-3-13-G-Templates_A06_Primer-G.ab1 becomes A06_Eulophia_euglossa_ITS1.ab1.
I've written part of the Python script but I'm having difficulty envisioning the next step:
import csv

with open('Template.csv', newline='') as f:
    data = csv.DictReader(f, delimiter='\t')
    for row in data:
        # Sorting gives the values of each row in order from left to right
        values = sorted(row.values())
This is where I'm stuck. What do I do next now that I have these lists? For loops? I'm just having problems envisioning the solution.
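One possible next step, sketched under assumptions rather than offered as the definitive answer: instead of keeping a sorted list per row, flatten the whole plate into a single dict keyed by well ID. The function name `build_well_map` and the variable `well_map` are hypothetical, and the sketch assumes every cell starts with its well ID followed by an optional underscore and label, as in the file shown above. The sample is a three-column excerpt of that file.

```python
import csv
import io

def build_well_map(lines):
    """Map each well ID (e.g. 'E04') to the full cell text from tab-delimited rows."""
    well_map = {}
    reader = csv.reader(lines, delimiter='\t')
    next(reader, None)  # skip the header row of column numbers
    for row in reader:
        for cell in row:
            cell = cell.strip()
            if cell:
                # Everything before the first underscore is the well ID
                well_id = cell.split('_', 1)[0]
                well_map[well_id] = cell
    return well_map

# Excerpt of the plate file (columns 3-5 of rows A and E) held in memory
sample = io.StringIO(
    "3\t4\t5\n"
    "A03\tA04\tA05_Grammatophyllum_scriptum_ITS1\n"
    "E03\tE04_Dipodium_6052_ITS1\tE05_Dipodium_5431_ITS1\n"
)
well_map = build_well_map(sample)
print(well_map["E04"])  # E04_Dipodium_6052_ITS1
```

With the real file, the same function could be fed an open file object, e.g. `open('Template.csv', newline='')`, in place of the in-memory sample.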
I suppose a part of the solution would be a bit of the following code, modified from another answer I found:
import os

folder = r"/home/ryan/Desktop/MMEE/plateG"  # Make sure only the .ab1 files are in this directory

for root, dirs, filenames in os.walk(folder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        filename_zero, fileext = os.path.splitext(fullpath)
        # SOMEVARIABLE is a placeholder for the new name taken from the csv;
        # joining with root keeps the renamed file in the same directory
        os.rename(fullpath, os.path.join(root, SOMEVARIABLE + fileext))
The os.rename call above uses "SOMEVARIABLE" where I think the name from the list above should be fed into the file name. But how to get it there is beyond my skill level at the moment. Or maybe I'm just tired.
Any help would be appreciated. I hope this is sufficiently clear but I can provide clarification if necessary. Cheers!
Edited to add:
The old filename and new filename share only the location ID, e.g. A01, B06, H12. The new filenames will be taken from the csv file, so that a file named 5-3-13-G-Templates_F08_Primer-G.ab1 will pull its name from column 8, but only from the cell whose text contains "F08". The rows are A through H. Essentially, I want to pick out the text at row F, column 8 (though I don't have row headings at the moment) and apply that text to the filename with F08 in it. I thought there might be a way to match each substring A01 through H12 from the generated values list and use the text of each one as the new name for the old file that contains the same substring.
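The substring matching described here could be sketched with a regular expression. This is a rough illustration, not a tested solution: the names `WELL_RE`, `well_from_filename` and `new_name` are hypothetical, and it assumes the well ID always sits between two underscores in the facility's file names, as in the examples given.

```python
import os
import re

# One row letter A-H followed by a zero-padded column 01-12,
# delimited by underscores as in 5-3-13-G-Templates_F08_Primer-G.ab1
WELL_RE = re.compile(r'_([A-H](?:0[1-9]|1[0-2]))_')

def well_from_filename(filename):
    """Return the well ID embedded in a file name, or None if there is none."""
    match = WELL_RE.search(filename)
    return match.group(1) if match else None

def new_name(filename, well_map):
    """Build the new file name from a dict of well ID -> label, keeping the extension."""
    well = well_from_filename(filename)
    if well is None or well not in well_map:
        return None
    ext = os.path.splitext(filename)[1]
    return well_map[well] + ext

print(well_from_filename("5-3-13-G-Templates_F08_Primer-G.ab1"))  # F08
```

Anchoring the pattern on the surrounding underscores is a deliberate choice: it avoids accidentally matching a letter-digit run elsewhere in the name.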
I want the files renamed this way: (NB - A01 to D04 were blank wells so they have no other label than the ID)
5-3-13-G-Templates_E04_Primer-G.ab1 > E04_Dipodium_6052_ITS1.ab1
5-3-13-G-Templates_F04_Primer-G.ab1 > F04_Dipodium_6052_ITS4.ab1
5-3-13-G-Templates_G04_Primer-G.ab1 > G04_Dipodium_6055_ITS1.ab1
5-3-13-G-Templates_H04_Primer-G.ab1 > H04_Dipodium_6055_ITS4.ab1
5-3-13-G-Templates_A05_Primer-G.ab1 > A05_Grammatophyllum_scriptum_ITS1.ab1
5-3-13-G-Templates_B05_Primer-G.ab1 > B05_Grammatophyllum_scriptum_ITS4.ab1
...
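A minimal sketch of how the renames listed above might be carried out, under several assumptions: the plate map is already available as a dict of well ID to label (called `well_map` here, a hypothetical name), all chromatograms sit directly in one folder, and the function name `rename_plate` with its `dry_run` flag is invented for illustration. By default it only reports what it would do, which seems a sensible precaution before touching 96 files.

```python
import os
import re

# Well ID between underscores: row A-H, column 01-12
WELL_RE = re.compile(r'_([A-H](?:0[1-9]|1[0-2]))_')

def rename_plate(folder, well_map, dry_run=True):
    """Rename the .ab1 files in folder; return the (old, new) name pairs."""
    planned = []
    for filename in sorted(os.listdir(folder)):
        ext = os.path.splitext(filename)[1]
        match = WELL_RE.search(filename)
        if ext.lower() != '.ab1' or not match:
            continue  # skip anything that is not a recognisable chromatogram
        label = well_map.get(match.group(1))
        if label is None:
            continue  # well is missing from the plate map, leave the file alone
        new_filename = label + ext
        planned.append((filename, new_filename))
        if not dry_run:
            os.rename(os.path.join(folder, filename),
                      os.path.join(folder, new_filename))
    return planned
```

A cautious way to use it would be to call `rename_plate(folder, well_map)` first, check the returned pairs against the expected list, and only then call it again with `dry_run=False`.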