Question

I'm creating a script to read a CSV file into a set of named tuples whose field names come from the column headers. I will then use these namedtuples to pull out rows of data which meet certain criteria.

I've worked out the input (shown below), but am having issues with filtering the data before outputting it to another file.

import csv
from collections import namedtuple

with open('test_data.csv') as f:
    f_csv = csv.reader(f) #read using csv.reader()
    Base = namedtuple('Base', next(f_csv)) #create namedtuple keys from header row
    for r in f_csv: #for each row in the file
        row = Base(*r) 
        # Process row
        print(row) #print data

The contents of my input file are as follows:

Locus           Total_Depth     Average_Depth_sample    Depth_for_17
chr1:6484996    1030            1030                    1030
chr1:6484997    14              14                      14
chr1:6484998    0               0                       0

And they are printed by my code as follows:

Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')

I want to be able to pull out only the records with a Total_Depth greater than 15.

Intuitively I tried the following function:

if Base.Total_Depth >= 15 :
    print row

However, this only prints the final row of data (from the output above). I think the problem is twofold. First, as far as I can tell, I'm not storing my named tuples anywhere so that they can be referenced later. Second, the numbers are being read in as strings rather than as integers.

First, can someone tell me whether I need to store my namedtuples somewhere?

Second, how do I convert the string values to integers? Or is this not possible because namedtuples are immutable?

Thanks!

I previously asked a similar question with respect to dictionaries, but now would like to use namedtuples instead. :)


Solution

Map your values to int when creating the named tuple instances:

row = Base(r[0], *map(int, r[1:])) 

This keeps the r[0] value as a string, and maps the remaining values to int().

This does require knowledge of the CSV columns, because the choice of which columns are converted to integers is hardcoded here.
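If you would rather not hardcode the column positions, one possible alternative (not part of the approach above) is to attempt the conversion on every value and keep the original string when it fails. This is only a sketch: the helper name to_int_if_possible is made up for illustration, and it assumes every value that parses as an integer should in fact become one.

```python
from collections import namedtuple


def to_int_if_possible(value):
    """Return int(value) if value parses as an integer, else value unchanged.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        return int(value)
    except ValueError:
        return value


Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
r = ['chr1:6484996', '1030', '1030', '1030']

# 'chr1:6484996' is not a valid integer, so it stays a string;
# the remaining values are converted.
row = Base(*map(to_int_if_possible, r))
print(row)
```

The trade-off is that a column you meant to keep as text will silently become an integer whenever its value happens to look like one.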

Demo:

>>> from collections import namedtuple
>>> Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
>>> r = ['chr1:6484996', '1030', '1030', '1030']
>>> Base(r[0], *map(int, r[1:]))
Base(Locus='chr1:6484996', Total_Depth=1030, Average_Depth_sample=1030, Depth_for_17=1030)

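Note that you should test against each row instance, not the Base class. Base.Total_Depth is an attribute of the class itself, not the value from any particular row:

if row.Total_Depth >= 15:

Put this test inside the loop that reads the file, or collect the rows in a list and test them in a second loop.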
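Putting the pieces together, a minimal sketch of the whole read, convert, filter and write cycle might look like the following. It assumes a comma-delimited file whose first column is text and whose remaining columns are integers. The function name filter_by_depth and the in-memory sample data are made up for illustration; with real files you would pass objects returned by open() instead.

```python
import csv
import io
from collections import namedtuple


def filter_by_depth(in_file, out_file, min_depth=15):
    """Write rows with Total_Depth >= min_depth to out_file; return them as a list.

    Hypothetical helper; assumes column 0 is text and the rest are integers.
    """
    reader = csv.reader(in_file)
    header = next(reader)
    Base = namedtuple('Base', header)  # field names come from the header row

    writer = csv.writer(out_file)
    writer.writerow(header)

    kept = []  # store matching rows so they can be referenced later
    for r in reader:
        if not r:  # skip blank lines
            continue
        row = Base(r[0], *map(int, r[1:]))
        if row.Total_Depth >= min_depth:  # test the row, not the Base class
            kept.append(row)
            writer.writerow(row)
    return kept


# In-memory stand-ins for the input and output files (assumed comma-delimited).
sample = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)
output = io.StringIO()

kept_rows = filter_by_depth(sample, output)
print(kept_rows)
print(output.getvalue())
```

Storing the rows in a list is only needed if you want to use them again after the loop. If writing the output file is the only goal, the writer.writerow() call inside the loop is enough.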

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow