Question

I am new to coding and ran into trouble while trying to write my own FASTQ masker. The first module is supposed to trim away the line containing the +, change the sequence header (the @ line) so that it begins with > followed by the line number, and keep the sequence and quality lines (the A, G, C, T line and the Unicode score line, respectively).

class Import_file(object):

    def trim_fastq (self, fastq_file):
        f = open('path_to_file_a', 'a' )
        sanger = []
        sequence = []
        identifier = []
        plus = []
        f2 = open('path_to_file_b')

        for line in f2.readlines():
            line = line.strip()
            if line[0]=='@':
                identifier.append(line)
                identifier.replace('@%s','>[i]' %(line))

            elif line[0]==('A' or 'G'or 'T' or 'U' or 'C'):
                seq = ','.join(line)
                sequence.append(seq)

            elif line[0]=='+'and line[1]=='' :
                plus.append(line)
                remove_line = file.writelines()

            elif line[0]!='@' or line[0]!=('A' or 'G'or 'T' or 'U' or 'C') or line[0]!='+' and line[1]!='':
                sanger.append(line)

            else:
                print("Danger Will Robinson, Danger!")


        f.write("'%s'\n '%s'\n '%s'" %(identifier, sequence, sanger))
        f.close()

        return (sanger,sequence,identifier,plus)

Now for my question. I have run this and no error appears, but the target file is empty. I am wondering what I am doing wrong. Is it the way I handle the lists, or the lack of .join? I am sorry if this is a duplicate; I simply do not know what the mistake is. Also, an important note: this is not homework, I just need a masker for work. Any help is greatly appreciated, and all suggestions for improving the code are welcome. Thanks.

Note (fastq format):

@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT
+
hhhhhhhhhhghhghhhhhfhhhhhfffffe`ee[`X]b[d[ed`[Y[^Y

Edit: Still unable to get anything, but working at it.


Solution

Your problem is with your understanding of the return statement. return x means stop executing the current function and give x back to whoever called it. In your code you have:

return sanger
return sequence
return identifier
return plus

When the first one executes (return sanger) execution of the function stops and sanger is returned. The second through fourth return statements never get evaluated and neither does your I/O stuff at the end. If you're really interested in returning all of these values, move this after the file I/O and return the four of them packed up as a tuple.

f.write("'%s'\n '%s'\n '%s'" %(identifier, sequence, sanger))
f.close()
return (sanger,sequence,identifier,plus)

This should get you at least some output in the file. Whether or not that output is in the format you want, I can't really say.
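To make the point about return concrete, here is a minimal sketch. The function names and the sample values are made up for illustration and are not taken from the question's code.

```python
def first_return_wins():
    """Only the first return statement is ever reached."""
    return "sanger"
    return "sequence"  # unreachable: the function has already returned


def return_all():
    """Return several values at once by packing them into a tuple."""
    # Hypothetical sample data standing in for the four lists.
    sanger, sequence, identifier, plus = ["hh"], ["TTGC"], ["@read1"], ["+"]
    return sanger, sequence, identifier, plus


# The caller unpacks the tuple into four names.
sanger, sequence, identifier, plus = return_all()
```

Any file writing that has to happen must come before the single return, since nothing after it runs.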

Edit: Just noticed you were using /n and probably want \n so I made the change in my answer here.

Other tips

You have all sorts of errors beyond what @Brian addressed. I'm guessing that your if and else tests are trying to check the first character of line? You'd do that with

if line[0] == '@':
    etc.
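One pitfall in the question's tests is worth spelling out: an expression such as ('A' or 'G' or 'T' or 'U' or 'C') evaluates to 'A' alone, so the comparison never looks at the other letters. A small sketch of the alternatives, using a made-up sample line:

```python
line = "TTGCCTGC"  # hypothetical sample sequence line

# ('A' or 'G' or 'T' or 'U' or 'C') evaluates to 'A', the first truthy operand,
# so this test only ever compares the first character against 'A'.
only_checks_a = line[0] == ('A' or 'G' or 'T' or 'U' or 'C')

# A membership test checks the first character against every allowed base.
is_base = line[0] in "AGTUC"

# str.startswith is safe on empty strings, where line[0] would raise IndexError.
is_header = line.startswith("@")
is_plus = "".startswith("+")
```

Here only_checks_a is False even though the line starts with a valid base, while is_base is True.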

You'll probably need to write more scripts soon, so I suggest you work through the Python Tutorial so you can get on top of the basics. It'll be worth your while.
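For what it's worth, classifying FASTQ lines by their first character is fragile, because a quality string may itself start with @, + or a letter such as A. A rough sketch that goes by position instead is shown here. It assumes every record is exactly four lines (header, sequence, +, quality), that blank lines can be ignored, and that "line number" in the question means the running record number; the function signature and both path parameters are hypothetical.

```python
def trim_fastq(in_path, out_path):
    """Rewrite 4-line FASTQ records as 3 lines: '>N', sequence, quality.

    Assumption: every record is exactly four lines
    ('@' header, sequence, '+' separator, quality string).
    """
    records = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        # Skip blank lines and strip the trailing newline from the rest.
        lines = [ln.rstrip("\n") for ln in src if ln.strip()]
        if len(lines) % 4 != 0:
            raise ValueError("expected a multiple of four non-blank lines")
        for i in range(0, len(lines), 4):
            header, sequence, plus, quality = lines[i:i + 4]
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError("malformed record at non-blank line %d" % (i + 1))
            records += 1
            # Replace the header with the record number and drop the '+' line.
            dst.write(">%d\n%s\n%s\n" % (records, sequence, quality))
    return records
```

This reads the whole file into memory, which keeps the sketch short but may not suit very large files. For production work an established parser is probably a safer choice than a hand-rolled one.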

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow