Question

I have a GTF file with RNA sequencing data, which looks like this when opened in the shell:

1   Cufflinks   exon    12320750    12320851    .   +   .   gene_id "XLOC_000102"; transcript_id "TCONS_00014924"; exon_number "5"; gene_name "VPS13D"; oId "CUFF.308.3"; nearest_ref "ENST00000358136"; class_code "j"; tss_id "TSS819"; type "pc";
1   Cufflinks   exon    12321005    12321206    .   +   .   gene_id "XLOC_000102"; transcript_id "TCONS_00014924"; exon_number "6"; gene_name "VPS13D"; oId "CUFF.308.3"; nearest_ref "ENST00000358136"; class_code "j"; tss_id "TSS819"; type "pc";
1   Cufflinks   exon    12321958    12322137    .   +   .   gene_id "XLOC_000102"; transcript_id "TCONS_00014924"; exon_number "7"; gene_name "VPS13D"; oId "CUFF.308.3"; nearest_ref "ENST00000358136"; class_code "j"; tss_id "TSS819"; type "pc";

I need to write code that, when the user enters a gene_id, returns all the lines containing that gene_id.

I wrote this code:

def transcript_search(search_parameter):
    for line in file:
        if search_parameter in line:
            return line
        else:
            print('Invalid entry')


f = open('/Users/labadmin/Desktop/example.gtf', 'r')
file = f.read()
gene_id = input("Enter the gene_id:")
transcript_search(gene_id)

When I run this code, it doesn't find the ID even if I enter one that is present in the file.

I've also tried to split this file into a list using f.split(), but it gave me an error:

Traceback (most recent call last):
File "<pyshell#25>", line 1, in <module>
f.split()

I'm new to Python and would appreciate your help.


Solution

def transcript_search(search_parameter, files):
    # Iterating over the open file object yields one line at a time.
    for line in files:
        if search_parameter in line:
            return line
        else:
            print('Invalid entry')


files = open('/Users/labadmin/Desktop/example.gtf', 'r')

gene_id = input("Enter the gene_id:")
transcript_search(gene_id,files)

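Don't name the variable file. It is not a Python keyword, but in Python 2 it shadows the built-in file type. The actual cause of the failure is f.read(): it returns the whole file as a single string, so the loop walks over individual characters and the gene_id is never found in any of them. Iterate over the open file object instead, which yields one line at a time. You also need to pass the file into the function.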

Also, are you sure that for an invalid entry you want to print a message but not return anything? The function would then return None. That may be what you want, so I didn't change it.
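
The difference between looping over the result of read() and looping over the file object can be shown with a small sketch. It uses io.StringIO as a stand-in for the opened GTF file (an assumption, so the example runs without a file on disk), and the two sample lines are shortened, illustrative versions of the ones in the question.

```python
import io

# Two shortened GTF lines standing in for example.gtf (illustrative data only).
sample = (
    '1\tCufflinks\texon\t12320750\t12320851\t.\t+\t.\tgene_id "XLOC_000102";\n'
    '1\tCufflinks\texon\t12321005\t12321206\t.\t+\t.\tgene_id "XLOC_000103";\n'
)

# read() returns one big string; looping over a string yields single characters.
text = io.StringIO(sample).read()
chars = list(text)
char_hits = [piece for piece in chars if "XLOC_000102" in piece]

# Looping over the file-like object itself yields whole lines.
lines = list(io.StringIO(sample))
line_hits = [line for line in lines if "XLOC_000102" in line]

print("pieces from read():", len(chars), "matches:", len(char_hits))
print("lines from the file object:", len(lines), "matches:", len(line_hits))
```

An 11-character ID can never be contained in a 1-character string, which is why the character-by-character loop reports no matches.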

To print 'Invalid entry' only once:

def transcript_search(search_parameter, files):
    for line in files:
        if search_parameter in line:
            return line
    # 'Invalid entry' is printed only after every line has been checked without a match.
    print('Invalid entry')

As for saving:

saved_lines = []

files = open('/Users/labadmin/Desktop/example.gtf', 'r')
gene_id = input("Enter the gene_id:")
# Append the matching line (or None if nothing matched) to the list.
saved_lines.append(transcript_search(gene_id,files))

After that, you can write the whole list to a file with writelines(saved_lines), using a separate file opened for writing (the handle above is read-only), or print the lines to the screen, or whatever you want.
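
A minimal sketch of the saving step, assuming the matches have already been collected in a list of lines. The output file name matches.gtf and the temporary directory are hypothetical, chosen only so the example is self-contained; the sample lines are shortened, illustrative data.

```python
import os
import tempfile

# Lines as they might come back from the search (illustrative data only).
saved_lines = [
    '1\tCufflinks\texon\t12320750\t12320851\t.\t+\t.\tgene_id "XLOC_000102";\n',
    '1\tCufflinks\texon\t12321005\t12321206\t.\t+\t.\tgene_id "XLOC_000102";\n',
]

# Hypothetical output location; replace with a real path as needed.
out_path = os.path.join(tempfile.mkdtemp(), "matches.gtf")

# Open a separate file in write mode; 'with' closes it automatically.
with open(out_path, "w") as out_file:
    # writelines() does not add newlines, so each line must already end with one.
    out_file.writelines(saved_lines)

# Read the file back to check what was written.
with open(out_path) as check:
    written = check.readlines()

print(len(written), "lines written to", out_path)
```

Lines read from a file keep their trailing newline, so they can be passed to writelines() unchanged.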

This version adds all the lines containing your search_parameter to a list and returns the list:

def transcript_search(search_parameter, files):
    toreturn = []
    for line in files:
        if search_parameter in line:
            toreturn.append(line)
    if len(toreturn) > 0:
        # Note that this returns a list of strings instead of a single string.
        return toreturn
    print('Invalid entry')
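
A usage sketch for a list-returning search, under a few assumptions that go beyond the answer above: the file is opened in a with block so it gets closed, the function returns an empty list instead of printing when nothing matches, and the search looks for the exact attribute text gene_id "..."; rather than any substring, so a shorter ID cannot match a longer one. The name find_gene_lines and the temporary file are hypothetical.

```python
import os
import tempfile


def find_gene_lines(gene_id, lines):
    """Return every line whose gene_id attribute equals gene_id (empty list if none)."""
    # GTF attributes look like: gene_id "XLOC_000102"; so match the quoted form.
    needle = 'gene_id "{}";'.format(gene_id)
    return [line for line in lines if needle in line]


# Write a tiny stand-in GTF file (illustrative data only).
path = os.path.join(tempfile.mkdtemp(), "example.gtf")
with open(path, "w") as handle:
    handle.write('1\tCufflinks\texon\t12320750\t12320851\t.\t+\t.\tgene_id "XLOC_000102"; exon_number "5";\n')
    handle.write('1\tCufflinks\texon\t12321005\t12321206\t.\t+\t.\tgene_id "XLOC_000102"; exon_number "6";\n')
    handle.write('1\tCufflinks\texon\t500\t600\t.\t+\t.\tgene_id "XLOC_0001021"; exon_number "1";\n')

# Search for an ID that is present.
with open(path) as gtf:
    matches = find_gene_lines("XLOC_000102", gtf)

# Search for an ID that is absent.
with open(path) as gtf:
    missing = find_gene_lines("XLOC_999999", gtf)

for line in matches:
    print(line, end="")
if not missing:
    print("Invalid entry")
```

Returning an empty list keeps the return type the same in both cases, so the caller decides whether to print a message.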
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow