Question

I have written some code to enumerate the character "a" in a text file (a simple text document copied from a PDF):

input_f = open('/home/zebrafish/Desktop/stackq/doc.txt','r')

# text I used in the "doc.txt" file
#
#unctional similarities between the ATP binding pockets of
#kinases or between chemotypes of inhibitors that cannot
#be predicted from the sequence of the kinase or the
#chemical structure of the inhibitor.
#We first compared PI3-K family members according to

output_f = open('/home/zebrafish/Desktop/stackq/svm_in.txt','w')


for line in input_f :
    a = line
    print "\n",
    for y in enumerate([x[0] for x in enumerate(line) if x[1]=='a']): 
        a = ("%d:%d" % (y[0]+1,y[1]+1))
        #print a,
        output_f.write(a+" ")        

input_f.close()
output_f.close()

This is what the output looks like when I print it to the screen instead of writing the output file. For each line, the script finds the positions of "a" and numbers the occurrences: in the first line, "a" appears twice, first at the 8th position and then at the 16th, so it is written as "1:8 2:16", and so on for every line:

1:8 2:16 
1:4 2:47 3:51 
1:42 
1:7 
1:14 2:26 3:40 
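To make the numbering concrete: the first number in each pair is the occurrence count within the line and the second is the 1-based character position of that "a" (uppercase "A", as in "ATP", is not counted). The following is a minimal sketch, assuming Python 3; the helper name a_positions is hypothetical, not from the question. It reproduces the output above from the sample text:

```python
def a_positions(line):
    """Return 'occurrence:position' pairs (both 1-based) for each lowercase 'a'."""
    # enumerate(line, 1) numbers characters from 1
    positions = [pos for pos, char in enumerate(line, 1) if char == 'a']
    # the outer enumerate counts occurrences, also from 1
    return ' '.join('%d:%d' % (i, pos) for i, pos in enumerate(positions, 1))

# Sample text from doc.txt, one entry per line
sample = [
    "unctional similarities between the ATP binding pockets of",
    "kinases or between chemotypes of inhibitors that cannot",
    "be predicted from the sequence of the kinase or the",
    "chemical structure of the inhibitor.",
    "We first compared PI3-K family members according to",
]
for line in sample:
    print(a_positions(line))
```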

But when I write the output to the text file "svm_in.txt" with output_f.write(), the output looks very weird, something like this:

1:8 2:16 1:4 2:47 3:51 1:42 1:7 1:14 2:26 3:40 

How can I get the output file to contain one line per input line, with a "+" sign at the beginning of each line, like this:

+ 1:8 2:16 
+ 1:4 2:47 3:51 
+ 1:42 
+ 1:7 
+ 1:14 2:26 3:40 

Solution 2

I would do it like this:

for line in input_f:

    # find the positions of As in the line
    positions = [n for n, letter in enumerate(line, 1) if letter == 'a']

    # Create list of strings of the form "x:y"
    pairs = [("%d:%d" % (i, n)) for i, n in enumerate(positions, 1)]

    # Join all those strings into a single space-separated string
    all_pairs = ' '.join(pairs)

    # Write the string to the file, with a + sign at the beginning
    # and a newline at the end
    output_f.write("+ %s\n" % all_pairs)

You can modify the string in the last line to control how the line will be written in the output file.
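Here is a self-contained version of the same approach for trying it out. It assumes Python 3 and uses io.StringIO objects as stand-ins for doc.txt and svm_in.txt. For real files, you would pass in the objects returned by open(...) instead. The function name write_svm_lines is hypothetical.

```python
import io

def write_svm_lines(input_f, output_f):
    """Write one '+ occurrence:position ...' line per input line."""
    for line in input_f:
        # 1-based positions of every lowercase 'a' in the line
        positions = [n for n, letter in enumerate(line, 1) if letter == 'a']
        # "occurrence:position" strings, occurrences also counted from 1
        pairs = ["%d:%d" % (i, n) for i, n in enumerate(positions, 1)]
        # prefix with '+', end with a newline
        output_f.write("+ %s\n" % ' '.join(pairs))

# In-memory stand-ins for the input and output files
doc = io.StringIO(
    "unctional similarities between the ATP binding pockets of\n"
    "kinases or between chemotypes of inhibitors that cannot\n"
)
out = io.StringIO()
write_svm_lines(doc, out)
print(out.getvalue(), end="")
```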

OTHER TIPS

Don't print your newlines, write them to the file instead:

for line in input_f :
    output_f.write("\n+ ")
    for y in enumerate([x[0] for x in enumerate(line) if x[1]=='a']): 
        a = ("%d:%d" % (y[0]+1,y[1]+1))
        output_f.write(a + " ")        
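The original output ran together because file.write() writes exactly the string it is given. Unlike print, it appends no newline, and the print "\n", in the question only sent newlines to the screen. The following is a small Python 3 illustration, with io.StringIO standing in for the output file:

```python
import io

# Three write() calls and no explicit newline: everything lands on one line
one_line = io.StringIO()
for pair in ["1:8", "2:16", "1:4"]:
    one_line.write(pair + " ")
print(repr(one_line.getvalue()))

# Writing "\n+ " before each line's pairs, as in the loop above
per_line = io.StringIO()
for pairs in ["1:8 2:16 ", "1:4 2:47 3:51 "]:
    per_line.write("\n+ ")
    per_line.write(pairs)
print(repr(per_line.getvalue()))
```

Writing the newline first leaves an empty first line in the file and no newline at the very end. This is why it is more usual to write the newline after each line.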

You could use tuple unpacking to make it clearer what you are enumerating, and you can drop the [..] list comprehension in favour of a generator expression, which avoids building an intermediate list:

for i, pos in enumerate((pos for pos, char in enumerate(line, 1) if char == 'a'), 1):
    output_f.write('%d:%d ' % (i, pos))

I also gave the enumerate() function a second argument, the start value, so you don't have to add 1 to each number, and moved the space after each pair into the format string.
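The following quick Python 3 comparison, run on the second sample line, checks that the start argument only changes where counting begins. It compares the question's index-plus-one style with the enumerate(..., 1) generator version:

```python
line = "kinases or between chemotypes of inhibitors that cannot"

# Question's style: 0-based indices from a list comprehension, +1 added by hand
manual = [(y[0] + 1, y[1] + 1)
          for y in enumerate([x[0] for x in enumerate(line) if x[1] == 'a'])]

# enumerate(iterable, 1) counts from 1; the generator builds no intermediate list
started = list(enumerate((pos for pos, char in enumerate(line, 1) if char == 'a'), 1))

print(manual)
print(started)
```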

You would normally write the newline after writing a line; and if you want a counter per line, add another enumerate():

for count, line in enumerate(input_f, 1):
    output_f.write("%d+ " % count)
    for i, pos in enumerate((pos for pos, char in enumerate(line, 1) if char == 'a'), 1):
        output_f.write('%d:%d ' % (i, pos))
    output_f.write('\n')
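The following runs this loop against two of the sample lines. It assumes Python 3 and uses io.StringIO objects in place of the real files purely for illustration. The output shows the line counter before each "+" and the space left before each newline:

```python
import io

doc = io.StringIO("be predicted from the sequence of the kinase or the\n"
                  "chemical structure of the inhibitor.\n")
out = io.StringIO()

for count, line in enumerate(doc, 1):
    out.write("%d+ " % count)                   # line counter prefix
    for i, pos in enumerate((pos for pos, char in enumerate(line, 1) if char == 'a'), 1):
        out.write('%d:%d ' % (i, pos))          # each pair followed by a space
    out.write('\n')                             # newline after the line's pairs

print(repr(out.getvalue()))
```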

Or, by using str.join(), you can create a whole line in one go, with a single formatting operation adding both the prefix and the newline:

for count, line in enumerate(input_f, 1):
    positions = (pos for pos, char in enumerate(line, 1) if char == 'a')
    pairs = ' '.join(['%d:%d' % (i, pos) for i, pos in enumerate(positions, 1)])
    output_f.write("%d+ %s\n" % (count, pairs))

which neatly avoids a trailing space as well.
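The trailing-space difference comes from how str.join() works: it puts the separator only between items, never after the last one. The following is a minimal Python 3 comparison using the pairs from the last sample line:

```python
pairs = ['1:14', '2:26', '3:40']

# Appending a space to every pair leaves one dangling at the end
per_item = ''.join(p + ' ' for p in pairs)
# str.join inserts the separator only between items
joined = ' '.join(pairs)

print(repr(per_item))
print(repr(joined))
print(repr("%d+ %s\n" % (5, joined)))
```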

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow