Question

I'm new to Python.

I want to remove duplicate entries and certain characters from each line of a file.

For example I have the following file:

A   786 65534 65534 786 786 786 786 10026/AS4637 19151 19151 19151 19151 19151 19151 10796/AS13706
B   786 65534 65534 786 786 786 3257 3257 3257 1257 1257 1257 1257 1257 1257 1257 49272

The desired output I want is:

A   786 10026 4637 19151 10796 13706
B   786 3257 1257 49272

Two things are going on here. First, any number greater than 65000 needs to be removed, along with repeated values. Second, some entries are two values separated by a '/', and the second value carries an unwanted 'AS' prefix that I want stripped.

I have the following code:

import os

p = './testing/test.txt'
fin = open(p, 'r')
uniq = set()
for line in fin.readlines():
    word = line.rstrip().split(' ')[3:]
    if not word in uniq:
        uniq.add(word)
        print word
fin.close()

I'm getting a:

TypeError: unhashable type: 'list'

As you can see, I can't even check whether a word is greater than 65000, because I can't get the duplicates removed with set().

I could really use some help here.

Solution

This could help, as a start:

p = './testing/test.txt'
with open(p) as fin:
    for line in fin:
        words = line.split()    # split on any run of whitespace
        new_words = []
        unique_words = set()    # words already seen on this line
        for word in words:
            # keep each word once, and drop pure numbers above 65000
            if (word not in unique_words and
                    (not word.isdigit() or int(word) <= 65000)):
                new_words.append(word)
                unique_words.add(word)
        new_line = ' '.join(new_words)
        print(new_line)

Turns this:

A   786 65534 65534 786 786 786 786 10026/AS4637 19151 19151 19151 19151 19151 19151 10796/AS13706

Into this:

A 786 10026/AS4637 19151 10796/AS13706

Obviously, it's not quite what you want yet, but try to do the rest yourself. :) The str.replace() method might help you get rid of those /AS parts.
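If you get stuck on the remaining step, here is one possible sketch. It assumes that '/' can be treated as just another separator and that 'AS' only needs stripping when digits follow it. The function name clean_line and the limit parameter are made up for illustration and are not from the question. Note that it joins the result with single spaces, so the extra spaces after the first column are not preserved.

```python
import re

def clean_line(line, limit=65000):
    """Drop duplicates, numbers above `limit`, and 'AS' prefixes from one line."""
    seen = set()
    kept = []
    # Treat '/' as a separator so '10026/AS4637' yields '10026' and 'AS4637'.
    for token in line.replace('/', ' ').split():
        # Strip a leading 'AS' only when digits follow it.
        token = re.sub(r'^AS(?=\d)', '', token)
        if token in seen:
            continue
        if token.isdigit() and int(token) > limit:
            continue
        seen.add(token)
        kept.append(token)
    return ' '.join(kept)

sample = ("A   786 65534 65534 786 786 786 786 10026/AS4637 "
          "19151 19151 19151 19151 19151 19151 10796/AS13706")
print(clean_line(sample))
```

For the first sample line this should print A 786 10026 4637 19151 10796 13706. The first column goes through the same filter as the numbers, which is fine for labels like A and B but would need special handling if a label could itself start with AS followed by digits.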

OTHER TIPS

The problem is:

word = line.rstrip().split(' ')[3:]

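The split function returns a list of words, and slicing it with [3:] still gives you a list. Lists aren't hashable, so a list can't be a member of a set: both the "in uniq" test and uniq.add(word) fail with this error. You need to iterate through the strings in your split list and check each word one by one.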
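A small sketch of the difference, using a shortened made-up input line rather than the real file. The variable names fields and error are only for illustration.

```python
line = "A   786 65534 65534 786"
fields = line.rstrip().split(' ')[3:]   # a list: ['786', '65534', '65534', '786']

uniq = set()
error = None
try:
    uniq.add(fields)            # a list is mutable, so it cannot be hashed
except TypeError as exc:
    error = str(exc)            # the same TypeError as in the question

# Fix: test and add each string individually; strings are hashable.
for word in fields:
    if word not in uniq:
        uniq.add(word)

print(error)
print(sorted(uniq))
```

Splitting on a single space also produces empty strings wherever there are several spaces in a row, which is why the slice has to start at index 3 here. Calling split() with no argument avoids that.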

Licensed under: CC-BY-SA with attribution