Question

I have a word list containing numbers, English words, and Bengali words in one column, with their frequencies in a second column. The columns have no headers. I need the words with frequencies between 5 and 300. This is the code I am using, and it is not working.

wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")

for word in wordlist:
    if word[1] >= 3
        print(word[0])
    elif word[1] <= 300
        print(word[0])

This is giving me a syntax error.

File "<stdin>", line 2
    if word[1] >= 3
              ^
SyntaxError: invalid syntax

Can anyone please help?


Solution 2

There are a few problems with your code. I will add a full explanation in an hour or so. In the meantime, see how it should look and consult the docs:

First, it is safer to open files with a with open() block, which closes the file for you even if an error occurs (see https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)

filepath = 'C:/Python27/bengali_wordlist_full.txt'

with open(filepath) as f:
    content = f.read().decode('string-escape').decode("utf-8")
    # do you really need all of this decoding?

Now content holds the text from the file: one long string, with '\n' characters marking the line endings. We can split it into a list of lines:

lines = content.splitlines()

and parse one line at a time:

for line in lines:
    try:
        # split line into items, assign first to 'word', second to 'freq'
        word, freq = line.split('\t') # assuming you have tab as separator
        freq = float(freq) # we need to convert second item to numeric value from string
        if 5 <= freq <= 300: # you can 'chain' comparisons like this
            print word
    except ValueError:
        # this happens if split() does not give exactly two items or float() fails
        print "Could not parse this line:", line
        continue
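The steps above can be collected into one function. What follows is only a minimal sketch, not the answerer's code: the names words_in_range and words_in_range_from_file and the sample data are made up for illustration, and a tab separator is still assumed. It uses io.open with encoding='utf-8', which decodes while reading on both Python 2 and Python 3, so the manual .decode() calls may not be needed if the file is plain UTF-8.

```python
# -*- coding: utf-8 -*-
import io


def words_in_range(lines, low=5, high=300, sep=u'\t'):
    """Return (matches, skipped) for an iterable of 'word<sep>freq' lines.

    Hypothetical helper: 'matches' holds words whose frequency lies in
    [low, high]; 'skipped' holds lines that could not be parsed.
    """
    matches, skipped = [], []
    for line in lines:
        line = line.rstrip(u'\r\n')
        if not line.strip():
            continue  # ignore blank lines
        try:
            # raises ValueError unless the line has exactly two fields
            word, freq = line.split(sep)
            freq = float(freq)  # raises ValueError for non-numeric text
        except ValueError:
            skipped.append(line)
            continue
        if low <= freq <= high:  # chained comparison, inclusive bounds
            matches.append(word)
    return matches, skipped


def words_in_range_from_file(path, low=5, high=300, sep=u'\t'):
    # io.open decodes UTF-8 while reading; iterating the file yields lines
    with io.open(path, encoding='utf-8') as f:
        return words_in_range(f, low, high, sep)


# Made-up sample data, standing in for lines of the real word list.
sample = [u'বাংলা\t12', u'the\t4500', u'2014\t5', u'broken line', u'rare\t2']
matches, skipped = words_in_range(sample)
```

Returning the unparsed lines instead of printing them lets the caller decide what to do with them, for example checking whether the file uses a separator other than a tab.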

OTHER TIPS

You should add a colon (:) after your if and elif statements to fix this SyntaxError:

wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")

for word in wordlist:
    if word[1] >= 3:
        print word[0]
    elif word[1] <= 300:
        print word[0]

Read this: https://docs.python.org/2/tutorial/controlflow.html

Also, here is one useful tip: when Python gives you a SyntaxError for some line, always look at the previous line as well, then at the following one.
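Note that adding the colons only removes the SyntaxError. Two further problems remain, and the small sketch below illustrates them on a made-up sample string (the data and the helper names loose and strict are assumptions for illustration). First, read() returns one string, and iterating over a string yields single characters, so word[1] raises an IndexError. Second, an if/elif pair that tests >= 3 and then <= 300 accepts every number, while a chained comparison accepts only the intended range.

```python
content = u"apple\t12\nmango\t400\n"

# Iterating over a string gives characters, not (word, frequency) pairs.
first = [ch for ch in content][:5]

# A one-character string has no index 1, so word[1] fails.
try:
    first[0][1]
    index_error = False
except IndexError:
    index_error = True


def loose(freq):
    # mirrors the if/elif pair: passes if either test is true
    return freq >= 3 or freq <= 300


def strict(freq):
    # both bounds must hold at once
    return 5 <= freq <= 300


freqs = [1, 5, 300, 400]
loose_hits = [f for f in freqs if loose(f)]    # every value passes
strict_hits = [f for f in freqs if strict(f)]  # only 5 and 300 pass
```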

Licensed under: CC-BY-SA with attribution