comparing strings: file and lists

https://stackoverflow.com/questions/23589355

19-07-2023

Question

I'm writing an application that lets the user enter a string, then generates all possible permutations of it and removes the duplicates.

Each permutation should be compared against the file line by line until a line equal to the permutation is found, and the process is then repeated with the remaining permutations.

The file contains this information: manila ana maria marta

or file: espanol.dic

Here is part of the code:

# coding=utf8
from __future__ import print_function
import os, re, itertools

new_dic_file = "espanol.dic"

def uniq(lst):
    # remove repeated
    key = dict.fromkeys(lst).keys()
    lst = list(key)
    return lst

def match(chars, num_chars):
    # Get the permutations of input string
    combs = itertools.permutations(chars, num_chars)
    result = []
    for combo in combs:
        result.append("".join(combo))

    # Iterate to Spanish dictionary and compare combinations of input string
    dic = open(new_dic_file)
    aux = dic.readlines()
    del dic
    aux = uniq(aux)

    for word in result:
        for word_dic in aux:
            print()
            print(word, word_dic, end="")
            print(type(word), type(word_dic), end="")
            if word == word_dic:
                print(word)
                print("########## Found! ##########")

I printed the types of "word" and "word_dic", and both are str, so the comparison should work, but it does not. I'm testing with this: match("aan", 3)

and the result is this:

<type 'str'> <type 'str'>
ana marta
<type 'str'> <type 'str'>
ana ana
<type 'str'> <type 'str'>
ana manila
<type 'str'> <type 'str'>
naa maria

when it should be:

ana

#### Found!!

If anything about what I'm doing is unclear, please ask.

This is the complete code: test.py

Thank you in advance.

Solution

The readlines method keeps the trailing newline (LF) character on each string, so every string read from the file has one extra character at the end. That's visible in the output: notice that the type lines fall below the strings, even though the print calls use end="". The string "ana" followed by a newline is never equal to "ana".

To fix it, replace the readlines() call with this:
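The effect can be reproduced without the dictionary file. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for espanol.dic, filled with the four words from the question; the names fake_file and lines are only for illustration. Printing with repr() makes the hidden newline visible.

```python
import io

# Stand-in for the dictionary file, holding the four words from the question.
fake_file = io.StringIO("manila\nana\nmaria\nmarta\n")
lines = fake_file.readlines()

# repr() shows the trailing newline that a plain print() hides.
print(repr(lines[1]))                   # 'ana\n'
print(lines[1] == "ana")                # False
print(lines[1].rstrip("\n") == "ana")   # True
```

When a string comparison fails unexpectedly, printing repr() of both sides is usually a quicker check than printing their types.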

aux = dic.read().splitlines()

For more on readlines, see the question "Best method for reading newline delimited files in Python and discarding the newlines?"

Or you could leave the readlines() there but add this:

aux = [s.rstrip() for s in aux]
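Putting the fix together, one possible shape for the whole lookup is sketched below. It is not the asker's original function: find_words is a hypothetical name, it assumes Python 3, and it swaps the nested loops for set intersection, which also removes duplicate permutations and duplicate dictionary lines. An io.StringIO object stands in for the open dictionary file.

```python
import io
import itertools


def find_words(chars, num_chars, lines):
    """Return the permutations of chars that appear in lines, sorted."""
    # Strip the line endings; building a set also drops repeated words.
    words = {line.rstrip("\n") for line in lines}
    # A set of permutations removes repeats such as the second "ana".
    candidates = {"".join(p) for p in itertools.permutations(chars, num_chars)}
    # Keep only the permutations that are real dictionary words.
    return sorted(candidates & words)


# Stand-in for open("espanol.dic"), using the words from the question.
sample = io.StringIO("manila\nana\nmaria\nmarta\n")
print(find_words("aan", 3, sample))  # ['ana']
```

With a real file, the same function could be called as find_words("aan", 3, f) inside a with open(...) as f: block, since iterating over a file object yields its lines with the newline still attached.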

Licensed under: CC-BY-SA with attribution
