Question

I have a list of lines of text: textlines which is a list of strings (ending with '\n').

I would like to remove repeated occurrences of lines, except for lines that contain only spaces, line feeds, and tabs.

In other words, if the original list is:

textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = "   \n"
textlines[3] = "First line\n"
textlines[4] = "   \n"

The output list would be:

textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = "   \n"
textlines[3] = "   \n"

How can I do that?

Solution

seen = set()
res = []
for line in textlines:
    if line not in seen:
        res.append(line)
        if line.strip():  # remember only non-blank lines, so blank ones are never dropped
            seen.add(line)
textlines = res
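
A minimal sketch of this set-based approach as a reusable function, run on the example from the question. The function name `dedupe_nonblank` is an assumption for illustration and does not come from the original answer.

```python
def dedupe_nonblank(lines):
    """Drop repeated non-blank lines, keeping first occurrences and original order.

    Lines made up only of whitespace are always kept.
    """
    seen = set()
    result = []
    for line in lines:
        if line.strip():            # non-blank line: subject to de-duplication
            if line in seen:
                continue            # already emitted once, skip it
            seen.add(line)
        result.append(line)         # blank lines fall through and are always kept
    return result


textlines = ["First line\n", "Second line \n", "   \n", "First line\n", "   \n"]
deduped = dedupe_nonblank(textlines)
print(deduped)
# ['First line\n', 'Second line \n', '   \n', '   \n']
```

Membership tests on a set take constant time on average, so this stays roughly linear in the number of lines.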

OTHER TIPS

Because I can't resist a good round of code golf:

seen = set()

[x for x in textlines if (x not in seen or not x.strip()) and not seen.add(x)]
Out[29]: ['First line\n', 'Second line \n', '   \n', '   \n']

This is equivalent to @hughbothwell's answer, which you should use if you ever intend to have human beings read your code :-)
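
For anyone puzzled by the one-liner, here is a short annotated sketch of why it works. It relies on `set.add` returning `None`, so `not seen.add(x)` is always true and serves only to record `x`. The variable names `added` and `result` are mine, not part of the answer above.

```python
textlines = ["First line\n", "Second line \n", "   \n", "First line\n", "   \n"]

seen = set()
added = seen.add("probe")   # set.add mutates the set in place and returns None
assert added is None
seen.clear()                # start again from an empty set

result = [
    x for x in textlines
    if (x not in seen or not x.strip())  # keep first occurrences and every blank line
    and not seen.add(x)                  # always True; evaluated only to remember x
]
print(result)
# ['First line\n', 'Second line \n', '   \n', '   \n']
```

Because `and` short-circuits, `seen.add(x)` runs only for lines that are kept, which is enough here: a skipped line is already in `seen`.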

new = []
for line in textlines:
    if line in new and line.strip():
        continue
    new.append(line)
textlines = new
Licensed under: CC-BY-SA with attribution