seen = set()
res = []
for line in textlines:
    if line not in seen:
        res.append(line)
        # Only remember non-blank lines, so whitespace-only lines are never dropped.
        if line.strip():
            seen.add(line)
textlines = res
Remove multiple occurrences except for particular values?
30-08-2022
Question
I have a list of lines of text, textlines, which is a list of strings (each ending with '\n').
I would like to remove repeated occurrences of lines, except for lines that contain only spaces, line feeds and tabs.
In other words, if the original list is:
textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = " \n"
textlines[3] = "First line\n"
textlines[4] = " \n"
The output list would be:
textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = " \n"
textlines[3] = " \n"
How can I do that?
Solution
OTHER TIPS
Because I can't resist a bit of code golfing:
seen = set()
[x for x in textlines if (x not in seen or not x.strip()) and not seen.add(x)]
Out[29]: ['First line\n', 'Second line \n', ' \n', ' \n']
This is equivalent to @hughbothwell's answer, which you should use if you ever intend to have human beings read your code :-)
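For readers unfamiliar with the trick in the one-liner: set.add() always returns None, so `not seen.add(x)` is always true and is there only for its side effect of recording x. Because `and` short-circuits, it runs only for lines that are kept. A minimal unrolled sketch of the same logic, using the sample data from the question (the names result, one_liner and one_liner_seen are only for illustration):

```python
textlines = ["First line\n", "Second line \n", " \n", "First line\n", " \n"]

seen = set()
result = []
for x in textlines:
    # Keep a line if it has not been seen yet, or if it is whitespace-only.
    if x not in seen or not x.strip():
        # In the one-liner this call hides inside `not seen.add(x)`;
        # set.add() returns None, so that expression is always True.
        seen.add(x)
        result.append(x)

# The one-liner again, with its own set, for comparison.
one_liner_seen = set()
one_liner = [x for x in textlines
             if (x not in one_liner_seen or not x.strip())
             and not one_liner_seen.add(x)]
```

Both result and one_liner should hold the same four lines as the output shown above.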
new = []
for line in textlines:
    if line in new and line.strip():
        continue
    new.append(line)
textlines = new
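Note that `line in new` searches a list, so this version does a linear scan for every line. If the input is large, or is being read lazily (for example from a file), a set-based generator may be preferable. The following is only a sketch under that assumption; the function name unique_lines is hypothetical and does not come from any of the answers:

```python
def unique_lines(lines):
    """Yield lines in order, skipping repeats of non-blank lines.

    Whitespace-only lines are always yielded, however often they occur.
    """
    seen = set()
    for line in lines:
        if not line.strip():
            # Blank or whitespace-only: never deduplicated.
            yield line
        elif line not in seen:
            seen.add(line)
            yield line


textlines = ["First line\n", "Second line \n", " \n", "First line\n", " \n"]
textlines = list(unique_lines(textlines))
```

Since it accepts any iterable of strings, the same function could also be passed an open file object instead of a list.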
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow