Question

I created a report list with WinRAR.
Inside this report I have a text list like this:

<tag>Adventures of Shuggy</tag>
!Shuggy.png
!Sound Bank.txt
4.lwav
5.lwav
6.lwav
88.lwav
89.lwav
<tag>Adventures of Jack</tag>
90.lwav
91.lwav
92.lwav
93.lwav
!Sound Bank.xsb

I want to remove lines with duplicate extensions within every tag, so that the text looks like this:

<tag>Adventures of Shuggy</tag>
!Shuggy.png
!Sound Bank.txt
4.lwav
<tag>Adventures of Jack</tag>
90.lwav
!Sound Bank.xsb

or even better

<tag>Adventures of Shuggy</tag>
.png
.txt
.lwav
<tag>Adventures of Jack</tag>
.xsb

Is there a regular expression I can use in Notepad++ to remove lines with an identical extension (.txt, .lwav, and so on) inside every <tag> block?
Can I use Excel for this?

Solution

I put this together quickly; it should work in Notepad++. The pattern needs global matching (so every occurrence is handled, if your tool has such a flag) and multiline mode (so ^ and $ match at the start and end of each line).

/^.+(\.[^.]+)$(?=\s*(?:(?!<tag>)[^.])+\1)|^(?!<tag>)[^.]+/gm

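In Notepad++ you should (most likely) not type the / delimiters or the gm flags shown above. Enter only the pattern between the slashes, select the regular expression search mode, and replace every match with nothing.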

Explanation + demo : http://regex101.com/r/lC0lD1
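
If the regex proves awkward to adapt, the same result can be produced with a short script. The following is only a minimal sketch in Python, not part of the original answer: the function name unique_extensions and the SAMPLE text are hypothetical, and it assumes every group starts with a line beginning with <tag>.

```python
import os

# Hypothetical sample input, copied from the question.
SAMPLE = """<tag>Adventures of Shuggy</tag>
!Shuggy.png
!Sound Bank.txt
4.lwav
5.lwav
<tag>Adventures of Jack</tag>
90.lwav
91.lwav
!Sound Bank.xsb"""


def unique_extensions(lines):
    """Keep each <tag> line, followed by the distinct extensions listed under it."""
    out = []
    seen = set()  # extensions already emitted for the current tag
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("<tag>"):
            out.append(line)
            seen = set()  # a new tag starts a fresh group
            continue
        ext = os.path.splitext(line)[1]  # e.g. "4.lwav" -> ".lwav"
        if ext and ext not in seen:
            seen.add(ext)
            out.append(ext)
    return out


if __name__ == "__main__":
    print("\n".join(unique_extensions(SAMPLE.splitlines())))
```

Because duplicates are tracked per tag, .lwav is listed under both tags in this sketch. To keep the first full file name instead of the bare extension, append line rather than ext.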

Licensed under: CC-BY-SA with attribution