Removing textual lines from top or that include specific char or string from multiple text files

https://stackoverflow.com/questions/8796568

15-04-2021

Question

I have lyrics text files in the same directory, from which I would like to remove several lines according to these rules:

A. Remove all lines that include square brackets [ ].

B. Remove the first three non-empty lines from the top.

C. Remove the last two non-empty lines from the bottom.

For example, could a command line such as this be fixed to do (A) for multiple files?

type *.txt | findstr /v LYRICS | findstr /v "[" | findstr /v "]" >*.txt

Gilbert

Solution

Major Edit - I tested and debugged my code. There were many bugs.

I think this should get you close. But I'm still not sure I understand your non-empty condition.

This code will modify all .txt files in the current directory. For each file, it should delete the first 3 lines, the last 2 lines, and any line that contains [ or ]. As a side effect, any line that starts with : in the first position will have its leading : stripped.

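@echo off
rem Process every .txt file in the current directory.
for %%F in (*.txt) do (
  rem Count the lines in the file so the last two line numbers are known.
  for /f %%N in ('find /c /v "" ^<"%%F"') do set bottom1=%%N
  set /a bottom2=bottom1-1
  (
    rem Number each line, filter out the unwanted ones, then strip the number prefix.
    for /f "tokens=1* delims=:" %%A in (
      'findstr /n "^" "%%F" ^| findstr /vr /c:"\[" /c:"\]" /c:"^1:" /c:"^2:" /c:"^3:" /c:"^%%bottom1%%:" /c:"^%%bottom2%%:"'
    ) do echo(%%B
  )>"%%~nF.mod"
  rem del "%%F"
  rem ren "%%~nF.mod" "%%~nxF"
)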

As written, the original file will be preserved and the modified version will have the same name but with a .mod extension.

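If you are comfortable with the results, you can remove the rem from the front of the del and ren lines (the last 2 lines inside the loop), and then your original files will be overwritten with the modified version. Make sure you only run the script once if you make this change, because a second run would strip another 3 lines from the top and 2 from the bottom.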

The first FIND gets the count of the number of lines, so we know which lines to remove from the bottom based on line number.
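To see the line count on its own, here is a minimal sketch that counts the lines of a single file. The file name sample.txt and the variable name count are assumptions made for illustration; they do not appear in the script above.

```batch
@echo off
rem Sketch only: sample.txt is a hypothetical file name.
rem FIND with /c /v and an empty search string counts every line.
rem Feeding the file through stdin keeps the file name out of the output.
for /f %%N in ('find /c /v "" ^<"sample.txt"') do set "count=%%N"
echo sample.txt has %count% lines
```

Redirecting the file into FIND matters here: when FIND is given the file name as an argument instead, it prints a header with the file name in front of the count, which the simple FOR /F above would not parse as a bare number.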

The interesting bit is the big long command in the 2nd FOR loop. The first FINDSTR prefixes each line with the line number. These results are then piped to a second FINDSTR that removes the appropriate lines based on my understanding of your requirements.

It uses regular expressions. For example, "^1:" is a regex that looks for "1:" at the beginning of a line.
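The two FINDSTR stages can be tried by hand on a small file. The following is a sketch under the assumption of a throwaway file named sample.txt (a hypothetical name) with three lines; it applies only a subset of the search strings used in the full script.

```batch
@echo off
rem Build a small hypothetical sample file to experiment with.
(
  echo Title line
  echo [Chorus]
  echo La la la
)>"sample.txt"

rem Stage 1: show each line prefixed with its line number and a colon.
findstr /n "^" "sample.txt"

rem Stage 2: drop line 1 and any line containing a square bracket.
findstr /n "^" "sample.txt" | findstr /vr /c:"^1:" /c:"\[" /c:"\]"
```

The first command should list the lines as 1:Title line, 2:[Chorus] and 3:La la la. The second should leave only 3:La la la, because line 1 is matched by "^1:" and line 2 by "\[".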

The double percents are used for bottom1 and bottom2 so that we can access the current values without using delayed expansion. Within the parent batch, %%bottom1%% becomes %bottom1%, which is passed on as literal text to the FINDSTR command line. The command in a FOR /F IN() clause is executed within its own CMD session, and that child session expands %bottom1% to the value the variable has at that moment.
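The difference between single and double percents inside a parenthesized block can be shown in isolation. This is a sketch with a hypothetical variable named value, not part of the script above.

```batch
@echo off
setlocal
set "value=before"
(
  set "value=after"
  rem Single percents were expanded when the whole block was parsed.
  echo parent sees: %value%
  rem Double percents reach the child CMD as single percents and expand there.
  for /f "delims=" %%A in ('echo %%value%%') do echo child sees: %%A
)
```

Run as a batch file, this should print "parent sees: before" followed by "child sees: after". The parent expanded %value% before the block started running, while the child CMD session inherits the environment as it is when the FOR /F command is launched.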

Since the /v option is used, FINDSTR preserves all lines that do NOT match any of the regex search strings. You were on the right track in this regard. It is just more efficient to do all the filtering with one FINDSTR.

The FOR /F options are set to break each line into two tokens, breaking at the 1st : in each line. Only the 2nd token is printed. That is how the line number prefix is stripped out of the output.
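The token splitting can be checked on literal strings. In this sketch the two sample strings are made up for illustration; a quoted string in the IN() clause is parsed as text rather than read as a file.

```batch
@echo off
rem Sketch: split hypothetical numbered lines at the first colon.
rem The asterisk puts the whole unparsed remainder into the second token.
for /f "tokens=1* delims=:" %%A in ("12:Time: 3:45") do (
  echo number: %%A
  echo text: %%B
)

rem Consecutive delimiters count as one, so a leading colon in the text is lost.
for /f "tokens=1* delims=:" %%A in ("7::note") do echo text: %%B
```

The first loop should print "number: 12" and "text: Time: 3:45", showing that later colons in the line survive. The second should print "text: note", which is the leading-colon side effect mentioned at the start of the answer.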

Licensed under: CC-BY-SA with attribution
