Find multiple subpatterns and replace using backreferences on multiple lines with TextWrangler grep

StackOverflow https://stackoverflow.com/questions/21311635

01-10-2022

Question

I am currently using TextWrangler (Mac) with grep find/replace, but I would be just as happy to use any other editor or command-line tool.

I have a text file structured like this (yes, there is a space at the beginning of each line):

 Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!
 Reference 2 -  This questions? And this, this one responds. But this YELLS!

I need to keep the reference but put each sentence on its own line, like this:

 Reference 1 -  This is a sentence with a period.
 Reference 1 -  And this exclaims!
 Reference 1 -  So does this one!
 Reference 2 -  This questions?
 Reference 2 -  And this, this one responds.
 Reference 2 -  But this YELLS!

I can get it to keep the reference and split off the first sentence with this pattern (I copied the newline character into it, which is why there is a break at the end; otherwise it matched the rest of the document):

^([^-]+ -\s+)(?:([^.!?]+?[.!?]))(([^.!?]+?[.!?])+?)$    

The replacement is:

\1\2
\1\3

The result looks like this:

 Reference 1 -  This is a sentence with a period.
 Reference 1 -   And this exclaims! So does this one!

 Reference 2 -  This questions?
 Reference 2 -   And this, this one responds. But this YELLS!

Running this several times never splits the other two sentences onto their own lines. But if I add a third line to the replacement:

\1\4

Then I get this as a result:

 Reference 1 -  This is a sentence with a period.
 Reference 1 -   And this exclaims! So does this one!
 Reference 1 -   So does this one!

 Reference 2 -  This questions?
 Reference 2 -   And this, this one responds. But this YELLS!
 Reference 2 -   But this YELLS!
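Python's re module is not TextWrangler's grep engine, but for this pattern it is assumed to behave the same way, so it can show what each group captures. The sketch below applies the question's pattern to a single line (TextWrangler runs it over the whole document instead). When a capturing group is itself repeated, as group 4 is inside (...)+?, the engine keeps only its last repetition, which is why \4 holds just the final sentence and no replacement line can reach the middle one.

```python
import re

# The question's pattern (trailing whitespace removed), applied to a single line
PATTERN = re.compile(r"^([^-]+ -\s+)(?:([^.!?]+?[.!?]))(([^.!?]+?[.!?])+?)$")

line = " Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!"
m = PATTERN.match(line)

print(repr(m.group(1)))  # ' Reference 1 -  '   (the reference prefix)
print(repr(m.group(2)))  # 'This is a sentence with a period.'
print(repr(m.group(3)))  # ' And this exclaims! So does this one!'
print(repr(m.group(4)))  # ' So does this one!'   (only the LAST repetition survives)
```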

I hope this is simple and I am just missing a switch or modifier.

Even splitting off one sentence at a time would be fine; I don't mind doing extra cleanup runs.

Any ideas?


Solution

What about:

Search:
  ^( [^-]+-\s+)(.*[.!?]) *(.*[.!?])

Replace:
  \1\2
  \1\3
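A single pass of this search and replace can be sketched with Python's re module, assuming it matches TextWrangler's grep for this pattern and using \n in the replacement to stand for the line break typed into the replace field. The greedy (.*[.!?]) in group 2 runs as far as the second-to-last sentence terminator on the line, so each pass splits off only the last sentence.

```python
import re

# The answer's search pattern; re.MULTILINE makes ^ match at the start of every line
SEARCH = re.compile(r"^( [^-]+-\s+)(.*[.!?]) *(.*[.!?])", re.MULTILINE)
# The two-line replacement: \1\2, a line break, then \1\3
REPLACE = r"\1\2\n\1\3"

text = (
    " Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!\n"
    " Reference 2 -  This questions? And this, this one responds. But this YELLS!\n"
)

once = SEARCH.sub(REPLACE, text)
print(once, end="")
# Prints:
#  Reference 1 -  This is a sentence with a period. And this exclaims!
#  Reference 1 -  So does this one!
#  Reference 2 -  This questions? And this, this one responds.
#  Reference 2 -  But this YELLS!
```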

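I had to run it a few times, because each pass splits off only the last sentence of each line (the greedy .* in the second group reaches as far as the second-to-last sentence terminator), but the final result matched your target output.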
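Since the question also welcomes command-line tools, the repeated runs can be automated by applying the same substitution until the text stops changing. This is a minimal sketch in Python, assuming the whole file is read into a string; the function name split_sentences is hypothetical, and the pattern and replacement are the answer's, translated to Python's re syntax.

```python
import re

# The answer's search pattern and two-line replacement
SEARCH = re.compile(r"^( [^-]+-\s+)(.*[.!?]) *(.*[.!?])", re.MULTILINE)
REPLACE = r"\1\2\n\1\3"


def split_sentences(text: str) -> str:
    """Repeat the substitution until no line is split any further."""
    while True:
        new_text = SEARCH.sub(REPLACE, text)
        if new_text == text:  # no line has two sentences left: stop
            return new_text
        text = new_text


sample = (
    " Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!\n"
    " Reference 2 -  This questions? And this, this one responds. But this YELLS!\n"
)
print(split_sentences(sample), end="")
```

To run it on a real file, pass the file's contents in, for example split_sentences(open("input.txt").read()), where input.txt is a placeholder name. With three sentences per line, the loop makes two splitting passes plus one final pass that changes nothing.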

Licensed under: CC-BY-SA with attribution