Regex to find double whitespace not at the start of a line
25-06-2021
Question
I'm struggling with old "formatted" code, where a lot of whitespace is added to line up ='s and so on:
if (!This.RevData.Last().Size() eq 0) then
!DocRev = '?'
!Status = '?'
!RevCode = '?'
else
!Pad = !This.RevData.Last()[1]
!DocRev = !Pad[2]
!Status = !This.GenUtil.If((!Pad[3] eq 'UA'), '' , !Pad[3])
!RevCode = !This.GenUtil.If((!Pad[6] eq '' ), '?', !Pad[6])
endif
In this example it actually makes some sense, but most often the code has been modified to a state where the whitespace is much more confusing than helpful.
What I'd like to do is replace every run of two or more whitespace characters with a single space, while of course keeping the indentation. Hence I'm looking for a regex that identifies double (or more) spaces that are not at the start of a line. I've tried a negative lookbehind, but can't seem to make it work:
(?<![\n])[\s]{2,}
Any hints, anyone? Oh, and I'm using UltraEdit (with the Perl regex engine), so the regex should be "UE-compatible".
EDIT: UE doesn't evaluate regexes line by line; a newline is just another character in the one long string that is the document, which complicates the problem a bit.
Solution
Replace "([^ \n])  +" with "$1 ". No funky lookbehinds required.
(For emphasis, the markup doesn't show it clearly, but there are two spaces before the plus sign, to avoid needlessly replacing single spaces with single spaces.)
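To try the replacement outside the editor, here is a minimal sketch in Python. It assumes Python's re module behaves like UltraEdit's Perl engine for this simple pattern; the function name collapse_spaces and the sample text are made up for illustration, and Python writes the backreference as \1 rather than $1.

```python
import re

# A character that is neither a space nor a newline, followed by
# two or more spaces (note the two spaces before the plus sign).
PATTERN = re.compile(r"([^ \n])  +")

def collapse_spaces(text: str) -> str:
    # Keep the captured character and put back exactly one space.
    return PATTERN.sub(r"\1 ", text)

# Hypothetical sample, loosely modelled on the code in the question.
sample = (
    "if (x) then\n"
    "    !DocRev   = '?'\n"
    "    !Status   = '?'\n"
    "endif\n"
)
print(collapse_spaces(sample))
```

Leading spaces survive because the only character that can precede them is a newline or another space, and the character class excludes both. Runs of spaces inside string literals are collapsed as well, and tabs are not handled.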
OTHER TIPS
It's not PCRE but this will do what you need if you have access to a Linux shell:
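sed -E 's/([^ ])  +/\1 /g' source.code > reformatted.code
It collapses any run of two or more spaces that follows a non-space character, while preserving that character. The -E switch enables extended regular expressions, which the unescaped parentheses and plus sign need. Since sed works line by line, leading indentation is never preceded by a non-space character and is left alone. It should be easy enough to perlify if you're used to Perl regexes.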
Try replacing...
(?<=[^\r\n\t ])([\t ])[\t ]+
with...
$1
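A lookbehind pattern of this shape can be checked outside the editor with a short Python sketch. It assumes the lookbehind also excludes spaces and tabs, so that a match can only start directly after real text and indentation is never touched, and that at least two whitespace characters are required. The name collapse_whitespace is hypothetical, and Python's re module is only a stand-in for UltraEdit's Perl engine.

```python
import re

# Assumption: the character before the run must not be a newline,
# carriage return, tab, or space. The first whitespace character is
# captured; one or more further tabs/spaces must follow it.
WS_RUN = re.compile(r"(?<=[^\r\n\t ])([\t ])[\t ]+")

def collapse_whitespace(text: str) -> str:
    # Replace the whole run with its first whitespace character.
    return WS_RUN.sub(r"\1", text)

# Hypothetical sample with tab indentation, CRLF, and space indentation.
sample = "\t\tfoo \t bar\r\n    baz  =  1"
print(repr(collapse_whitespace(sample)))
```

If spaces and tabs are left out of the lookbehind's exclusion list, the second character of an indentation run also satisfies the lookbehind, and the indentation itself gets shortened.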