Question
I want to sanitise some input and replace several characters with acceptable input, e.g. a Danish 'å' with 'aa'.
This is easily done using several statements, e.g. /æ/ae/, /å/aa/, /ø/oe/, but due to tool limitations, I want to be able to do this in a single regular expression.
I can catch all of the relevant cases (/[(æ)(ø)(å)(Æ)(Ø)(Å)]/) but the replacement does not work as I want it to (though it probably works exactly as intended):
$ temp="RødgrØd med flæsk"
$ echo $temp
RødgrØd med flæsk
$ echo $temp | sed 's/[(æ)(ø)(å)(Æ)(Ø)(Å)]/(ae)(oe)(aa)(Ae)(Oe)(Aa)/g'
R(ae)(oe)(aa)(Ae)(Oe)(Aa)dgr(ae)(oe)(aa)(Ae)(Oe)(Aa)d med fl(ae)(oe)(aa)(Ae)(Oe)(Aa)sk
(The first echo is only there to show that it isn't an encoding issue.)
Just as an aside, the tool issue is that I should like to also use the same regex in a Sublime Text 2 snippet.
Anyone able to discern what is wrong with my regex statement?
Thanks in advance.
Solution
Split it up into several sed statements, separated by ;:
sed 's/æ/ae/g;s/ø/oe/g;s/å/aa/g;s/Æ/Ae/g;s/Ø/Oe/g;s/Å/Aa/g'
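As for why the attempt in the question fails: a bracket expression matches exactly one character out of the set, the parentheses inside it are just literal members of that set, and the replacement side of a sed s command is inserted as-is, so every match gets the whole string (ae)(oe)...(Aa). A minimal sketch of the chained version applied to the sample input from the question follows; the function name translit is only an illustrative assumption.

```shell
#!/bin/sh
# translit is a hypothetical helper name, not something from the original answer.
# Each s command handles one letter; the g flag replaces every occurrence on the line.
translit() {
  sed 's/æ/ae/g;s/ø/oe/g;s/å/aa/g;s/Æ/Ae/g;s/Ø/Oe/g;s/Å/Aa/g'
}

temp="RødgrØd med flæsk"
printf '%s\n' "$temp" | translit
# prints: RoedgrOed med flaesk
```

The whole mapping still lives in one sed invocation, so it can be dropped into a pipeline wherever a single command is expected.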
OTHER TIPS
Chain the substitutions in a single sed call, following this pattern:
sed -e 's/Find/Replace/g;s/Find/Replace/g;[....];s/Find/Replace/g'
Translated into what you need:
sed -e 's/æ/ae/g;s/ø/oe/g;s/å/aa/g;s/Æ/Ae/g;s/Ø/Oe/g;s/Å/Aa/g'
This might work for you (GNU sed):
sed -r 's/$/\næaeøoeåaaÆAeØOeÅAa/;:a;s/([æøåÆØÅ])(.*\n.*\1(..))/\3\2/;ta;s/\n.*//' file
It works by appending a lookup table to the end of the line, looping until every key has been replaced, and then removing the lookup table.
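The lookup-table loop is easier to follow when spread over several lines. The sketch below assumes GNU sed, as the one-liner does, and makes one deliberate change: the bracket expression is swapped for an alternation, (æ|ø|å|Æ|Ø|Å), so that the match does not depend on the locale treating each letter as a single character. The function name translit_lookup is hypothetical.

```shell
#!/bin/sh
# translit_lookup is a hypothetical helper name; GNU sed is assumed
# (\n in the replacement text is a GNU extension).
translit_lookup() {
  sed -E '
    # Append the lookup table after a newline: each key is followed by its two-letter value.
    s/$/\næaeøoeåaaÆAeØOeÅAa/
    :a
    # Find a key in the text, locate the same key in the table (\1),
    # capture the two characters after it (\3) and put them where the key was.
    s/(æ|ø|å|Æ|Ø|Å)(.*\n.*\1(..))/\3\2/
    # Jump back to label a for as long as a substitution succeeded.
    ta
    # No keys left in the text: drop the newline and the table.
    s/\n.*//
  '
}

printf '%s\n' "RødgrØd med flæsk" | translit_lookup
# prints: RoedgrOed med flaesk
```

Each pass replaces one key, so a line with n special characters takes n passes; the chained s commands are simpler when the mapping is short and fixed, while the table form keeps keys and values together in one place.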