Question

I'm migrating from MySQL to Postgres. As part of that, I'm processing the MySQL dump with sed before loading it into Postgres.

The dump contains some \0 sequences, which Postgres doesn't accept, so I'm replacing each of them with a space.

sed 's/\\0/ /g' "$dumpfile"

I noticed a problem when a line contains 320.48k\\02. Easy Listening:

$ echo '320.48k\\02. Easy Listening' | sed 's/\\0/ /g'
320.48k\ 2. Easy Listening

That's not quite what I wanted. A \\ followed by 0 is an escaped backslash followed by a zero, not a null character, and I want to keep it as it is.

Any sed experts around to help?


Solution 3

First, you can make the regex match \0 only when it follows something other than a backslash:

$ echo '320.48k\\02. Easy Listening' | sed 's/\([^\\]\)\\0/\1 /g'
320.48k\\02. Easy Listening

That fixes the problem, but it fails when \0 is at the start of the line, so make the preceding match optional:

$ echo '\0320.48k\\02. Easy\0Listening' | sed 's/\([^\\]\)\?\\0/\1 /g'
 320.48k\ 2. Easy Listening

This doesn't work though, because the \0 at the end of \\0 can still match with zero occurrences of the parenthesised sub-group.

Another alternative is to say that the \0 must either come at the start of the line, or be preceded by a character that is not a backslash:

$ echo '\0320.48k\\02. Easy\0Listening' | sed 's/\([^\\]\|^\)\\0/\1 /g'
 320.48k\\02. Easy Listening

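(As a comment points out, this still gives the wrong result for odd numbers of backslashes: in \\\0, the first two backslashes are an escaped backslash, so the \0 after them should be replaced, but it is not.)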
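
To make the odd-backslash case concrete, here is a small sketch that wraps the last command in a helper and feeds it two inputs. The name naive_fix is made up for this example, and the pattern relies on \| as in the command above, so it assumes GNU sed.

```shell
# naive_fix is a hypothetical wrapper around the substitution shown above
# (assumes GNU sed, since \| is an extension).
naive_fix() {
    sed 's/\([^\\]\|^\)\\0/\1 /g'
}

# Two backslashes + 0: an escaped backslash followed by a plain 0.
# Correctly left alone: prints a\\0b
printf '%s\n' 'a\\0b' | naive_fix

# Three backslashes + 0: an escaped backslash followed by \0.
# The \0 should become a space (a\\ b), but the pattern sees a backslash
# in front of it and skips it: prints a\\\0b unchanged.
printf '%s\n' 'a\\\0b' | naive_fix
```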

OTHER TIPS

If you want to replace null characters (\0), you can use:

sed 's/\x0/ /g'

or

tr '\0' ' '

I often use

tr '\0' '\n' < /proc/13217/environ

to display the environment of a process.
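
One thing worth checking first is which kind of \0 the dump contains. tr and \x0 act on real NUL bytes, while the sed command in the question matches the two-character text "backslash, zero", which suggests the dump holds the escaped form. A quick sketch of the difference (nul_to_space is a name invented for this example):

```shell
# nul_to_space is a hypothetical wrapper around the tr command above.
nul_to_space() {
    tr '\0' ' '
}

# A real NUL byte (printf turns \0 in its format string into byte 0):
# tr replaces it, so this prints "a b".
printf 'a\0b\n' | nul_to_space

# The two-character text \0 (a backslash, then the digit 0):
# there is no NUL byte here, so tr leaves it alone and this prints a\0b.
printf '%s\n' 'a\0b' | nul_to_space
```

If the second case is what the dump looks like, tr will not help and the backslash-aware substitutions are the ones to use.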

Keep in mind that \\\0 would have to be replaced by \\␣ and so on. So replace any sequence containing an odd number of backslashes followed by a 0 by those same backslashes except the last one followed by a space. The sequence needs to be preceded by a non-backslash character or the beginning of the line, otherwise \\0 will match starting at the second backslash. If there are multiple consecutive \0 sequences, they won't be caught because the first matched character is the character before the first backslash; you'll need to match them all and replace them by a single space.

sed -e 's/\(\([^\]\|^\)\(\\\\\)*\)\\0\(\(\\\\\)*\\0\)*/\1 /g'

If your sed doesn't have \|, use two separate substitution commands.

sed -e 's/^\(\(\\\\\)*\)\\0\(\(\\\\\)*\\0\)*/\1 /' -e 's/\([^\]\(\\\\\)*\)\\0\(\(\\\\\)*\\0\)*/\1 /g'
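
A possible variant, not the command from this answer: instead of matching a whole run of \0 sequences at once, loop with a label and the t command so that each \0 is replaced on its own pass. This keeps one space per \0, and it also keeps any escaped backslashes that sit between two \0 sequences, which the trailing group in the commands above appears to consume. The name replace_escaped_nul is made up for this sketch.

```shell
# replace_escaped_nul is a hypothetical helper. Each pass replaces at most one
# \0 at the start of the line and one elsewhere; "ta" jumps back to label "a"
# for as long as either substitution succeeded.
replace_escaped_nul() {
    sed -e ':a' \
        -e 's/^\(\(\\\\\)*\)\\0/\1 /' \
        -e 's/\([^\]\(\\\\\)*\)\\0/\1 /' \
        -e 'ta'
}

# Sample from the first answer: prints " 320.48k\\02. Easy Listening"
printf '%s\n' '\0320.48k\\02. Easy\0Listening' | replace_escaped_nul

# Odd number of backslashes: prints a\\ b
printf '%s\n' 'a\\\0b' | replace_escaped_nul

# Two consecutive \0 sequences: prints "a  b" (two spaces)
printf '%s\n' 'a\0\0b' | replace_escaped_nul
```

Whether consecutive \0 sequences should become one space or several depends on what the data needs; the loop only makes the per-sequence behaviour possible.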

Alternatively, use Perl. Its look-behind assertion comes in handy to say “this must not follow a backslash”.

perl -pe 's/(?<!\\)((?:\\\\)*)\\0/$1 /g'

In Perl, another approach is perhaps clearer: replace every backslash+character sequence, and compute the replacement text based on the following character.

perl -pe 's/\\(.)/$1 eq "0" ? " " : "\\$1"/eg'
Licensed under: CC-BY-SA with attribution