Could sed or awk use NUL character as record separator?

https://stackoverflow.com/questions/9170372

26-04-2021

Question

I have NUL-delimited output coming from the following command:

some commands | grep -i -c -w -Z 'some regex'

The output consists of records in the following format:

[file name]\0[pattern count]\0

I want to use text-manipulation tools such as sed or awk to change the records to the following format:

[file name]:[pattern count]\0

However, sed and awk seem to handle only records delimited by the newline character. How could sed or awk be used to achieve this? Or, if they cannot handle such a case, what other Linux tool should I use?

Thanks for any suggestions.

Lawrence

Solution

By default, the record separator is the newline character, defining a record to be a single line of text. You can use a different character by changing the built-in variable RS. The value of RS is a string that says how to separate records; the default value is "\n", the string containing just a newline character. GNU awk (gawk) also accepts RS = "\0", which splits the input on NUL characters. The following example uses "/" as the separator instead:

 awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list

OTHER TIPS

Since version 4.2.2, GNU sed has the -z (--null-data) option to do exactly this. For example:

sed -z 's/old/new/' null_separated_infile
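A quick way to see what -z changes, assuming GNU sed 4.2.2 or later: each NUL-terminated record is treated as its own "line", so a substitution without the g flag is applied once per record. The records are made up, and `tr` is used only to display NULs as newlines.

```shell
# With -z, each NUL-terminated record is a separate "line" for sed.
printf '%s\0' old-a old-b old-c | sed -z 's/old/new/' | tr '\0' '\n'
# new-a
# new-b
# new-c

# Without -z, the whole input (no newlines) is one line, so only the
# first "old" is replaced.
printf '%s\0' old-a old-b old-c | sed 's/old/new/' | tr '\0' '\n'
# new-a
# old-b
# old-c
```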

Yes, gawk can do this: set the record separator to \0. For example, the command

gawk 'BEGIN { RS="\0"; FS="=" } $1=="LD_PRELOAD" { print $2 }' </proc/$(pidof mysqld)/environ

will print the value of the mysqld process's LD_PRELOAD variable, for example:

/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

The /proc/$PID/environ file is a NUL-separated list of environment variables. I'm using it as an example because it's easy to try on a Linux system.

The BEGIN block sets the record separator to \0 and the field separator to =, because I want to select a record by the part before = and print the part after it.

The pattern $1=="LD_PRELOAD" runs the action block only if the first field is the key I'm interested in.

The print $2 block prints the second field, i.e. the text after the first = (up to the next =, if the value contains one).

However, mawk cannot parse NUL-separated input. This is documented in man mawk:

BUGS
       mawk cannot handle ascii NUL \0 in the source or data files.

mawk will stop reading the input after the first \0 character.
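If only mawk (or another awk without NUL support) is available, one workaround is to translate the NULs to newlines with `tr` before awk and back afterwards. This sketch assumes no record contains a newline, which file names can in principle; `join_pairs_portable` is a hypothetical name and the input is made up.

```shell
# join_pairs_portable (hypothetical helper): name\0count\0 -> name:count\0,
# but awk itself never sees a NUL, so any POSIX awk (including mawk) works.
# Assumption: no record contains a newline character.
join_pairs_portable() {
  tr '\0' '\n' |
    awk 'NR % 2 == 1 { name = $0; next } { print name ":" $0 }' |
    tr '\n' '\0'
}

# Made-up input; the last tr only makes the NUL-terminated output readable.
printf '%s\0' a.txt 3 b.txt 1 | join_pairs_portable | tr '\0' '\n'
```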

You can also use xargs to handle NUL-separated input, somewhat unintuitively, like this:

xargs -0 -n1 </proc/$$/environ

xargs uses echo as its default command. -0 makes it treat the input as NUL-separated, and -n1 passes at most one argument to each echo invocation, so each entry is printed on its own line.
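A small demonstration with made-up input: a record containing a space stays a single argument, because under -0 only NUL separates arguments. Since `echo` may treat arguments such as `-n` or backslashes specially, passing an explicit `printf '%s\n'` as the command is a somewhat safer variant.

```shell
# Made-up NUL-separated records; the first one contains a space.
printf '%s\0' 'first record' second | xargs -0 -n1
# first record
# second

# With echo, a record like "-n" would be taken as an option; printf prints it as-is.
printf '%s\0' 'first record' -n | xargs -0 -n1 printf '%s\n'
# first record
# -n
```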

And as Graeme's answer shows, sed can do this too.

Using `sed` to replace the NUL characters with spaces:

sed 's/\x0/ /g' infile > outfile

Or substitute in place (this keeps a backup of the original file as infile.bak and overwrites the original file with the result):

sed -i.bak 's/\x0/ /g' infile
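A sketch of both forms on a throwaway file, assuming GNU sed (which understands the \x0 escape; BSD/macOS sed does not). The scratch directory and the data are made up for illustration.

```shell
# Work in a scratch directory so nothing real is modified.
tmp=$(mktemp -d)
printf 'a\0b\0c' > "$tmp/infile"     # made-up data with two embedded NULs

sed 's/\x0/ /g' "$tmp/infile" > "$tmp/outfile"   # write the result to a new file
sed -i.bak 's/\x0/ /g' "$tmp/infile"             # edit in place, keep infile.bak

cat "$tmp/outfile"; echo    # a b c
cat "$tmp/infile"; echo     # a b c
od -c "$tmp/infile.bak"     # the backup still contains the \0 bytes

rm -r "$tmp"
```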

Using `tr` to delete them:

tr -d "\000" < infile > outfile
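Unlike the sed command, `tr -d` deletes the NULs instead of turning them into spaces; `tr` can also translate them, which reproduces the sed result without relying on GNU-specific escapes. A small comparison on made-up data:

```shell
printf 'a\0b\0c' | tr -d '\000'; echo    # abc    (NULs deleted)
printf 'a\0b\0c' | tr '\000' ' '; echo   # a b c  (NULs replaced with spaces)
```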

Licensed under: CC-BY-SA with attribution

