Question

Sorry for the simple question, but I have gone blind for four days studying and trying, and can't seem to strike the right syntax.

Using sed on cygwin, I am trying to replace one single unprintable ASCII character with another single unprintable character.

Here is my source file, using UPPERCASE text [within square brackets] to denote the unprintable ascii character:

myfile.txt:

line one[LF]
line two[LF]
line three[LF]
[SUBSTITUTE][LF]
line four[LF]
line five[LF]
line six[LF]
.
.
.

I would like to replace the LFs with TABs.

Since LFs are hex 0A and tabs are hex 09 I have tried, basically, this:

sed -i 's/\x0A/\x09/g' myfile.txt

which changes nothing in the file.

Of course, I have tried different switches like -b, -e and -r, with brackets and without, with and without the /g, extra backslashes and no backslashes, octal and decimal notation, all the way to Elven runes, with absolutely no success.

I read some answers that used 'echo' instead of a file as the source, but they just confused me and didn't work.

Other examples used 'cheats' like the actual word TAB, but they prevented me from learning the syntax using numerics, so I can apply it to other unprintable chars, not just TABs.

When I try the 'file' command, I get:

file myfile.txt
myfile.txt: data

So, of course I tried:

sed -i -t UTF-8 's/\x0A/\x09/g' myfile.txt

but my sed didn't support that -t option.

When I try this:

od -c myfile.txt

the [LF] character I'm searching for shows up as:

\n

I have also tried \0D as my search term, no luck either.

If anyone wants to lend me a clue by showing the correct syntax I would be very grateful.

Thanks.


Solution

Thanks everyone, I'm grateful for people trying to help. If StackOverflow lets me, I will upvote each attempt to help.

I'm answering my own question in hopes it helps someone else.

I learned it's not quite true that sed cannot handle LFs. It can handle them, but only when it's writing them. Not when reading them.
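To see the difference between reading and writing, here is a minimal sketch. It assumes GNU sed (the \xHH escapes are a GNU extension), and demo.txt is a made-up sample file, not the real data.

```shell
# Hypothetical two-line sample in a scratch directory.
workdir=$(mktemp -d)
printf 'one two\nthree four\n' > "$workdir/demo.txt"

# Reading: sed strips the trailing LF from each line before matching,
# so a pattern of \x0A never matches and the text passes through unchanged.
sed 's/\x0A/\x09/g' "$workdir/demo.txt" > "$workdir/read.out"

# Writing: an LF in the replacement is accepted (GNU sed assumed),
# so every space becomes a line break.
sed 's/ /\x0A/g' "$workdir/demo.txt" > "$workdir/write.out"

# od -c makes the unprintable bytes visible.
od -c "$workdir/read.out"
od -c "$workdir/write.out"
```

The first od listing should be identical to the input, while the second should show an LF wherever there was a space.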

So, I couldn't completely do the job with sed, as I hoped. I like sed's in-place switch, which seems less messy than creating another file and thus appeals to my OCD.

The format of my file was:

Mary(LF)
Smith(LF)
(SUB)(LF)
John(LF)
Public(LF)
(SUB)(LF)

and I wanted a result of:

Mary(TAB)Smith(LF)
John(TAB)Public(LF)

So, I wanted to change LF to TAB, and LF-SUB-LF to LF.

I solved my problem by first using TR to change all LFs to TABs. Couldn't use sed for this.

# change LFs to TABs ... so sed can later treat entire file as one line
tr '\012' '\011' < comengo.extract.txt > comengo.extract.out
mv comengo.extract.out comengo.extract.txt
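Since the point was to learn the numeric notation, here is a small sketch showing that tr treats the octal escapes and the named escapes the same way. The sample text and the scratch directory are made up for illustration.

```shell
trdir=$(mktemp -d)

# \012 is LF and \011 is TAB in octal; \n and \t are the named forms.
printf 'Mary\nSmith\n' | tr '\012' '\011' > "$trdir/octal.out"
printf 'Mary\nSmith\n' | tr '\n' '\t' > "$trdir/named.out"

# Both files should hold the same bytes; od -c shows each TAB as \t.
cmp "$trdir/octal.out" "$trdir/named.out" && od -c "$trdir/octal.out"
```

Any other unprintable character can be written the same way by looking up its octal code, for example \032 for SUB.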

That way, sed can now treat the entire file as one line. sed only likes to treat files line-by-line, so I made the whole file one single line.

The second step was to let sed jump in, and make the changes I wanted. The gist of my question was "how do I represent non-printing ascii characters?".

My previous attempts were failing because I was trying to match the LF itself (\x0A, octal 012) in the sed search string, and sed never sees the LF when reading. Now that the LFs were replaced, I could use an uninterrupted sequence of hex numbers.

# changes (tab)(sub)(tab) to just (sub)
sed -i 's/\x09\x1A\x09/\x1A/g'   comengo.extract.txt

Then I restored LFs to the file by using sed, which can write LFs.

# (sub) to (lf)(tab)
sed -i 's/\x1A/\x0A\x09/g'  comengo.extract.txt

And that worked like a charm.
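For reference, the same idea can probably be collapsed into one pipeline with no intermediate file. This is a sketch and not the exact commands above: it assumes GNU sed, uses a made-up sample.txt, and replaces (tab)(sub)(tab) directly with a single LF, which yields the Mary(TAB)Smith(LF) layout described earlier.

```shell
pipedir=$(mktemp -d)

# Sample in the format described above: LF-terminated fields,
# with SUB (octal 032, hex 1A) on its own line as record separator.
printf 'Mary\nSmith\n\032\nJohn\nPublic\n\032\n' > "$pipedir/sample.txt"

# 1. tr turns every LF into a TAB, leaving one long line.
# 2. sed turns each (tab)(sub)(tab) separator into a single LF.
tr '\012' '\011' < "$pipedir/sample.txt" \
  | sed 's/\x09\x1A\x09/\x0A/g' > "$pipedir/result.txt"

od -c "$pipedir/result.txt"
```

Writing to a separate result file gives up sed's in-place switch, but it keeps the original intact until the output has been checked.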

Other tips

What about using tr?

tr '\012' '\011' < myfile.txt > tmp.out
mv tmp.out myfile.txt

The tr command is a pure filter; it does not (in the standard versions, at any rate) take any file name arguments or support overwriting or ...

The portable way to specify a linefeed in sed is with an escaped return:

sed -i 's/\
/<tab>/g'

Replace the text <tab> with a literal tab character.
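As a sketch of the escaped-newline form, the example below uses it on the replacement side, where it is most commonly seen. The sample text is made up. Matching an LF on the pattern side still runs into sed's line-by-line reading, because the trailing LF is removed before the pattern is tried.

```shell
# Split a made-up line at each comma. The backslash at the end of the
# first line escapes the literal newline inside the replacement.
split_out=$(printf 'a,b,c\n' | sed 's/,/\
/g')

printf '%s\n' "$split_out"
```

This form needs no \x or \n escapes in the replacement, so it should also work with sed implementations that lack those extensions.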

If you are using bash or ksh I'd suggest you use the shell's $'...' syntax which support C style backslash escapes. For example:

[BASH] # echo $'hello\nworld'
hello
world
[BASH] # echo $'hello\x0aworld'
hello
world
[BASH] #

In fact sed can match LF characters, if you use --null-data on top of --binary:

$ echo -e "Line1\r\nLine2\rLine3\nLine4\n\rLine5" | sed --null-data --binary -r -e "s/\x0d\x0a/\x0a/g" | od --format=x1a
0000000  4c  69  6e  65  31  0a  4c  69  6e  65  32  0d  4c  69  6e  65
          L   i   n   e   1  nl   L   i   n   e   2  cr   L   i   n   e
0000020  33  0a  4c  69  6e  65  34  0a  0d  4c  69  6e  65  35  0a
          3  nl   L   i   n   e   4  nl  cr   L   i   n   e   5  nl
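Applied to the original question, the same idea might look like the sketch below. It assumes GNU sed, where -z is the short form of --null-data, and the sample text is made up. Because the input contains no NUL bytes, sed sees the whole file as one record, so \x0A can match.

```shell
zdir=$(mktemp -d)
printf 'line one\nline two\nline three\n' > "$zdir/myfile.txt"

# With -z, records end at NUL instead of LF, so the LFs stay in the
# pattern space and can be matched by their hex code.
sed -z 's/\x0A/\x09/g' "$zdir/myfile.txt" > "$zdir/tabs.out"

od -c "$zdir/tabs.out"
```

Once the output looks right, adding -i should apply the same substitution in place.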

Licensed under: CC-BY-SA with attribution