Question

I use the following command on a huge text file

sed 's/\tEN-GB\t//g' "/home/ubuntu/0214/corpus/C.txt"

Every row of the file contains [tab]EN-GB[tab], but the output is just the original text, unchanged. I cannot figure out why. NOTE: when I use 's/\t//g' it works: each row comes out as [text without tabs]EN-GB[text without tabs], so the tabs are removed.

UPDATE: Here is the offending part of the output from cat -vet:

^@2^@0^@0^@7^@0^@1^@0^@4^@~^@1^@6^@3^@2^@4^@3^@^I^@^I^@0^@^I^@E^@N^@-^@G^@B^@^I^@T^@h^@e^@      ^@a^@d^@m^@i^@n^@i^@s^@t^@  
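A small reproduction may make it easier to see why the pattern misses. The sketch below assumes bash and GNU sed; it writes a hypothetical one-row file (sample.txt in a throwaway temporary directory) whose bytes follow the cat -vet pattern above, with a NUL after each visible character, and checks whether the question's sed command changes it.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash and GNU sed): rebuild a row shaped like the cat -vet
# output, with a NUL byte after the tab and after each letter of EN-GB.
# "sample.txt" and "workdir" are hypothetical names for illustration only.
workdir=$(mktemp -d)
printf 'a\t\0E\0N\0-\0G\0B\0\tb\n' > "$workdir/sample.txt"

# Dump the raw bytes: \0 sits between the tab and "E", and between "B" and the tab.
od -c "$workdir/sample.txt"

# Apply the substitution from the question and compare input with output.
sed 's/\tEN-GB\t//g' "$workdir/sample.txt" > "$workdir/out.txt"
if cmp -s "$workdir/sample.txt" "$workdir/out.txt"; then
  status="unchanged"
else
  status="changed"
fi
echo "$status"

rm -r "$workdir"
```

Because a NUL byte follows the first tab, the regex \tEN-GB\t never finds a tab directly followed by "E", which matches the unchanged output described in the question.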

I'm out of black magic... Thanks in advance.


Solution 2

You can use ANSI-C quoting ($'...', available in bash, ksh, and zsh) to pass a literal TAB character to sed:

sed 's/'$'\tEN-GB\t''//g' filename
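To see what ANSI-C quoting actually hands to sed, the sketch below (assuming bash) expands $'\tEN-GB\t' into a hypothetical variable named pattern, dumps its bytes, and then runs the answer's command on a clean, NUL-free sample row.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): $'\t' is expanded by the shell itself, so sed
# receives a real TAB byte rather than the two characters "\" and "t".
# "pattern" and "result" are hypothetical variable names.
pattern=$'\tEN-GB\t'
printf '%s' "$pattern" | od -c    # bytes: \t E N - G B \t

# The quoted pieces 's/' + $'\tEN-GB\t' + '//g' join into one sed script.
result=$(printf 'left\tEN-GB\tright\n' | sed 's/'$'\tEN-GB\t''//g')
echo "$result"    # prints leftright
```

The single-quoted pieces and the $'...' piece are adjacent with no spaces, so the shell joins them into a single argument.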

EDIT: The output of cat -vet suggests that your input contains NUL characters (shown as ^@). Remove them before passing the text to the command above, for example:

tr -d '\000' < filename | sed 's/'$'\tEN-GB\t''//g'
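The pipeline can be checked on a single row that mimics the cat -vet output. This is a sketch assuming bash; note that tr takes octal escapes such as \000 for the NUL byte rather than \x-style hex escapes. The variable name result is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): delete every NUL byte with tr, then run the
# ANSI-C-quoted substitution on the now-contiguous TAB EN-GB TAB.
result=$(printf 'a\t\0E\0N\0-\0G\0B\0\tb\n' \
  | tr -d '\000' \
  | sed 's/'$'\tEN-GB\t''//g')
echo "$result"    # prints ab
```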

Other tips

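It appears that your sed command is correct, but your text file contains null characters: in the cat -vet output a ^@ (NUL byte) sits between the tab and E, and between B and the next tab, so \tEN-GB\t never matches.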

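Run this sed command, which first removes the nulls and then the tab-delimited EN-GB, editing the file in place and keeping the original as C.txt.bak: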

sed -i.bak 's/\x0//g; s/\tEN-GB\t//g' "/home/ubuntu/0214/corpus/C.txt"
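Before running this on the real corpus, it may be worth trying it on a throwaway copy. The sketch below assumes GNU sed (which understands \t and \x00 in the regex and -i.bak for an in-place edit with a backup); the temporary C.txt is a hypothetical stand-in, not the actual file.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU sed): run the in-place edit on a temporary stand-in file.
# "workdir", "edited" and "backup_kept" are hypothetical names.
workdir=$(mktemp -d)
printf 'a\t\0E\0N\0-\0G\0B\0\tb\n' > "$workdir/C.txt"

# First expression deletes the NUL bytes; the second then sees a contiguous
# TAB EN-GB TAB on the same line and deletes it.
sed -i.bak 's/\x00//g; s/\tEN-GB\t//g' "$workdir/C.txt"

edited=$(cat "$workdir/C.txt")
if [ -f "$workdir/C.txt.bak" ]; then backup_kept=yes; else backup_kept=no; fi
echo "$edited (backup kept: $backup_kept)"    # prints ab (backup kept: yes)

rm -r "$workdir"
```

Both expressions run on each line in order, so the NUL removal always happens before the EN-GB match is attempted.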
Licensed under: CC-BY-SA with attribution