Question

I have a file containing some diaeresis marks, ̈. I need to replace them with \textdiaeresis, for use in TeX.

The usual commands, which work with other symbols, always produce either \\textdiaeresis or \ extdiaeresis; in the latter, \t is interpreted as a tab.

I have tried these sed commands:

sed -i 's/\ ̈/\textdiaeresis /g' ./file.txt
sed -i 's/\ ̈/\\textdiaeresis /g' ./file.txt
sed -i 's/\ ̈/\\\textdiaeresis /g' ./file.txt
sed -i "s/\ ̈/\textdiaeresis /g" ./file.txt
sed -i "s/\ ̈/\\textdiaeresis /g" ./file.txt
sed -i "s/\ ̈/\\\textdiaeresis /g" ./file.txt

I have tried these nawk commands:

nawk '{sub(/ ̈/,"\textdiaeresis"); print}' file.txt > file.txt2
cp file.txt2 file.txt
nawk '{sub(/ ̈/,"\\textdiaeresis"); print}' file.txt > file.txt2
cp file.txt2 file.txt
nawk '{sub(/ ̈/,"\\\textdiaeresis"); print}' file.txt > file.txt2
cp file.txt2 file.txt

How can I replace a diaeresis with this TeX code?


Solution

On Mac OS X 10.7.4, under bash (version 3.2.48), I find no problem with sed (which is the Mac OS X sed, not the GNU sed).

$ x="s, ̈. "
$ echo "$x" | ~/src/sbcs2utf8/utf8-unicode
(standard input):
0x73 = U+0073
0x2C = U+002C
0x20 = U+0020
0xCC 0x88 = U+0308
0x2E = U+002E
0x20 = U+0020
0x0A = U+000A
$ echo "$x" | sed 's/ ̈/\\textdiaeresis/'
s,\textdiaeresis. 
$
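The utf8-unicode program in the session above appears to be a local tool. Where it is not available, a rough equivalent is to dump the raw bytes with od and look for 0xCC 0x88. This is only a sketch: hex_bytes is a hypothetical helper name, and the sample string is rebuilt from octal escapes rather than pasted from the question.

```shell
# Print the bytes of a string (plus the trailing newline that echo would add)
# as space-separated hex on a single line.
hex_bytes() {
    printf '%s\n' "$1" | od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# "s, " + U+0308 (UTF-8 bytes 0xCC 0x88, written in octal) + ". "
sample=$(printf 's, \314\210. ')

hex_bytes "$sample"
```

The cc 88 pair in the output is the UTF-8 encoding of U+0308, matching the 0xCC 0x88 line in the dump above.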

The character is U+0308 COMBINING DIAERESIS; I copied the fragment assigned to x from the question. The Unicode standard specifies (Chapter 2, §2.11):

In the Unicode Standard, all combining characters are to be used in sequence following the base characters to which they apply. The sequence of Unicode characters U+0061 “a” LATIN SMALL LETTER A, U+0308 “ ¨ ” COMBINING DIAERESIS, U+0075 “u” LATIN SMALL LETTER U unambiguously represents “äu” and not “aü”.

Thus, the diaeresis in the question text should be rendered over the space that precedes it. In Firefox (14.0.1), the shell output shows the diaeresis over the period that follows it, which is wrong, and in the sed command the diaeresis appears to combine with the following slash, which is also wrong. Oh well! Rendering aside, the translation via sed looks correct to me.
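The number of backslashes needed depends on the shell quoting. sed itself wants \\ in the replacement to emit one literal backslash. Inside single quotes the shell passes \\ through unchanged, while inside double quotes it first collapses each \\ to \, so four are needed. A single \t in the replacement is not portable: as far as I know, GNU sed turns it into a tab, which would explain the "\ extdiaeresis" output in the question. The sketch below assumes a POSIX shell; the variable names are illustrative, and the diaeresis is built from its UTF-8 bytes so the script does not depend on how an editor renders the combining mark.

```shell
# U+0308 COMBINING DIAERESIS as explicit UTF-8 bytes (0xCC 0x88), in octal
dia=$(printf '\314\210')
input="s, ${dia}. "

# Single-quoted script: the shell leaves \\ alone, sed emits one backslash.
out_single=$(printf '%s\n' "$input" | sed 's/ '"${dia}"'/\\textdiaeresis/g')

# Double-quoted script: the shell reduces \\\\ to \\, sed then emits one backslash.
out_double=$(printf '%s\n' "$input" | sed "s/ ${dia}/\\\\textdiaeresis/g")

printf '%s\n' "$out_single" "$out_double"
```

Both forms should print s,\textdiaeresis. followed by a space, the same result as the session above.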

Licensed under: CC-BY-SA with attribution