Question

My attempts on RHEL 6.3:

$ export LC_ALL=fr_FR.utf-8
$ sed 's/ \([a-zA-Zé]\)\([^ ]*\) /[\u\1\L\2\E] /g' <<< " hélène  NOËL  étienne "
 hélène  NOËL  étienne

$ export LC_ALL=C
$ sed 's/ \([a-zA-Zé]\)\([^ ]*\) /[\u\1\L\2\E] /g' <<< " hélène  NOËL  étienne "
[Hÿlÿne] [Noÿl] [ÿtienne]

$ sed --version
GNU sed version 4.2.1
[...]

Is sed able to output the following?

[Hélène] [Noël] [Étienne]

Solution

Is this OK for you?

kent$  echo " hélène  NOËL  étienne "|sed -r 's/(\S)(\S+)/[\U\1\L\2]/g'
 [Hélène]  [Noël]  [Étienne] 

My sed version is a bit different from yours, but I think the command should work there too:

kent$  sed --version |head -1
sed (GNU sed) 4.2.2

Here are my locale settings, in case they matter:

kent$  echo $LANG
en_US.utf8

kent$  locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
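
To look at the substitution on its own, without any locale effects, here is a minimal sketch that runs it on ASCII-only input. The function name capitalize_words is made up for illustration, and the pattern uses \S* rather than \S+ so that one-letter words are matched too. It assumes GNU sed, since \S, \U and \L are GNU extensions.

```shell
# Hypothetical helper (the name is an assumption, not from the thread).
# \U uppercases what follows (the first character, \1), then \L switches
# to lowercase for the rest of the word (\2). Requires GNU sed.
capitalize_words() {
  sed -r 's/(\S)(\S*)/[\U\1\L\2]/g'
}

# ASCII-only input, so the result does not depend on the locale.
printf '%s\n' ' helene  NOEL  etienne ' | capitalize_words
# With GNU sed this prints: " [Helene]  [Noel]  [Etienne] "
```

If this works but the accented input does not, the problem is more likely in the locale or the input encoding than in the regular expression.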

OTHER TIPS

Kent's answer did not solve my issue, but I had not given him all my constraints. My input file looks like this:

sfou;STéphane Foù - stephane.fou@example.com;;
fbar;frédéric bâr - frederic.bar@example.com;;
hnoel;Hélène NOËL - helene.noel@example.com;;

The script should capitalize only the names:

sfou;Stéphane Foù - stephane.fou@example.com;;
fbar;Frédéric Bâr - frederic.bar@example.com;;
hnoel;Hélène Noël - helene.noel@example.com;;

Based on Kent's help, I got this command to work:

LC_ALL=fr_FR sed -r 's/(\w)(\w*) /\U\1\L\2 /g' test.cvs

Other locales do not give the right result:

$ LANG=fr_FR.utf8 LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;STé[Phane] Foù - stephane.fou@example.com;;
fbar;frédé[Ric] bâ[R] - frederic.bar@example.com;;
hnoel;Hélè[Ne] NOË[L] - helene.noel@example.com;;

$ LANG=C LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;STé[Phane] Foù - stephane.fou@example.com;;
fbar;frédé[Ric] bâ[R] - frederic.bar@example.com;;
hnoel;Hélè[Ne] NOË[L] - helene.noel@example.com;;

$ LANG=en_US.utf8 LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;STé[Phane] Foù - stephane.fou@example.com;;
fbar;frédé[Ric] bâ[R] - frederic.bar@example.com;;
hnoel;Hélè[Ne] NOË[L] - helene.noel@example.com;;

Locales en_US and fr_FR (without .utf8) are OK:

$ LANG=en_US LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;[Stéphane] [Foù] - stephane.fou@example.com;;
fbar;[Frédéric] [Bâr] - frederic.bar@example.com;;
hnoel;[Hélène] [Noël] - helene.noel@example.com;;

$ LANG=fr_FR LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;[Stéphane] [Foù] - stephane.fou@example.com;;
fbar;[Frédéric] [Bâr] - frederic.bar@example.com;;
hnoel;[Hélène] [Noël] - helene.noel@example.com;;
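
One possible explanation, which is an assumption and was not confirmed in this thread: the file may be encoded in ISO-8859-1 rather than UTF-8. In that case the single-byte locales would read each accented letter as one word character, while the UTF-8 and C locales would not match those bytes with \w. Under that assumption, converting the file first might let a UTF-8 locale work. The helper name to_utf8 below is hypothetical, and the sample line is built by hand so the sketch runs on its own.

```shell
# Assumption: the input is ISO-8859-1 (Latin-1), not UTF-8.
# to_utf8 is a hypothetical helper name, not something from the thread.
to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8 "$1"
}

# Build a small Latin-1 sample: \351 is the single byte for "é" in ISO-8859-1.
sample=$(mktemp)
printf 'fbar;fr\351d\351ric - frederic.bar@example.com;;\n' > "$sample"

# Dump the converted bytes in hex; each "é" should now be the
# two-byte UTF-8 sequence c3 a9 instead of the single byte e9.
to_utf8 "$sample" | od -An -tx1
rm -f "$sample"
```

With the real file, the converted stream could then be piped into sed under a UTF-8 locale, for example: to_utf8 test.cvs | LC_ALL=fr_FR.utf8 sed -r 's/(\w)(\w*) /\U\1\L\2 /g'. This was not tried on the RHEL 6.3 setup from the question.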

Note: I discovered \w from CodeGnome's links.

Licensed under: CC-BY-SA with attribution