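Locale names are case-sensitive. en_US selects locale-aware collation, which ignores punctuation such as those runs of dots on the first comparison pass, so the rows end up ordered by the remaining characters. En_US is not a valid locale name, so sort falls back to the default locale (probably C), which compares the lines byte by byte.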
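If the byte-by-byte order is the one you want, it is safer to ask for it explicitly than to rely on a misspelled locale name. The following is a minimal sketch, assuming a POSIX shell; `sort_bytewise` is a hypothetical helper name, and the sample rows use plain spaces instead of the control characters in the original file.

```shell
# Hypothetical helper: sort with byte-order (C locale) collation,
# regardless of what LANG is set to. LC_ALL overrides LANG and LC_COLLATE.
sort_bytewise() {
    LC_ALL=C sort "$@"
}

# Simplified sample rows (spaces instead of the original \x05/\x06 separators).
printf '%s\n' '23 360_guard... 16' '23 360_guard 16' '23 360_guard... 17' | sort_bytewise
```

In the C locale a space (0x20) compares lower than a dot (0x2E), so the row without dots comes first, followed by the dotted rows.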
What is the difference between En_US and en_US?
Question
Today I ran into a problem sorting a file with the Linux sort command. With LANG=En_US the result is what I expect, but with LANG=en_US the result is strange. Here are the commands I ran and their output:
[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ cat dd.dat
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard... 16
23 360_guard 16
23 360_guard... 17
23 360_guard... 18
[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ LANG=En_US sort dd.dat
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ LANG=en_US sort dd.dat
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard... 16
23 360_guard 16 (why does this line appear here?)
23 360_guard... 17
23 360_guard... 18
The raw format of the rows in this file looks like this:
2^E3^F360_guard^E...^I16^Ee^E17/18^I63776769$
2^E3^F360_guard^E^I16^Ee^E17/18^I63776769$
2^E3^F360_guard^E...^I17^Ei^E0^I63776771$
2^E3^F360_guard^E...^I18^Ei^E1^I63776773$
^E is '\x05', ^F is '\x06', ^I is a tab, and $ is '\n'.
Thanks in advance.
Solution
Other answers
"en_US" is the "correct" value for "Language=English, locale=United States". Other locales include "en_GB" (Great Britain), "en_CA" (Canada) and "en_AU" (Australia).
I get these results:
echo $LANG;sort tmp.txt
en_US.UTF-8
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
export LANG=en_US;echo $LANG;sort tmp.txt
en_US
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
export LANG=En_US;echo $LANG;sort tmp.txt
En_US
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
export LANG=abc-silly;echo $LANG;sort tmp.txt
abc-silly
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
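To see which of these names a given machine actually recognizes, you can compare them against the output of `locale -a`. This is only a sketch: `locale_is_installed` is a hypothetical helper, and it does an exact match, so a name like en_US may be reported as missing on systems that list the locale only as en_US.utf8.

```shell
# Hypothetical helper: succeed only if the exact name is listed by `locale -a`.
locale_is_installed() {
    locale -a 2>/dev/null | grep -qxF -- "$1"
}

# Check the names used in the examples above.
for name in en_US En_US abc-silly; do
    if locale_is_installed "$name"; then
        echo "$name: installed"
    else
        echo "$name: not listed, programs will typically fall back to the C locale"
    fi
done
```

A name that is not listed is not rejected by the shell; programs such as sort generally just keep using the C locale when they cannot load it.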