Today I ran into a problem sorting a file with the Linux sort command. When I set the environment variable LANG=En_US, the result is what I expect. But with LANG=en_US, the result is strange. Here are the commands I ran and their output:

[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ cat dd.dat                 
23 340_guard    16                                                                                                        
23 340_guard    17                                                                                                        
23 340_guard    18                                                                                                        
23 360_guard... 16                                                                                                      
23 360_guard    16                                                                                                        
23 360_guard... 17                                                                                                      
23 360_guard... 18              

[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ LANG=En_US sort dd.dat     
23 340_guard    16                                                                                                        
23 340_guard    17                                                                                                        
23 340_guard    18                                                                                                        
23 360_guard    16                                                                                                        
23 360_guard... 16                                                                                                      
23 360_guard... 17                                                                                                      
23 360_guard... 18                                 

[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ LANG=en_US sort dd.dat     
23 340_guard    16                                                                                                        
23 340_guard    17                                                                                                        
23 340_guard    18                                                                                                        
23 360_guard... 16                                                                                                      
23 360_guard    16          (why does this line appear here?)
23 360_guard... 17                                                                                                      
23 360_guard... 18      

The raw format of the rows in this file looks like this:

2^E3^F360_guard^E...^I16^Ee^E17/18^I63776769$
2^E3^F360_guard^E^I16^Ee^E17/18^I63776769$
2^E3^F360_guard^E...^I17^Ei^E0^I63776771$
2^E3^F360_guard^E...^I18^Ei^E1^I63776773$

^E is '\x05', ^F is '\x06', ^I is tab, and $ is '\n'.
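For anyone who wants to reproduce this, here is a rough sketch that rebuilds the first row above from the byte values just listed and prints it with the control characters made visible. The function name make_row is only made up for this illustration, and I'm assuming the dump above came from something like cat -A; cat -vet should show the same notation.

```shell
# make_row is a hypothetical helper: it emits the first sample row,
# writing \005 for ^E, \006 for ^F and \t for ^I as octal/escape sequences.
make_row() {
  printf '2\0053\006360_guard\005...\t16\005e\00517/18\t63776769\n'
}

# -v shows control characters as ^X, -e marks line ends with $, -t shows tabs as ^I.
make_row | cat -vet
```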

Thanks in advance.


Solution

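With LANG=en_US, sort uses that locale's collation rules, which are smarter than a plain byte comparison: punctuation such as those strings of dots is largely ignored on the first comparison pass, so "360_guard..." and "360_guard" compare as nearly equal and the lines end up interleaved. Locale names are case-sensitive, so En_US does not match any installed locale and sort falls back to the default (probably C), which simply compares bytes.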
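If you want the byte-wise order no matter what LANG says, forcing the C locale for that one command is the usual way. A minimal sketch, assuming a simplified copy of the data with plain spaces instead of the \x05/\x06 separators (the temp file and the c_sorted variable are just for this example):

```shell
# Simplified sample data: spaces stand in for the original control-character separators.
tmp=$(mktemp)
printf '%s\n' \
  '23 360_guard... 16' \
  '23 360_guard    16' \
  '23 360_guard... 17' > "$tmp"

# LC_ALL=C forces byte-wise comparison: space (0x20) sorts before '.' (0x2E),
# so the line without dots comes first.
c_sorted=$(LC_ALL=C sort "$tmp")
printf '%s\n' "$c_sorted"

# For comparison, run the locale-aware sort too, but only if that locale is installed.
# Its output depends on the system's collation tables, so it is not shown here.
if locale -a 2>/dev/null | grep -Eqi '^en_US\.utf-?8$'; then
  LC_ALL=en_US.UTF-8 sort "$tmp"
fi

rm -f "$tmp"
```

LC_ALL is used here rather than LANG because it overrides every other locale variable, so the result does not depend on what else is set in the environment.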

Other answers

"en_US" is the "correct" value for "Language=English, locale=United States". Other locales include "en_GB" (Great Britain), "en_CA" (Canada) and "en_AU" (Australia).

I get these results:

echo $LANG;sort tmp.txt
en_US.UTF-8
23 340_guard    16
23 340_guard    17
23 340_guard    18
23 360_guard    16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18

export LANG=en_US;echo $LANG;sort tmp.txt
en_US
23 340_guard    16
23 340_guard    17
23 340_guard    18
23 360_guard    16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18

export LANG=En_US;echo $LANG;sort tmp.txt
En_US
23 340_guard    16
23 340_guard    17
23 340_guard    18
23 360_guard    16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18

export LANG=abc-silly;echo $LANG;sort tmp.txt
abc-silly
23 340_guard    16
23 340_guard    17
23 340_guard    18
23 360_guard    16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
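
One possible reason for getting the same order every time, as in the runs above, is that LANG is not the variable that decides: LC_ALL takes precedence over LC_COLLATE, which takes precedence over LANG, so changing LANG alone has no effect if either of the others is set. I can't tell from the output whether that was the case here. A small sketch to check which one is in effect (effective_collate is a made-up helper name, not a standard command):

```shell
# Hypothetical helper: report which variable decides the collation order.
# Precedence: LC_ALL, then LC_COLLATE, then LANG; empty values count as unset.
effective_collate() {
  if [ -n "${LC_ALL:-}" ]; then
    printf 'LC_ALL=%s\n' "$LC_ALL"
  elif [ -n "${LC_COLLATE:-}" ]; then
    printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
  elif [ -n "${LANG:-}" ]; then
    printf 'LANG=%s\n' "$LANG"
  else
    printf 'unset (C/POSIX default)\n'
  fi
}

# Show the setting that sort would pick up in the current shell.
effective_collate
```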
Licensed under: CC-BY-SA with attribution