Question

I was puzzled when ls listed the following files in a strange order:

Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv

From a human perspective, 'I' should come first, then 'II', and so on.

So I created a file with the following content:

$ cat 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The

If I sort it, I get this:

$ sort 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The

However, if I remove the '-' and everything after it, it sorts correctly:

$ cat 1
Star Wars Episode II 
Star Wars Episode III 
Star Wars Episode I 
Star Wars Episode IV 
Star Wars Episode VI 
Star Wars Episode V 

$ sort 1
Star Wars Episode I 
Star Wars Episode II 
Star Wars Episode III 
Star Wars Episode IV 
Star Wars Episode V 
Star Wars Episode VI 

So as soon as I add any character after the space, the sort order becomes unpredictable to me:

$ cat 1
Star Wars Episode II y
Star Wars Episode III x
Star Wars Episode I z
Star Wars Episode IV w
Star Wars Episode VI v
Star Wars Episode V u

$ sort 1
Star Wars Episode III x
Star Wars Episode II y
Star Wars Episode IV w
Star Wars Episode I z
Star Wars Episode VI v
Star Wars Episode V u

Any hint on this sort behaviour?

Update: sort: using ‘en_CA.UTF-8’ sorting rules

Update #2: as per the comment below, it is because of the locale.

$ ls | LANG=C sort
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv

Why, then, does a UTF-8 locale make a difference? I checked with ru_RU.UTF8 (incorrect sorting) and ru_RU.KOI8-R (proper sorting).

Update #3: it is about the locale, as explained in the GNU coreutils FAQ: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

OTHER TIPS

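When using a locale-based sort (en_CA.UTF-8 here), it ignores all non-alphanumeric characters, including the spaces and the '-'. The first three significant characters after "Episode " become: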

II - Attack   -> "IIA"
III - Revenge -> "III"
I - The       -> "ITh"
IV - A        -> "IVA"
VI - Return   -> "VIR"
V - The       -> "VTh"
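The key table above can be reproduced with a rough emulation. This is a sketch under a simplifying assumption: it only deletes non-alphanumeric bytes and compares the rest case-insensitively in the C locale, whereas glibc's real collation uses multi-level locale tables. The function name locale_like_sort is hypothetical.

```shell
#!/bin/sh
# Rough emulation (an assumption, not glibc's real algorithm) of the
# locale's first comparison level: drop every non-alphanumeric byte,
# compare what is left case-insensitively, then print the original lines.
locale_like_sort() {
  tab=$(printf '\t')
  while IFS= read -r line; do
    # Build the key, e.g. "Star Wars Episode II - Attack" -> "StarWarsEpisodeIIAttack".
    key=$(printf '%s' "$line" | tr -cd '[:alnum:]')
    printf '%s\t%s\n' "$key" "$line"
  done |
    LC_ALL=C sort -f -t "$tab" -k1,1 |
    cut -f2-
}

# The six sample lines from the question.
printf '%s\n' \
  'Star Wars Episode II - Attack' \
  'Star Wars Episode III - Revenge' \
  'Star Wars Episode I - The' \
  'Star Wars Episode IV - A' \
  'Star Wars Episode VI - Return' \
  'Star Wars Episode V - The' |
  locale_like_sort
```

Because "II - Attack" reduces to "IIAttack" and "I - The" to "IThe", the 'A' after "II" and the 'T' after "I" decide the order, which gives the same II, III, I, IV, VI, V sequence that sort printed in the question.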

With LC_ALL=C, sort compares raw bytes, and the space character (0x20) sorts in front of all alphanumerics:

I - The       -> "I -"
II - Attack   -> "II "
III - Revenge -> "III"
IV - A        -> "IV "
V - The       -> "V -"
VI - Return   -> "VI "
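The byte comparison in the table above can be checked directly. This sketch dumps the deciding bytes with od and wraps the byte-order sort in a hypothetical helper named c_sort. It sets LC_ALL rather than LANG because LC_ALL takes precedence over LC_COLLATE and LANG, so it still applies when LC_COLLATE is set in the environment.

```shell
#!/bin/sh
# Bytes that decide the C-locale comparison: space (0x20) and '-' (0x2d)
# are both lower than the letters 'I' (0x49) and 'V' (0x56).
printf 'I -' | od -An -tx1   # bytes 49 20 2d
printf 'II ' | od -An -tx1   # bytes 49 49 20

# Hypothetical helper: plain byte-order sort, independent of LANG/LC_COLLATE.
c_sort() {
  LC_ALL=C sort "$@"
}

# The six sample lines from the question, in their original order.
printf '%s\n' \
  'Star Wars Episode II - Attack' \
  'Star Wars Episode III - Revenge' \
  'Star Wars Episode I - The' \
  'Star Wars Episode IV - A' \
  'Star Wars Episode VI - Return' \
  'Star Wars Episode V - The' |
  c_sort
```

Since "Episode I " differs from "Episode II" at a space versus an 'I', the shorter numeral always wins here, which is why I, II, III, IV, V, VI come out in the expected order.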

So it is a coincidence that this works, and it will not last: byte order already fails at Episode IX, which sorts between IV and V because 'I' (0x49) is lower than 'V' (0x56).
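Since neither collation understands Roman numerals, one option is to sort on their numeric value instead. This is a minimal sketch, not a standard tool: the function name sort_by_episode is hypothetical, and it assumes the numeral is the whitespace-separated word right after "Episode", written in uppercase and valid.

```shell
#!/bin/sh
# Hypothetical helper: sort lines such as "Star Wars Episode IV - A" by the
# value of the Roman numeral that follows the word "Episode".
# Lines without that word get the key 0 and sort first.
sort_by_episode() {
  awk '
    # Convert a Roman numeral to an integer, scanning right to left:
    # a symbol smaller than the last one added is subtracted (IV = 4, IX = 9).
    function roman(s,    i, v, total, prev) {
      total = 0
      prev = 0
      for (i = length(s); i >= 1; i--) {
        v = val[substr(s, i, 1)]
        if (v < prev) {
          total -= v
        } else {
          total += v
          prev = v
        }
      }
      return total
    }
    BEGIN {
      val["I"] = 1; val["V"] = 5; val["X"] = 10; val["L"] = 50
      val["C"] = 100; val["D"] = 500; val["M"] = 1000
    }
    {
      n = 0
      for (i = 1; i < NF; i++) {
        if ($i == "Episode") { n = roman($(i + 1)); break }
      }
      # Prefix the numeric key, separated by a tab.
      print n "\t" $0
    }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1n | cut -f2-
}

# The six sample lines from the question, in the order ls showed them.
printf '%s\n' \
  'Star Wars Episode II - Attack' \
  'Star Wars Episode III - Revenge' \
  'Star Wars Episode I - The' \
  'Star Wars Episode IV - A' \
  'Star Wars Episode VI - Return' \
  'Star Wars Episode V - The' |
  sort_by_episode
```

With the files from the question, ls | sort_by_episode should list them from I to VI in any locale, and an Episode IX file would land after VIII because 9 is compared numerically rather than byte by byte.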

Licensed under: CC-BY-SA with attribution