Question

I have a list of UTF-8 strings that I want to sort using Enumerable.OrderBy. The strings may contain any number of scripts and languages - e.g., English, German, and Japanese, or even a mix of them within a single string.

For example, here is a sample input list:

["東京", "North 東京", "München", "New York", "Chicago", "大阪市"]

I am confused as to whether using StringComparer.CurrentCulture is the right string comparison parameter to pass to OrderBy(). What if the current culture of the application is en-US but I still want to sort UTF-8 data "correctly" beyond just en-US sorting rules?

My confusion probably stems from my understanding of the NLSSORT function in Oracle that doesn't quite match up with .NET string comparison and sorting semantics. For example, setting NLS_SORT=Japanese_M means it would sort Latin, Western European, and Japanese correctly, regardless of whether any or all of the characters occur in a given string in the sortable column.


Solution

There is no single comparison that works for all cultures.

Short of detecting the language and choosing accordingly, InvariantCulture is your best bet. As the document you link notes:

DON'T: Use StringComparison.InvariantCulture-based string operations in most cases; one of the few exceptions would be persisting linguistically meaningful but culturally-agnostic data.

I added the emphasis. That exception is more or less what you're doing.
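
As a minimal sketch of the InvariantCulture option, assuming the strings are already in memory as .NET strings: the class and method names below (InvariantSortSketch, SortInvariant) are hypothetical and exist only for illustration. The exact position of the Japanese entries depends on the collation data of the runtime and platform, so no output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InvariantSortSketch
{
    // Hypothetical helper: orders strings with the culture-agnostic
    // (invariant) linguistic comparer instead of the current culture.
    public static List<string> SortInvariant(IEnumerable<string> items)
    {
        return items.OrderBy(s => s, StringComparer.InvariantCulture).ToList();
    }
}
```

Calling InvariantSortSketch.SortInvariant on the sample list from the question gives the same ordering regardless of the application's current culture, which is the point of using it for culture-agnostic data.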

OTHER TIPS

Keep your eyes on the ball: you sort to help humans find a string in a list. It would take a skilled linguist to know the sorting rules for English, German, and Japanese all at once. What are the odds of one laying eyes on your list? Always sort the list according to the rules of the user's local culture, so that the ordering is localized.
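
A rough sketch of sorting by the user's culture, under the assumption that the culture to use is known at the call site (for example CultureInfo.CurrentCulture). The names LocalizedSortSketch and SortForCulture are hypothetical. Named cultures such as "ja-JP" may not be available if the application runs in invariant globalization mode.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LocalizedSortSketch
{
    // Hypothetical helper: orders strings by the collation rules of
    // the supplied culture (case-sensitive).
    public static List<string> SortForCulture(IEnumerable<string> items, CultureInfo culture)
    {
        StringComparer comparer = StringComparer.Create(culture, false);
        return items.OrderBy(s => s, comparer).ToList();
    }
}
```

Passing CultureInfo.CurrentCulture follows whatever the user's environment is set to; passing CultureInfo.GetCultureInfo("ja-JP") would instead request Japanese rules for every user.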

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow