Question

I know that there is CultureInfo.TextInfo.ToUpper(). However, is there any way to retrieve a collection of all uppercase letters for a given culture?

Please note that I only want to get all the uppercase letters of the current language's alphabet. E.g. for en-US I want to get the list A,B,C,...Y,Z (order actually doesn't matter).

Solution

There's no database built into .NET that keeps track of the letters that appear in the alphabet of a particular language. It would be a very large one. And a controversial one: even in a country with a simple alphabet like Dutch, speakers don't agree on whether the IJ digraph (sometimes written Ÿ) is in the alphabet or at what position it appears. The former Yugoslavia had two alphabets, and wars have been fought over it. And a changeable one: Swedish added W not long ago, forced to by the World Wide Web. And a rather impractical one for languages like Chinese and Korean.

You do not want to have to solve this problem in the general case.

Other answers

Depending on your exact definition of uppercase, there are a lot of them, even in the invariant culture alone, let alone the others, and the number varies by operating system.

This LINQPad query lists 973 uppercase characters on Win8.1 (873 on Vista, 673 on XP) by my definition: a char is uppercase if ToUpperInvariant leaves it unchanged and ToLowerInvariant changes it:

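```csharp
// LINQPad "C# Statements" query; Dump() is LINQPad's output extension method.
// A char counts as uppercase if ToUpperInvariant leaves it unchanged
// but ToLowerInvariant changes it.
var uppercaseChars = from i in Enumerable.Range(0, 65536)
                     let c = (char)i
                     let u = Char.ToUpperInvariant(c)
                     let l = Char.ToLowerInvariant(c)
                     where c == u && u != l
                     select c;
uppercaseChars.Count().Dump();
String.Join(" ", uppercaseChars).Dump();
```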

Obviously, you can change this to use CultureInfo.TextInfo.ToUpper and TextInfo.ToLower to obtain the list for any available culture.
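A minimal sketch of that per-culture variant, written as top-level C# statements so it runs outside LINQPad. The helper name `UppercaseCharsFor` is hypothetical, the culture names are only examples, and like the original query it examines single UTF-16 code units (the BMP) only. Since the casing data comes from the OS or ICU, the counts it prints will differ between machines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: same definition as the query above (upper-casing leaves
// the char unchanged, lower-casing changes it), but using a specific culture's
// TextInfo instead of the invariant casing rules. Only BMP chars are examined.
List<char> UppercaseCharsFor(CultureInfo culture)
{
    TextInfo ti = culture.TextInfo;
    return Enumerable.Range(0, 65536)
        .Select(i => (char)i)
        .Where(c => ti.ToUpper(c) == c && ti.ToLower(c) != c)
        .ToList();
}

// Example cultures; Turkish is interesting because of its dotted/dotless I.
var enUs = UppercaseCharsFor(CultureInfo.GetCultureInfo("en-US"));
var trTr = UppercaseCharsFor(CultureInfo.GetCultureInfo("tr-TR"));
Console.WriteLine($"en-US: {enUs.Count}, tr-TR: {trTr.Count}");
```

Note that this still returns every cased character the culture's casing tables know about, not the letters of that language's alphabet.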

Note that my "definition" of uppercase misses 33 characters on Win8.1 (135 on Vista, 306 on XP) that the Unicode category calls uppercase but that have no lowercase counterpart according to ToLowerInvariant. Conversely, it includes 69 characters on Win8.1 (71 on Vista, 42 on XP) that are not in the UppercaseLetter category but still have a lowercase counterpart (again according to ToLowerInvariant). The latter belong to the Unicode categories TitlecaseLetter (not on XP), LetterNumber and OtherSymbol. Vista even includes 4 characters from the LowercaseLetter category (ῃ ῳ ⱥ ⱦ).
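The comparison described above can be reproduced with a sketch like the following, which puts the casing-based definition next to Char.GetUnicodeCategory. The names `IsUpperByCasing`, `IsUpperByCategory`, `missedByCasing` and `extraByCasing` are illustrative only, and the counts you get will depend on your OS and .NET runtime rather than matching the figures quoted here.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Casing-based definition used in the query above.
bool IsUpperByCasing(char c) =>
    char.ToUpperInvariant(c) == c && char.ToLowerInvariant(c) != c;

// Unicode general category definition (Lu).
bool IsUpperByCategory(char c) =>
    char.GetUnicodeCategory(c) == UnicodeCategory.UppercaseLetter;

var allChars = Enumerable.Range(0, 65536).Select(i => (char)i).ToList();

// Category says uppercase, but the casing test fails (no lowercase counterpart).
var missedByCasing = allChars.Where(c => IsUpperByCategory(c) && !IsUpperByCasing(c)).ToList();

// Casing test passes, but the category is not UppercaseLetter.
var extraByCasing = allChars.Where(c => IsUpperByCasing(c) && !IsUpperByCategory(c)).ToList();

Console.WriteLine($"Missed by casing definition: {missedByCasing.Count}");
Console.WriteLine($"Extra in casing definition: {extraByCasing.Count}");

// Show which categories the extra characters come from.
foreach (var group in extraByCasing.GroupBy(c => char.GetUnicodeCategory(c)))
    Console.WriteLine($"{group.Key}: {string.Join(" ", group)}");
```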

To actually answer your question (and the questions in your comments): the way to get uppercase characters according to the Unicode category "database" is Char.GetUnicodeCategory. The underlying database is not publicly accessible in any other useful way.
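A sketch of listing every uppercase letter by Unicode category, assuming .NET Core 3.0 or later, where CharUnicodeInfo.GetUnicodeCategory has an overload taking an int code point (it reads the same category data as Char.GetUnicodeCategory). Unlike a loop over `char`, this also reaches letters outside the BMP. The helper name `UppercaseCodePoints` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: every code point whose general category is Lu
// (UppercaseLetter), including supplementary-plane ones that one char can't hold.
List<int> UppercaseCodePoints()
{
    var result = new List<int>();
    for (int cp = 0; cp <= 0x10FFFF; cp++)
    {
        // Surrogate code points are not characters on their own; skip them.
        if (cp >= 0xD800 && cp <= 0xDFFF) continue;
        if (CharUnicodeInfo.GetUnicodeCategory(cp) == UnicodeCategory.UppercaseLetter)
            result.Add(cp);
    }
    return result;
}

var upper = UppercaseCodePoints();
Console.WriteLine(upper.Count);

// ConvertFromUtf32 produces a surrogate pair for code points above U+FFFF.
foreach (int cp in upper.GetRange(0, 5))
    Console.Write(char.ConvertFromUtf32(cp) + " ");
Console.WriteLine();
```

This is still Unicode's notion of uppercase, not any particular language's alphabet.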

For reference you can see the first 255 entries here; the rest is loaded here and looked up here.

And remember your definition of upper case may differ from Unicode's, as I mention in my other answer.

Licensed under: CC-BY-SA with attribution