Should you use international identifiers in Java/C#?

https://stackoverflow.com/questions/61615

09-06-2019
|

Question

C# and Java allow almost any character in class names, method names, local variables, etc.. Is it bad practice to use non-ASCII characters, testing the boundaries of poor editors and analysis tools and making it difficult for some people to read, or is American arrogance the only argument against?

Solution

I would stick to english, simply because you usually never know who is working on that code, and because some third-party tools used in the build/testing/bugtracking progress may have problems. Typing äöüß on a Non-German Keyboard is simply a PITA, and I simply believe that anyone involved in software development should speak english, but maybe that's just my arrogance as a non-native-english speaker.

What you call "American arrogance" is not whether or not your program uses international variable names, it's when your program thinks "Währung" and "Wahrung" are the same words.

OTHER TIPS

I'd say it entirely depends on who's working on the codebase.

If you have a small group of developers who all share a common language and you don't ever plan needing anyone who doesn't speak the language to work on the code then go ahead and use whatever characters you want.

If you need to have people of varying cultures and languages working on the code then it's probably best to stick with English since it's the common denominator for just about everyone in the world.

If your business are non-English speakers, and you think Domain Driven Design has something to it, then there is another aspect: How do we, as developers, use the same domain language as our business without any translation overhead?

That does not only mean translations between languages, say English and Norwegian, but also between different words. We should use the exact same words as our business for our entity classes and services.

I have found it easier to just give in and use my native language. Now that my code use the same words, it's easier to have a conversation with my domain experts. And after a while you get used to it, just like how you got used to code without Hungarian notation.

I used to work in a development team that happily wiped their asses with any naming (and for that matter any other coding) conventions. Believe it or not, having to cope with ä's and ö's in the code was a contributing factor of me resigning. Though I'm Finnish, I prefer writing code with US keyboard settings because curly and square brackets are a pain to write in a Finnish keyboard (try right alt and 7 and 0 for curlies).

So I say stick with the ascii characters.

Here's an example of where I've used non-ASCII identifiers, because I found it more readable than replacing the greek letters with their English names. Even though I don't have θ or φ on my keyboard (I relied on copy-and-paste.)

However these are all local variables. I would keep non-ASCII identifiers out of public interfaces.

It depends:

Does your team conform to any existing standards that require your using ASCII?
Is your code ever going to be feasibly reused or read by someone who doesn't speak your native language?
Do you envision a scenario where you'll need to ask for help online and will therefore not be able to copy-paste your code sample in as-is?
Are you certain your entire suite of tools support code encoding?

If you answered 'yes' to any of the above, stay ASCII only. If not, go forward at your own risk.

IF you get past the other prerequisites you then have one extra (IMHO more important) one - How difficult is the symbol to type.

On my regular en-us keyboard, the only way I know of to type the letter ç is to hold alt, and hit 0227 on the numeric keypad, or copy and paste.

This would be a HUGE big roadblock in the way of typing quickly. You don't want to slow your coding down with trivial stuff like this if you aren't forced to. International keyboards may alleviate this, but then what happens if you have to code on your laptop which doesn't have an international keyboard, etc?

Part of the problem is that the Java/C# language and its libraries are based on English words like if and toString(). I personally would not like to switch between non-English language and English while reading code.

However, if your database, UI, business logics (including metaphors) are already in some non-English language, there's no need to translate every method names and variables into English.

I would stick to ASCII characters because if anyone in your development team is using an SDK that only supports ASCII or you wanted to make your code open source, alot of problems could arise. Personally, I would not do it even if you are not planning on bringing anyone who doesn't speak the language in on the project, because you are running a business and it seems to me that one running a business would want his business to expand, which in this day and age means transcending national borders. My opinion is that English is the language of the realm, and even if you name your variables in a different language, there is little to no point to use any non-ASCII characters in your programming. Leave it up to the language to deal with it if you are handling data that is UTF8: my iPhone program (which involves tons of user data going in between the phone and server) has full UTF8 support, but has no UTF8 in the source code. It just seems to open such a large can of worms for almost no benefit.

There is another hazzard to using non-ASCII characters, though it will probably only bite in obscure cases. The allowed characters are defined in terms of the methods Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int), which are defined in terms of Unicode. However, the exact version of Unicode used depends on the version Java platform, as specified in the documentation for java.lang.Character.

Since character properties change slightly from one Unicode version to the next, it's possible (but probably very unlikely) you could have identifiers that are valid in one version of Java, but not in the next.

As already pointed out, unless method names mostly match the language, it is a bit weird to constantly switch languages while reading.

For the Scandinavian languages & German, which I can speak and thus speak for, I would at least recommend using standard substitutions, ie.

ä/æ -> ae, ö/ø -> oe, å -> aa, ü -> ue

etc. just in case as others may find it difficult to type the original letters without keyboard/keymap changes. Think if you suddenly had to work with a codebase where the developers used a third language (for instance including the French ç) and didn't do this.. Switching between more than 2 keymaps to type efficiently would be painful in my experience.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow