You're mixing up a couple of things:
- keyboard layout is what determines what value get assigned to a scancode.
- localization settings determine in what language you should address the user, and wether the user expects a decimal point or comma.
- character encoding is how a glyph is encoded into the bits memory and, in reverse, how to decode bits into glyphs
If you're validating user input, you shouldn't be interested in scancodes. A DVORAK layout user on a QWERTY keyboard will be pressing the Q key to input an '
. And you shouldn't mess with that. So you have no business dealing with keyboard layouts.
The existence of this keyboard, should remind you, that what keys do is not your head-ache, but up to the user.
The localization settings will matter to you, but not for your regex. They will, however, tell you in what language you should put your error message, in case the user input is invalid. A good coding practice is to use a library like gettext to manage this.
What matters most, when you are validating input. Is just those 2 things: what is valid and what is the input.
You (or your domain expert) decide what is valid. Wether a hyphen-minus is just as acceptable as a hyphen or n-dash.
The input will be in encoded; computers work with bits, not strings of glyphs. It could be ASCII, but I'd steer towards unicode if I could help it.
As for your real concern, if I may rephrase it: "Can all users easily enter '
and -
?". I guess they probably can. Many important programming languages use those glyphs to resp. denote strings and as a subtraction operator. And if your application needs to (dis)allow certain glyphs you can put unicode code points or categories in your regex.