rails - using RE to extract locale from HTTP_ACCEPT_LANGUAGE

Question 1

@npinti's and @Victor's answers are good from a pure "regex" perspective. However, they fall short for the actual topic, "using RE to extract locale from HTTP_ACCEPT_LANGUAGE in Rails". To properly detect both the 2-character (e.g., "en") and 5-character (e.g., "en-US") formats in Rails:

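# accept_language should look something like
# "en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4" (from Chrome);
# however, it may be nil if the client doesn't send an Accept-Language header.
accept_language = request.env['HTTP_ACCEPT_LANGUAGE'] || ""
# Use "match" instead of "scan": match returns only the first (leftmost) tag.
# \A anchors at the start of the string (^ would anchor at any line start).
match_data = accept_language.match(/\A[a-z]{2}(-[A-Z]{2})?/)
locale = match_data ? match_data[0] : nil
# Fall back to the default unless the tag is one the app actually provides;
# with enforce_available_locales on (the Rails default), assigning any other
# locale raises I18n::InvalidLocale.
I18n.locale = I18n.available_locales.map(&:to_s).include?(locale) ? locale : I18n.default_locale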
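The same idea can be pulled out of the controller into a plain Ruby method so it can be tried without a request object. This is a minimal sketch, not the answer's exact code: the method name locale_from_header and its available/default parameters are hypothetical stand-ins for I18n.available_locales and I18n.default_locale.

```ruby
# Hypothetical helper: returns the first language tag in an
# Accept-Language header if it is one of the available locales,
# otherwise the default locale.
def locale_from_header(header, available, default)
  # A missing header (nil) is treated as an empty string.
  match_data = (header || "").match(/\A[a-z]{2}(-[A-Z]{2})?/)
  tag = match_data && match_data[0]
  available.include?(tag) ? tag : default
end

available = ["en-US", "en", "zh-TW"]

p locale_from_header("en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4", available, "en") # => "en-US"
p locale_from_header(nil, available, "en")                                   # => "en"
p locale_from_header("fr-FR,fr;q=0.9", available, "en")                      # => "en"
```

In Rails, I18n.available_locales returns symbols, so compare against their string forms (for example with map(&:to_s)) as in the controller code.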

Question 2

The problem is that square brackets define a character class: it matches a single character from the listed set, regardless of order. So [\w-\w] is effectively the same as [\w-] (one word character or a hyphen). Changing it to something like \w{2}(-\w{2})? should achieve what you are after.
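To see the difference, compare what each pattern picks out of a typical Chrome header. This is a small plain-Ruby sketch using the header string from the first answer; it uses [\w-] rather than [\w-\w] for the class version.

```ruby
header = "en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4"

# A character class matches ONE character from the set, so [\w-]
# yields single characters, not language tags.
p header.scan(/[\w-]/).first(5)
# => ["e", "n", "-", "U", "S"]

# Quantifiers outside the class describe the shape of a tag instead.
# (?:...) is non-capturing, so scan returns whole matches; with a
# capturing group, scan would return only the group contents.
p header.scan(/\w{2}(?:-\w{2})?/)
# => ["en-US", "en", "zh-TW", "zh"]

# match returns just the first (leftmost) tag.
p header.match(/\w{2}(-\w{2})?/)[0]
# => "en-US"
```

Note that \w also matches digits and underscores, so this pattern is looser than one built from [a-z] and [A-Z].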

For stricter control, you can use this: ^[a-z]{2}(-[A-Z]{2})?$.
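Because the strict pattern is anchored at both ends, it validates a single tag rather than finding one inside the full header. One way to use it is to split the header into tags first. This sketch assumes a comma-separated header with optional ;q= parameters and simply ignores the q-values. It also swaps ^ and $ for \A and \z, since in Ruby ^ and $ match at line boundaries, not only at the ends of the string.

```ruby
STRICT_TAG = /\A[a-z]{2}(-[A-Z]{2})?\z/

header = "en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4"

# The anchored pattern cannot match the whole header...
p STRICT_TAG.match?(header)  # => false

# ...so split into tags, drop any ";q=..." parameter, then validate.
tags = header.split(",").map { |part| part.split(";").first.to_s.strip }
p tags.select { |tag| STRICT_TAG.match?(tag) }
# => ["en-US", "en", "zh-TW", "zh"]

# Tags in other shapes are rejected.
p STRICT_TAG.match?("EN-us")       # => false
# With $ instead of \z, this string would be accepted.
p STRICT_TAG.match?("en-US\nfoo")  # => false
```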

Question 3

A better pattern to use is:

/^[a-z]{2}(-[A-Z]{2})?$/

Note, though, that locales are commonly written in the en_US format, with _ instead of -.
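If both spellings need to be accepted, the separator can itself be a character class, [-_]. The sketch below also normalizes the result to one form; the helper name normalize_tag is hypothetical, and normalizing to the hyphenated form is an assumption (use whichever form your locale files are named with).

```ruby
# Accept "en-US" or "en_US" (or a bare "en"), anchored to the whole string.
TAG = /\A([a-z]{2})(?:[-_]([A-Z]{2}))?\z/

# Hypothetical helper: returns the tag in hyphenated form, or nil.
def normalize_tag(tag)
  m = TAG.match(tag)
  return nil unless m
  m[2] ? "#{m[1]}-#{m[2]}" : m[1]
end

p normalize_tag("en_US")  # => "en-US"
p normalize_tag("en-US")  # => "en-US"
p normalize_tag("en")     # => "en"
p normalize_tag("en US")  # => nil
```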

Question 4

Why not just use .split(';', 2).first?
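Here is what that returns for the Chrome header from the first answer. It splits only at the first semicolon, so when the first tag carries no q-value the result still contains several comma-separated tags; splitting on the comma first isolates a single tag. A nil header would need a guard (for example || "") before calling split.

```ruby
header = "en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4"

p header.split(';', 2).first
# => "en-US,en"

# Splitting on "," first (then dropping any ";q=..." part) gives one tag.
p header.split(',', 2).first.split(';', 2).first
# => "en-US"

# A header without parameters passes through unchanged.
p "en-US".split(';', 2).first
# => "en-US"
```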