Question

I am using a regular expression to extract the locale from HTTP_ACCEPT_LANGUAGE. The approach recommended by the Rails Guides is:

request.env['HTTP_ACCEPT_LANGUAGE'].scan(/^[a-z]{2}/).first

Unfortunately, this regex falls short in many cases where HTTP_ACCEPT_LANGUAGE is something like en-US, zh-TW or zh-CN, because it captures only the two-letter language code. So I modified the regex:

/^[\w\-\w]{2,5}/

This works. Nonetheless, the IDE gives me a warning: character class has duplicated range: /^[\w\-\w]{2,5}/.

How can I avoid this warning?


Solution 4

@npinti's and @Victor's answers are good from a pure regex perspective. However, they do not fully address the actual task: using a regex to extract the locale from HTTP_ACCEPT_LANGUAGE in Rails. To detect both the 2-character (e.g. "en") and 5-character (e.g. "en-US") formats properly in Rails:

# accept_language should be something like
# "en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4" (from Chrome).
# However, it may be nil if the client doesn't send an Accept-Language header.
accept_language = request.env['HTTP_ACCEPT_LANGUAGE'] || ""
# Use "match" instead of "scan": match returns a MatchData object
# (or nil) for the whole match.
match_data = accept_language.match(/^[a-z]{2}(-[A-Z]{2})?/)
I18n.locale = match_data ? match_data[0] : I18n.default_locale
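The same logic can be tried outside a Rails request. The following is a minimal sketch, assuming a hypothetical helper named extract_locale and a hard-coded fallback in place of I18n.default_locale; neither is part of Rails. It uses \A rather than ^ so the pattern is anchored to the start of the string rather than the start of any line.

```ruby
# Hypothetical helper, not part of Rails: returns the first language tag
# of an Accept-Language header value, or the fallback if none is found.
LOCALE_PATTERN = /\A[a-z]{2}(-[A-Z]{2})?/

def extract_locale(accept_language, fallback = "en")
  # to_s turns a missing header (nil) into an empty string
  match_data = accept_language.to_s.match(LOCALE_PATTERN)
  match_data ? match_data[0] : fallback
end

extract_locale("en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4")  # => "en-US"
extract_locale("zh;q=0.4")                              # => "zh"
extract_locale(nil)                                     # => "en"
```

In a controller, the fallback argument would presumably be I18n.default_locale, as in the snippet above.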

OTHER TIPS

The problem is that a character class in square brackets is a set: you list the characters you want to match, and order and repetition do not matter, so [\w\-\w] is the same as [\w\-]. That repeated \w is what triggers the warning. Changing the pattern to something like this should achieve what you are after: \w{2}(-\w{2})?.
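To illustrate the difference between the character class and the grouped pattern, here is a small sketch. The variable names and sample strings are made up for the comparison; the duplicated class itself is left out so the snippet runs without the warning.

```ruby
# loose:   any 2 to 5 characters drawn from "word characters or hyphen"
# grouped: two word characters, optionally followed by "-" and two more
loose   = /^[\w\-]{2,5}/
grouped = /^\w{2}(-\w{2})?/

header = "en-US,en;q=0.8"
header[loose]      # => "en-US"
header[grouped]    # => "en-US"

# The set-based class does not enforce any structure:
"english"[loose]   # => "engli"
"english"[grouped] # => "en"
"--"[loose]        # => "--"
"--"[grouped]      # => nil
```

Both patterns agree on well-formed input, but only the grouped one encodes the "language, then optional region" shape.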

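For stricter control, you can use this: ^[a-z]{2}(-[A-Z]{2})?$. Note that the trailing $ means the pattern only matches when the string contains nothing but the locale, so it will not match a full header such as "en-US,en;q=0.8".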

It is better to use the following pattern:

/^[a-z]{2}(-[A-Z]{2})?$/

But note that the language is commonly written in the en_US format, with _ instead of -.
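If both spellings need to be handled, one option is to accept either separator and normalize to the hyphenated form. This is only a sketch: normalize_locale is a hypothetical name, and whether underscore input actually occurs depends on where the value comes from (the underscore form is typical of POSIX locale names such as en_US).

```ruby
# Hypothetical helper: accepts "en-US" or "en_US" and returns "en-US".
# Returns nil when the input does not start with a two-letter language code.
def normalize_locale(tag)
  match_data = tag.to_s.match(/\A([a-z]{2})(?:[-_]([A-Z]{2}))?/)
  return nil unless match_data

  # match_data[2] is nil when there is no region part; compact drops it
  [match_data[1], match_data[2]].compact.join("-")
end

normalize_locale("en_US")  # => "en-US"
normalize_locale("zh-TW")  # => "zh-TW"
normalize_locale("en")     # => "en"
normalize_locale("")       # => nil
```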

Why not just use .split(';', 2).first?
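For comparison, here is what that call returns on the Chrome-style header quoted earlier. Splitting on ';' alone leaves a comma-separated list, so a second split on ',' appears to be needed to isolate the first tag. The variable names are illustrative.

```ruby
header = "en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4"

# Everything before the first ";" is still a comma-separated list
first_chunk = header.split(';', 2).first    # => "en-US,en"

# Splitting that chunk on "," isolates the first language tag
first_tag = first_chunk.split(',', 2).first # => "en-US"
```

Unlike the regex approach, this does not validate the shape of the tag, so it passes along whatever the client sent.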

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow