Match language code with countries where this language is an official or commonly used language

https://stackoverflow.com/questions/2680619

30-09-2019

Question

Is there a Python library that, given a language code, returns the list of countries where that language is an official or commonly used language?

For example, the language code "fr" is associated with 29 countries where French is an official language, plus 8 countries where it is commonly used.

Solution

pycountry (seriously). You can get it from the Python Package Index.

OTHER TIPS

Despite the accepted answer, as far as I can tell none of the XML files underlying pycountry provides a mapping from languages to countries. They contain lists of languages and their ISO codes, lists of countries and their ISO codes, plus other useful data, but not that.

Similarly, the Babel package is great, but after digging around for a while I couldn't find a way to list all languages for a particular country. The best you can do is the 'most likely' language: https://stackoverflow.com/a/22199367/202168

So I had to build it myself from the Unicode CLDR supplemental data:

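from urllib.request import urlopen
from lxml import etree  # "import lxml" alone does not make lxml.etree available

# CLDR supplemental data. CLDR now lives on GitHub (unicode-org/cldr); if this URL
# no longer resolves, use common/supplemental/supplementalData.xml from that repository.
CLDR_SUPPLEMENTAL_URL = 'http://unicode.org/repos/cldr/trunk/common/supplemental/supplementalData.xml'

def get_territory_languages(url=CLDR_SUPPLEMENTAL_URL):
    """Map each territory code to {language: {'percent': float, 'official': bool}}."""
    with urlopen(url) as langxml:
        langtree = etree.XML(langxml.read())

    territory_languages = {}
    for t in langtree.find('territoryInfo').findall('territory'):
        langs = {}
        for l in t.findall('languagePopulation'):
            langs[l.get('type')] = {
                'percent': float(l.get('populationPercent')),
                # officialStatus is only present for languages with some official status
                'official': bool(l.get('officialStatus'))
            }
        territory_languages[t.get('type')] = langs
    return territory_languages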
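
To show what the loop above walks over, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of lxml, run on a tiny hand-written excerpt shaped like supplementalData.xml. The territory codes XA/XB (ISO 3166 user-assigned) and language codes qaa/qab (ISO 639 reserved for local use) are placeholders and the percentages are made up; parse_territory_languages is a hypothetical name. It also keeps the raw officialStatus value, since CLDR distinguishes values such as 'official', 'de_facto_official' and 'official_regional'.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt in the shape of CLDR's supplementalData.xml.
# Codes are placeholders and percentages are invented, not real CLDR figures.
SAMPLE = """<supplementalData>
  <territoryInfo>
    <territory type="XA">
      <languagePopulation type="qaa" populationPercent="90" officialStatus="official"/>
      <languagePopulation type="qab" populationPercent="15"/>
    </territory>
    <territory type="XB">
      <languagePopulation type="qab" populationPercent="60" officialStatus="official_regional"/>
    </territory>
  </territoryInfo>
</supplementalData>"""

def parse_territory_languages(xml_text):
    """Return {territory: {language: {'percent', 'official', 'status'}}}."""
    root = ET.fromstring(xml_text)
    territory_languages = {}
    for t in root.find('territoryInfo').findall('territory'):
        territory_languages[t.get('type')] = {
            l.get('type'): {
                'percent': float(l.get('populationPercent')),
                # officialStatus is absent for languages without official status
                'official': l.get('officialStatus') is not None,
                'status': l.get('officialStatus'),
            }
            for l in t.findall('languagePopulation')
        }
    return territory_languages
```

Keeping 'status' lets you tell nationwide official languages apart from regional ones, which matters for countries with many official languages.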

You probably want to store the result of get_territory_languages() in a file rather than fetching it over the web every time you need it.
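
A minimal caching sketch, assuming a JSON file is an acceptable cache. The helper load_territory_languages and the file name territory_languages.json are hypothetical; fetch is whatever zero-argument function produces the mapping, such as get_territory_languages above.

```python
import json
import os

CACHE_PATH = 'territory_languages.json'  # hypothetical cache location

def load_territory_languages(fetch, cache_path=CACHE_PATH):
    """Return the territory -> languages mapping, calling fetch() only on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            return json.load(f)
    data = fetch()
    # The mapping only holds strings, floats and bools, so it round-trips through JSON.
    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=1)
    return data
```

Delete the cache file when you want to pick up a newer CLDR release.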

This dataset also contains unofficial languages, which you may not want to include. Here's some more example code that keeps only the official ones:

TERRITORY_LANGUAGES = get_territory_languages()

def get_official_locale_ids(country_code):
    country_code = country_code.upper()
    # most widely spoken first; sorted() works on the items view, list.sort() does not
    langs = sorted(TERRITORY_LANGUAGES[country_code].items(),
                   key=lambda l: l[1]['percent'], reverse=True)
    return [
        '{lang}_{terr}'.format(lang=lang, terr=country_code)
        for lang, spec in langs if spec['official']
    ]

>>> get_official_locale_ids('es')
['es_ES', 'ca_ES', 'gl_ES', 'eu_ES', 'ast_ES']
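
The question asks for the reverse direction (language to countries). A minimal sketch that inverts the territory -> languages mapping returned by get_territory_languages(); the function name get_countries_for_language and the official_only flag are hypothetical. It assumes CLDR language keys may carry a script suffix (such as 'zh_Hant') and matches on the base language subtag.

```python
def get_countries_for_language(territory_languages, lang_code, official_only=False):
    """Return sorted territory codes where lang_code appears in the mapping.

    Keys such as 'zh_Hant' are matched on their base subtag ('zh').
    """
    wanted = lang_code.lower()
    countries = set()
    for territory, langs in territory_languages.items():
        for lang, spec in langs.items():
            if lang.split('_')[0].lower() != wanted:
                continue
            if official_only and not spec['official']:
                continue
            countries.add(territory)
    return sorted(countries)
```

For example, get_countries_for_language(TERRITORY_LANGUAGES, 'fr', official_only=True) lists the territories where CLDR marks French as official; without the flag you also get territories where it is merely spoken.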

Look at the Babel package. It ships a pickle file for each supported locale. See the list() function in the localedata module for getting a list of ALL locales, then write some code to split the locale identifiers into (language, country) pairs.
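
A minimal sketch of the splitting step, written against plain identifier strings so it does not depend on Babel itself; split_locale_id and countries_by_language are hypothetical names. It assumes identifiers look like 'fr_CA' or 'zh_Hant_HK' and skips numeric UN M.49 regions such as '001'. Note that this only tells you which (language, country) combinations have locale data, which is not the same as official status.

```python
from collections import defaultdict

def split_locale_id(identifier):
    """Split 'zh_Hant_HK' into ('zh', 'HK'); territory is None if absent."""
    parts = identifier.replace('-', '_').split('_')
    language = parts[0].lower()
    for part in parts[1:]:
        # Territory subtags are 2 letters (ISO 3166-1) or 3 digits (UN M.49);
        # 4-letter subtags are scripts and longer ones are variants.
        if (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            return language, part.upper()
    return language, None

def countries_by_language(identifiers):
    """Group country codes by language, dropping numeric regions like '001'."""
    result = defaultdict(set)
    for ident in identifiers:
        language, territory = split_locale_id(ident)
        if territory and territory.isalpha():
            result[language].add(territory)
    return {lang: sorted(terrs) for lang, terrs in result.items()}
```

Usage: countries_by_language(['fr', 'fr_CA', 'fr_FR', 'en_001', 'zh_Hant_HK']) returns {'fr': ['CA', 'FR'], 'zh': ['HK']}. With Babel installed you would pass it the full locale list; in current Babel releases that appears to be babel.localedata.locale_identifiers().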

Check out Ethnologue, which lists the countries where each language is spoken.

Be careful, though: India, for example, has a lot of official languages.

Licensed under: CC-BY-SA with attribution
