The right way to check if a string has Hebrew chars

https://stackoverflow.com//questions/10664254

python
ord

11-12-2019

Question

Hebrew characters have Unicode code points between 1424 and 1514 (hex 0590 to 05EA).

I'm looking for the right, most efficient and most Pythonic way to check whether a string contains any of them.

First I came up with this:

for c in s:
    if ord(c) >= 1424 and ord(c) <= 1514:
        return True
return False

Then I came up with a more elegant implementation:

return any(map(lambda c: (ord(c) >= 1424 and ord(c) <= 1514), s))

And maybe:

return any([(ord(c) >= 1424 and ord(c) <= 1514) for c in s])

Which of these is the best? Or should I do it differently?

Solution

You could do:

# Python 3.
return any("\u0590" <= c <= "\u05EA" for c in s)
# Python 2.
return any(u"\u0590" <= c <= u"\u05EA" for c in s)
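Wrapped in a function, the Python 3 form above might look like the following sketch. The name contains_hebrew is hypothetical, and the range is the one given in the question (U+0590 to U+05EA).

```python
def contains_hebrew(s):
    """Return True if any character of s lies in U+0590..U+05EA."""
    # Single-character strings compare by code point, so no ord() call is needed.
    # The generator expression lets any() stop at the first match.
    return any("\u0590" <= c <= "\u05EA" for c in s)


print(contains_hebrew("shalom"))                        # False
print(contains_hebrew("abc \u05e9\u05dc\u05d5\u05dd"))  # True
```

Because any() receives a generator rather than a list, no intermediate list is built and iteration ends as soon as one Hebrew character is found.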

Other suggestions

Your basic options are:

Match against a regex containing the range of characters; or
Iterate over the string, testing for membership of the character in a string or set containing all of your target characters, and break if you find a match.

Only actual testing can show which is going to be faster.
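As a rough sketch of both options and of how one might time them, assuming Python 3 and the range from the question (U+0590 to U+05EA); the names has_hebrew_regex and has_hebrew_set are hypothetical, and timings will vary with the input and the machine:

```python
import re
import timeit

# Range taken from the question: U+0590 to U+05EA.
HEBREW_RE = re.compile("[\u0590-\u05EA]")
HEBREW_CHARS = frozenset(chr(cp) for cp in range(0x0590, 0x05EA + 1))


def has_hebrew_regex(s):
    # search() scans the string and stops at the first match.
    return HEBREW_RE.search(s) is not None


def has_hebrew_set(s):
    # any() short-circuits, which plays the role of the "break" on a match.
    return any(c in HEBREW_CHARS for c in s)


if __name__ == "__main__":
    # Unfavourable input: the only Hebrew character is at the very end.
    sample = "a" * 1000 + "\u05d0"
    for fn in (has_hebrew_regex, has_hebrew_set):
        seconds = timeit.timeit(lambda: fn(sample), number=1000)
        print(fn.__name__, round(seconds, 4))
```

Running the timing on strings that resemble the real data (short or long, match early or late) is what decides between the two.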

It's simple to check the first character with unicodedata:

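import unicodedata

def is_greek(term):
    # The '' default avoids a ValueError for characters without a Unicode name.
    return 'GREEK' in unicodedata.name(term.strip()[0], '')


def is_hebrew(term):
    # Only the first non-whitespace character is checked; a blank term raises IndexError.
    return 'HEBREW' in unicodedata.name(term.strip()[0], '')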
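To answer the original question (any character, not just the first), the same idea can be extended over the whole string. This is only a sketch; contains_script is a hypothetical helper name, not part of unicodedata.

```python
import unicodedata


def contains_script(term, script_name):
    """Return True if any character's Unicode name mentions script_name."""
    # The "" default keeps unnamed characters (e.g. control codes)
    # from raising ValueError.
    return any(script_name in unicodedata.name(c, "") for c in term)


print(contains_script("abc \u05d0", "HEBREW"))  # True
print(contains_script("abc", "HEBREW"))         # False
```

Unlike the range test, a name-based check also matches Hebrew characters outside U+0590 to U+05EA, such as the presentation forms starting at U+FB1D, which may or may not be what is wanted.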

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow