How to check whether a word is Japanese or English using PHP

https://stackoverflow.com/questions/2856942

27-09-2019

Problem

I want to process English words and Japanese words differently in this function:

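// is_english() and is_japanese() are placeholders for the checks being asked about
function process_word($word) {
    if (is_english($word)) {
        // process the English word
    } elseif (is_japanese($word)) {
        // process the Japanese word
    }
}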

Thank you.

Solution

A quick solution that doesn't need the mbstring extension:

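// Assumes $str is UTF-8. utf8_decode() turns each multi-byte UTF-8 sequence
// into a single byte, so the length only shrinks if $str has non-ASCII chars.
// Note: utf8_decode() is deprecated as of PHP 8.2.
if (strlen($str) != strlen(utf8_decode($str))) {
    // $str uses multi-byte chars (isn't English)
} else {
    // $str is ASCII (probably English)
}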
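On PHP 8.2 and later, where utf8_decode() is deprecated, the same ASCII test can be written with a single byte-level regex. This is a minimal sketch; the name isAsciiOnly() is hypothetical, and like the check above it only tells you the string has no non-ASCII bytes, not that it is actually English.

```php
<?php
// Hypothetical helper: true when every byte of $str is in the ASCII range
// (0x00-0x7F). The pattern has no /u modifier, so it works on raw bytes:
// any byte 0x80-0xFF (part of a multi-byte UTF-8 char) makes it fail.
function isAsciiOnly(string $str): bool
{
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}
```

For example, isAsciiOnly('hello') is true, while isAsciiOnly('café') and isAsciiOnly('こんにちは') are false.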

Or a modification of the solution provided by @Alexander Konstantinov:

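// Kanji: CJK Unified Ideographs (these are shared with Chinese)
function isKanji($str) {
    return preg_match('/[\x{4E00}-\x{9FBF}]/u', $str) > 0;
}

// Hiragana block
function isHiragana($str) {
    return preg_match('/[\x{3040}-\x{309F}]/u', $str) > 0;
}

// Katakana block
function isKatakana($str) {
    return preg_match('/[\x{30A0}-\x{30FF}]/u', $str) > 0;
}

// True if $str contains at least one kanji, hiragana or katakana character
function isJapanese($str) {
    return isKanji($str) || isHiragana($str) || isKatakana($str);
}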
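A variation on the functions above swaps the hard-coded ranges for PCRE's Unicode script properties (\p{Han}, \p{Hiragana}, \p{Katakana}), which also cover characters outside those ranges, such as CJK Extension A ideographs and half-width katakana. This is a sketch with a hypothetical function name; it still cannot distinguish a kanji-only string from Chinese.

```php
<?php
// Hypothetical helper using Unicode script properties instead of ranges.
// Requires the /u modifier and UTF-8 input; on invalid UTF-8, preg_match()
// returns false, so the function returns false.
// Caveat: \p{Han} matches Chinese hanzi as well as Japanese kanji.
function isJapaneseScript(string $str): bool
{
    return preg_match('/[\p{Han}\p{Hiragana}\p{Katakana}]/u', $str) === 1;
}
```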

Other tips

This function checks whether a word contains at least one Japanese character (I found the Unicode ranges for Japanese characters on Wikipedia).

function isJapanese($word) {
    return preg_match('/[\x{4E00}-\x{9FBF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $word);
}
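Plugged into the function from the question, the range check above might be used like this. It is a sketch under stated assumptions: "English" is taken to mean ASCII letters only (plus apostrophe and hyphen), and the return values 'japanese', 'english' and 'other' are hypothetical stand-ins for the real processing in each branch.

```php
<?php
// Same kanji/hiragana/katakana ranges as isJapanese() above,
// but returning a strict bool.
function isJapanese(string $word): bool
{
    return preg_match('/[\x{4E00}-\x{9FBF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $word) === 1;
}

function process_word(string $word): string
{
    // Check Japanese first, so mixed words like "Tシャツ" go here
    if (isJapanese($word)) {
        // ...Japanese-specific processing would go here...
        return 'japanese';
    }
    // Assumption: an English word is made of ASCII letters, ' and - only
    if (preg_match('/^[A-Za-z\'-]+$/', $word) === 1) {
        // ...English-specific processing would go here...
        return 'english';
    }
    // Anything else: digits, other scripts, empty string, ...
    return 'other';
}
```

Checking for Japanese characters first is a design choice: a word that mixes Latin and Japanese script is routed to the Japanese branch rather than being rejected as "not English".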

You could try Google's Translation API, which has a language detection function: http://code.google.com/apis/language/translate/v2/using_rest.html#detect-language

Try the mb_detect_encoding() function: if the encoding is EUC-JP or UTF-8/UTF-16, the text may be Japanese; otherwise it is probably English. It is better if you can be sure which encoding each language uses, since UTF encodings can be used for many languages.
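A rough sketch of that tip, assuming the mbstring extension is installed and that only ASCII, UTF-8 and EUC-JP input is expected. The function name and return labels are hypothetical. ASCII is listed first in the candidate list so that ASCII-only text is reported as ASCII rather than UTF-8.

```php
<?php
// Hypothetical helper; requires mbstring.
// strict = true means an encoding is only returned if $str is valid in it.
function guessLanguageByEncoding(string $str): string
{
    $enc = mb_detect_encoding($str, ['ASCII', 'UTF-8', 'EUC-JP'], true);
    if ($enc === 'ASCII') {
        return 'english';         // only ASCII bytes: probably English
    }
    if ($enc === 'UTF-8' || $enc === 'EUC-JP') {
        return 'maybe-japanese';  // could be Japanese, or any other language
    }
    return 'unknown';             // no candidate matched ($enc is false)
}
```

As the tip says, a UTF-8 result alone does not prove the text is Japanese; combining this with one of the character-range checks above is more reliable.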

English text usually consists only of ASCII characters (or, more precisely, characters in the ASCII range).

You can try to convert the charset and check if it succeeds.

Take a look at iconv: http://www.php.net/manual/en/function.iconv.php

If you can convert a string to ISO-8859-1, it might be English; if you can convert it to ISO-2022-JP, it is probably Japanese (I might be wrong about the exact charsets; you should look them up).
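A minimal sketch of the conversion idea with iconv(), assuming the input is UTF-8; the function name and labels are hypothetical. The order of the checks matters, because plain ASCII text also converts cleanly to ISO-2022-JP. How iconv() handles unconvertible characters depends on the library PHP was built against (glibc, libiconv, musl), so treat the result as a rough heuristic.

```php
<?php
// Hypothetical helper. iconv() returns false when the input contains a
// character that the target charset cannot represent (with glibc/libiconv);
// the @ suppresses the notice it emits in that case.
function guessLanguageByConversion(string $str): string
{
    // Try ISO-8859-1 first: ASCII is also valid ISO-2022-JP, so checking
    // that charset first would label English text as Japanese.
    if (@iconv('UTF-8', 'ISO-8859-1', $str) !== false) {
        return 'maybe-english';
    }
    if (@iconv('UTF-8', 'ISO-2022-JP', $str) !== false) {
        return 'maybe-japanese';
    }
    return 'unknown';
}
```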

License: CC-BY-SA with attribution

Not affiliated with StackOverflow