Question

I want to count the number of characters in a text field on my website. The text field accepts any type of input from a user, including ASCII art and other special characters. If the user types in normal characters, I can use strlen($message) to return the value, but if the user uses special characters (such as ©), the count is incorrect.

Is there a simple way to count everything without having to do any heavy lifting?

Solution

If your input is UTF-8 encoded and you want to count Unicode graphemes, you can do this:

$count = preg_match_all('/\X/u', $text);

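Here is some explanation. A Unicode grapheme is what a user perceives as a single "character": a base Unicode code point together with any "combining marks" that follow it. The \X escape matches one such grapheme, and the u modifier tells PCRE to treat both the pattern and the subject as UTF-8.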

mb_strlen($text, 'UTF-8') would count combining marks as separate characters (and strlen($text) would give you the total bytecount).
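To make the difference concrete, here is a small sketch comparing the three counts on one sample string. The sample string is my own illustration, and the snippet assumes PHP 7 or later (for the \u{...} escape) with the mbstring extension loaded.

```php
<?php
// Sample input (an assumption for illustration): "e" followed by a combining
// acute accent (U+0301), then the copyright sign (U+00A9).
$text = "e\u{0301}\u{00A9}";

$bytes      = strlen($text);                  // bytes: 1 + 2 + 2
$codepoints = mb_strlen($text, 'UTF-8');      // code points: e, U+0301, U+00A9
$graphemes  = preg_match_all('/\X/u', $text); // user-perceived characters

printf("bytes=%d codepoints=%d graphemes=%d\n", $bytes, $codepoints, $graphemes);
// prints "bytes=5 codepoints=3 graphemes=2"
```

Only the last count matches what a user would call "two characters".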

Since, judging by a comment of yours, your input could have some characters converted to their HTML entity equivalent, you should first do an html_entity_decode():

$count = preg_match_all('/\X/u', html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
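As a rough illustration of why the decoding step matters, the sketch below counts the same text before and after html_entity_decode(). The helper name count_graphemes() and the sample input are hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: counts graphemes in a UTF-8 string that may
// contain HTML entities.
function count_graphemes(string $text): int
{
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $count = preg_match_all('/\X/u', $decoded);
    // preg_match_all() returns false on failure, e.g. for invalid UTF-8.
    return $count === false ? 0 : $count;
}

$raw = 'A &copy; B';                      // sample input (assumption)
echo preg_match_all('/\X/u', $raw), "\n"; // prints 10: the entity counts as 6 characters
echo count_graphemes($raw), "\n";         // prints 5: "A © B"
```

Returning 0 on failure is just one possible choice; you may prefer to throw or log instead.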

UPDATE

The intl PECL extension now provides grapheme_strlen() and other grapheme_*() functions, provided the extension is installed and enabled on your server.
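If you are unsure whether intl is available, one option is to check at runtime and fall back to the regular expression. This is only a sketch; the wrapper name safe_grapheme_count() is made up for this example, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical wrapper: prefers intl's grapheme_strlen() and falls back to
// the \X regular expression when the intl extension is not loaded.
function safe_grapheme_count(string $text): int
{
    if (function_exists('grapheme_strlen')) {
        $count = grapheme_strlen($text);
    } else {
        $count = preg_match_all('/\X/u', $text);
    }
    // Both calls can return a non-integer (false or null) on failure,
    // for example when $text is not valid UTF-8.
    return is_int($count) ? $count : 0;
}

echo safe_grapheme_count("e\u{0301}\u{00A9}"), "\n"; // prints 2
```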

OTHER TIPS

Both strlen() and mb_strlen() work fine for me.

Perhaps the special characters you entered are not being displayed correctly (a Unicode issue). Try to find out which characters come out unreadable.

Hope this helps.

Here you go.

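// Returns the number of extra bytes used by multi-byte characters (UTF-8 input assumed).
function countumlauts($str) {
    // strlen() counts bytes; iconv_strlen() counts characters in the given encoding.
    return strlen($str) - iconv_strlen($str, 'UTF-8');
}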

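How it works: in UTF-8, special characters use more than one byte. strlen() counts bytes, while iconv_strlen() counts characters, so the difference is the number of extra bytes. For two-byte characters such as umlauts, that equals the number of special characters.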
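One caveat worth illustrating: the difference between the two lengths is a count of extra bytes, so it only matches the number of special characters when each of them takes exactly two bytes. The sample strings and the function name extra_bytes() below are my own, and the explicit 'UTF-8' argument is an assumption about your input encoding.

```php
<?php
// Extra bytes used by multi-byte characters, assuming UTF-8 input.
function extra_bytes(string $str): int
{
    return strlen($str) - iconv_strlen($str, 'UTF-8');
}

echo extra_bytes("M\u{00FC}ller"), "\n"; // prints 1: "ü" is two bytes in UTF-8
echo extra_bytes("5 \u{20AC}"), "\n";    // prints 2: "€" is three bytes in UTF-8
```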

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow