PHP convert characters applicable for title tag

https://stackoverflow.com/questions/22303135

12-06-2023

Question

On my page I convert a string from lowercase to uppercase and output it in the title tag. At first I had the issue that &NBSP; is not accepted as an entity, so I had to preserve the entities.

So I decoded them to Unicode characters, converted the result to uppercase, and then encoded it back with htmlentities:

echo htmlentities(strtoupper(html_entity_decode(ob_get_clean())));

Now I've noticed a problem with the "right single quote": this character (’) comes out garbled in the title.

It seems that one of the functions I'm using does not convert it correctly. Is there a better function I can use, or is there something specifically for the title tag?

Edit: Here is a var_dump of the original data, which I have no influence over:

string(74) "Example example example &raquo; John Doe- Who&#8217;s That?&nbsp;"

Edit II: This is what my code above results in: [output not preserved]

This is what would happen if I just used strtoupper: [output not preserved]

Solution

Your problem is that strtoupper is not multibyte-aware, so it corrupts the UTF-8 text that html_entity_decode produces. In this instance, &#8217; (’) decodes to the UTF-8 byte sequence e2 80 99. But in strtoupper's single-byte world, the byte \xe2 is the character â, which is converted to Â (\xc2), and that makes your text an invalid UTF-8 sequence.
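To make the byte-level argument concrete, here is a minimal sketch that decodes the entity and then simulates the single-byte uppercasing by hand. The byte swap is done manually on purpose: whether strtoupper itself touches \xe2 depends on the locale and the PHP version (newer versions only convert ASCII), so calling it directly may not reproduce the corruption. The variable names are illustrative only.

```php
<?php
// Decode the entity from the question's sample data into raw UTF-8 bytes.
$quote = html_entity_decode('&#8217;', ENT_QUOTES, 'UTF-8');
$originalHex = bin2hex($quote);

// Simulate a single-byte (Latin-1 style) uppercase of the first byte:
// 0xE2 ("â") becomes 0xC2 ("Â"); the remaining two bytes are left alone.
$corrupted = "\xC2" . substr($quote, 1);
$corruptedHex = bin2hex($corrupted);

// Check both byte strings for UTF-8 validity.
$wasValid = mb_check_encoding($quote, 'UTF-8');
$isValid  = mb_check_encoding($corrupted, 'UTF-8');

printf("%s valid=%s\n", $originalHex, var_export($wasValid, true));
printf("%s valid=%s\n", $corruptedHex, var_export($isValid, true));
```

The second line of output should report the sequence as invalid, because the trailing byte 99 is a continuation byte with no lead byte in front of it.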

Simply use mb_strtoupper instead.
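Applied to the one-liner from the question, the change could look like the sketch below. It assumes the mbstring extension is available and that the page is served as UTF-8. The helper name uppercase_title is hypothetical, and the explicit ENT_QUOTES and 'UTF-8' arguments are an assumption added here so the result does not depend on PHP's defaults.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function uppercase_title(string $html): string
{
    // 1. Turn entities (&raquo;, &#8217;, &nbsp;) into real UTF-8 characters.
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // 2. Uppercase with a multibyte-aware function so the UTF-8 stays intact.
    $upper = mb_strtoupper($text, 'UTF-8');

    // 3. Re-encode for output; htmlentities emits entity names in their
    //    correct case, so nothing like &NBSP; is produced.
    return htmlentities($upper, ENT_QUOTES, 'UTF-8');
}

// Sample data taken from the var_dump in the question.
$raw = 'Example example example &raquo; John Doe- Who&#8217;s That?&nbsp;';
echo uppercase_title($raw), "\n";
```

In the question's setup, the argument would be ob_get_clean() instead of the sample string.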

Other tips

It's ugly, but it might work for you (although I would certainly suggest Jon's solution):

After your strtoupper(), you can restore all uppercased HTML entities this way:

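// Map of character => named entity, e.g. "»" => "&raquo;"
$entity_table = get_html_translation_table(HTML_ENTITIES);
// The same entities as strtoupper() leaves them, e.g. "&RAQUO;"
$entity_table_uc = array_map('strtoupper', $entity_table);
// Swap each uppercased entity back to its valid spelling
$string = str_replace($entity_table_uc, $entity_table, $string);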

This should remove the need for htmlentities() / html_entity_decode().
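As a rough illustration, the snippet can be wrapped in a function and run against the sample string from the question. The function name strtoupper_keep_entities is hypothetical, and the sketch relies on the default HTML 4.01 entity table that get_html_translation_table returns.

```php
<?php
// Hypothetical helper name, wrapping the three lines from the answer above.
function strtoupper_keep_entities(string $html): string
{
    // Named entities in their valid spelling, e.g. "&raquo;", "&nbsp;".
    $entities = array_values(get_html_translation_table(HTML_ENTITIES));

    // The same entities as they look after strtoupper(), e.g. "&RAQUO;".
    $entities_uc = array_map('strtoupper', $entities);

    // Uppercase everything, then put the entity names back.
    return str_replace($entities_uc, $entities, strtoupper($html));
}

// Sample data taken from the var_dump in the question.
$sample = 'Example example example &raquo; John Doe- Who&#8217;s That?&nbsp;';
echo strtoupper_keep_entities($sample), "\n";
```

Numeric entities such as &#8217; pass through unchanged because they contain no letters. One caveat: entity pairs that differ only in case, such as &Agrave; and &agrave;, share the same uppercased form, so the replacement cannot tell which one was meant.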

License: CC-BY-SA with attribution

Not affiliated with StackOverflow