Question

When working in the Moovweb SDK, length("çãêá") is expected to return 4, but instead returns 8. How can I ensure that the length function works correctly when using Unicode characters?

Solution

This is a common issue: length() is using the wrong character set, so it miscounts Unicode characters. To fix it, set the charset_determined variable before calling length() so that the correct character set is used. In your Tritium code:

$charset_determined = "utf-8"
# your call to length() here
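
The result of 8 is consistent with length() counting UTF-8 bytes instead of characters, since each of the four letters in the question takes two bytes in UTF-8. The sketch below illustrates that arithmetic in Python, not Tritium; it is an illustration only and assumes the letters are in their precomposed form.

```python
# "çãêá" written with escapes so the precomposed code points are unambiguous.
s = "\u00e7\u00e3\u00ea\u00e1"

code_points = len(s)                  # Python counts code points
utf8_bytes = len(s.encode("utf-8"))   # each of these letters is 2 bytes in UTF-8

print(code_points)  # 4
print(utf8_bytes)   # 8
```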

Other tips

In Unicode, there is no single notion of the "length" of a string or its "number of characters". That idea comes from ASCII thinking, where one byte is always one character.

You can choose one of the following, depending on what exactly you need:

  • For cursor movement, text selection, and the like, use grapheme clusters.

  • For limiting the length of a string in input fields, file formats, protocols, or databases, measure the length in code units of some predetermined encoding. The reason is that any length limit derives from the fixed amount of memory allocated for the string at a lower level, be it in memory, on disk, or in a particular data structure.
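
As a rough illustration of measuring and limiting by code units, here is a minimal Python sketch. The helper names utf16_code_units and truncate_utf8 are hypothetical, chosen for this example; they are not part of Tritium or any library.

```python
def utf16_code_units(text):
    # Each UTF-16 code unit is 2 bytes; code points above U+FFFF take two units.
    return len(text.encode("utf-16-le")) // 2


def truncate_utf8(text, max_bytes):
    # Cut at the byte limit, then drop any incomplete trailing sequence
    # so the result is still valid UTF-8.
    clipped = text.encode("utf-8")[:max_bytes]
    return clipped.decode("utf-8", errors="ignore")


emoji = "\U0001F600"                  # one code point outside the BMP
accented = "\u00e7\u00e3\u00ea\u00e1"  # "çãêá", 2 UTF-8 bytes per letter

print(len(emoji), utf16_code_units(emoji), len(emoji.encode("utf-8")))  # 1 2 4
print(truncate_utf8(accented, 5))     # çã (the fifth byte is half of "ê" and is dropped)
```

Note that truncating by code units can still split a grapheme cluster, for example a base letter from a following combining mark, so the first bullet's concern applies separately.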

The size of the string as it appears on the screen is unrelated to the number of code points in the string. You have to ask the rendering engine for that. A code point does not necessarily occupy exactly one column, even in monospace fonts and terminals. POSIX takes this into account (for example, with wcwidth()).
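
To make the column-width point concrete, here is a crude Python approximation using the standard unicodedata module. The function approx_columns is a hypothetical helper for illustration; it treats combining marks as zero columns and East Asian wide or fullwidth characters as two, and it is not equivalent to what a real terminal or rendering engine computes.

```python
import unicodedata


def approx_columns(text):
    # Rough terminal column estimate; real renderers may disagree.
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks stack on the previous character
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        width += 2 if wide else 1
    return width


print(approx_columns("abc"))                 # 3
print(approx_columns("e\u0301"))             # 1 (two code points, one column)
print(approx_columns("\u65e5\u672c\u8a9e"))  # 6 (three code points, six columns)
```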

More information is available at http://utf8everywhere.org

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow