We are migrating from PHP 5.2 to PHP 5.3 and have run into a problem with gettext in the Windows builds of PHP 5.3 and later. gettext appears to always return strings as UTF-8, and calls to bind_textdomain_codeset() have no effect when we try to change the character set.
See the following script:
<?php
print 'PHP_OS: ' . PHP_OS . "\n";
print 'php_version: ' . phpversion() . "\n\n";
$language = 'de_DE';
putenv( "LANG=$language" );
setlocale( LC_ALL, $language );
if ( strtoupper( substr( PHP_OS, 0, 3 ) ) === 'WIN' ) {
    $language = 'german';
    setlocale( LC_ALL, $language );
}
bindtextdomain( 'messages', dirname(__FILE__) . '/language' );
textdomain( 'messages' );
$translated = _( 'Overtime' );
printf ("Default encoding: %X %X\n", ord($translated[0]), ord($translated[1]));
bind_textdomain_codeset('messages', 'ISO-8859-1');
$translated = _( 'Overtime' );
printf ("Encoding set to ISO-8859-1: %X %X\n", ord($translated[0]), ord($translated[1]));
bind_textdomain_codeset('messages', 'UTF-8');
$translated = _( 'Overtime' );
printf ("Encoding set to UTF-8: %X %X\n", ord($translated[0]), ord($translated[1]));
Local directory structure for the language files:
language
    de
        LC_MESSAGES
            messages.mo
messages.mo contains a single message translating "Overtime" to "Überstunden".
Results for PHP 5.2.9-2 on Windows and PHP 5.3.27 on Linux are as expected (0xDC is "Ü" in ISO-8859-1, 0x62 is "b" in ISO-8859-1, and 0xC3 0x9C is "Ü" in UTF-8):
PHP_OS: WINNT
php_version: 5.2.9-2
Default encoding: DC 62
Encoding set to ISO-8859-1: DC 62
Encoding set to UTF-8: C3 9C
---------------------------------
PHP_OS: Linux
php_version: 5.3.27
Default encoding: DC 62
Encoding set to ISO-8859-1: DC 62
Encoding set to UTF-8: C3 9C
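For reference, the expected byte values quoted above can be checked independently of gettext. This is only a small sketch: it assumes the mbstring extension is loaded, and it spells "Ü" as explicit bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// "Ü" (U+00DC) as explicit UTF-8 bytes, followed by the rest of "Überstunden".
$utf8 = "\xC3\x9Cberstunden";

// Convert the same text to ISO-8859-1 (assumes mbstring is available).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Print the first two bytes of each string as hex.
printf("UTF-8:      %s\n", strtoupper(bin2hex(substr($utf8, 0, 2))));   // C39C
printf("ISO-8859-1: %s\n", strtoupper(bin2hex(substr($latin1, 0, 2)))); // DC62
```

So "C3 9C" in the output means the string starts with UTF-8 "Ü", while "DC 62" means ISO-8859-1 "Ü" followed by "b".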
However, with the Windows build of PHP 5.3 (also tested with 5.4), the result is not as expected:
PHP_OS: WINNT
php_version: 5.3.28
Default encoding: C3 9C
Encoding set to ISO-8859-1: C3 9C
Encoding set to UTF-8: C3 9C
Output is UTF-8 by default and can't be changed by bind_textdomain_codeset.
We're currently using gettext and ISO-8859-1 throughout our app and want to run unit tests under Windows, but the Windows versions of PHP 5.3 and greater seem to be broken with respect to the encoding returned by gettext.
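As a stopgap we could probably convert the strings ourselves, along the lines of the sketch below. This is not a fix for the underlying behaviour, just an idea under some assumptions: the mbstring extension is loaded, to_latin1() and t() are hypothetical helper names of our own (not part of PHP or gettext), and "high bytes that form valid UTF-8" is treated as meaning the string is UTF-8, which is a heuristic rather than a guarantee.

```php
<?php
// Hypothetical helper (not part of PHP or gettext): return $text as ISO-8859-1.
function to_latin1($text)
{
    // Pure ASCII is byte-identical in both encodings, so nothing to do.
    if (!preg_match('/[\x80-\xFF]/', $text)) {
        return $text;
    }
    // Heuristic: if the high bytes form valid UTF-8, treat the string as UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    }
    // Otherwise assume the string is already ISO-8859-1 and leave it alone.
    return $text;
}

// Hypothetical wrapper that would be called in place of _().
function t($msgid)
{
    return to_latin1(gettext($msgid));
}
```

With this, t( 'Overtime' ) should yield bytes starting with DC 62 whether gettext hands back UTF-8 or ISO-8859-1. Characters that do not exist in ISO-8859-1 would be replaced during conversion, so it only makes sense while all our translations fit in that character set. We would still prefer bind_textdomain_codeset() to work as documented.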