Question

I have an array of countries with one having a Latin character "Å":

$country["af"] = "Afghanistan";
$country["ax"] = "Åland Islands";
$country["al"] = "Albania";

While looping through this array and performing a comparison of the first character of the country name, I cannot match the Latin character.

foreach($country as $cc => $name)
{
 if($name[0] == "Å")
 {
  echo "matched";
 }
 else
 {
  echo $name[0];
 }
}

The result I got is: A�A

Why does the Latin character Å become � and how do I perform a proper comparison and output the Latin character Å?

Add Note: The http header and the html document have already been specified as UTF-8 format.

Add Note2: If I just echo $name instead of $name[0], I am able to get the Å in Åland Islands. Using substr($name, 0, 1) has the same effect as $name[0], which gives me �.


Solution

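Change your script to this. The normal string functions ($name[0], substr()) work on bytes, not characters, so they cannot split a UTF-8 string correctly: Å takes two bytes in UTF-8, and $name[0] returns only the first of them. You have to use the multibyte (mb_*) functions.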

foreach($country as $cc => $name)
{
    // mb_substr() counts characters rather than bytes, so this is the whole "Å"
    $first = mb_substr($name, 0, 1, "UTF-8");
    if($first == "Å")
    {
        echo "matched";
    }
    else
    {
        echo $first;
    }
}
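
To see why $name[0] prints �, it can help to look at the bytes. The sketch below is only an illustration: it assumes PHP 7 or later (for the "\u{C5}" escape, which produces Å as UTF-8 no matter how the file is saved) and that the mbstring extension is enabled. The variable names are made up for the example.

```php
<?php
// Å written with a Unicode escape so the example does not depend on the file encoding.
$name = "\u{C5}land Islands";

$byteCount = strlen($name);                            // counts bytes
$charCount = mb_strlen($name, "UTF-8");                // counts characters
$firstByte = bin2hex($name[0]);                        // only the first byte of Å
$firstChar = bin2hex(mb_substr($name, 0, 1, "UTF-8")); // both bytes of Å

echo "$byteCount bytes, $charCount characters\n";
echo "first byte: $firstByte, first character: $firstChar\n";
```

This prints 14 bytes against 13 characters, and c3 against c385. Å is stored as the two bytes c3 85, so $name[0] is the lone byte c3, which is not a complete character and gets displayed as �.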

OTHER TIPS

The problem is that programs have different ways of representing different characters. This is referred to as character encoding. Your browser, server, and PHP code are currently confused about which encoding you are using because you are mixing UTF-8 characters with ANSI code.

You can learn more about encoding here: http://vlaurie.com/computers2/Articles/characters.htm

There are three things that I do whenever I build a UTF-8 PHP site. These three things should resolve your problem:

Add a PHP UTF-8 Header

Add this to the top of your code:

<?php
header('Content-Type: text/html; charset=utf-8'); 
...

This tells the browser (and any other client that receives the response) to decode the document as UTF-8 instead of ANSI. You can read more about this here: Set HTTP header to UTF-8 using PHP

Add HTML UTF-8 Meta Tags

Add this code to the top of the HTML that you return:

<!doctype html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" /> 
...

This also instructs your browser to read the characters in UTF-8 (instead of ANSI). You can read more about this here: Set HTTP header to UTF-8 using PHP

Save the PHP File as UTF-8 without BOM

By default, your files usually save in ANSI encoding. If you want to work with international characters, then you need to save them in UTF-8 encoding. This will let you work with the Å character properly.

If you are using Notepad++ as your text editor, then you can set the encoding of your document under the Encoding menu. Set it to Encode in UTF-8 without BOM.
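
If you are not sure how a file was saved, a rough way to check is to test whether the string PHP sees is valid UTF-8. This is just a sketch: it assumes the mbstring extension is enabled, writes the bytes out by hand, and uses ISO-8859-1 as a stand-in for "ANSI" (your actual ANSI code page may differ).

```php
<?php
// The same text as it would be stored by two different "save as" choices.
$savedAsAnsi = "\xC5land Islands";     // Å as the single byte C5 (ISO-8859-1)
$savedAsUtf8 = "\xC3\x85land Islands"; // Å as the two bytes C3 85 (UTF-8)

var_dump(mb_check_encoding($savedAsAnsi, "UTF-8")); // is it valid UTF-8?
var_dump(mb_check_encoding($savedAsUtf8, "UTF-8"));

// Convert the single-byte version to UTF-8 so both strings hold the same bytes.
$converted = mb_convert_encoding($savedAsAnsi, "UTF-8", "ISO-8859-1");
var_dump($converted === $savedAsUtf8);
```

The first check fails and the second passes, because a lone C5 byte followed by an ordinary letter is not a valid UTF-8 sequence. After the conversion the two strings are identical.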

Gotcha

UTF-8 without BOM is not the same thing as UTF-8. UTF-8 files are often prepended with 3 bytes of data that indicate that the file is a UTF-8 file. This is referred to as the Byte Order Mark (BOM). You can read more about the BOM here: http://www.arclab.com/products/amlc/utf-8-php-cannot-modify-header-information.html

Most programs can tell that the file is UTF-8 anyway, so the BOM is redundant. If you save the file with a BOM, PHP sends those 3 bytes as output before your header() call runs, so you'll probably get an error message like this:

Warning: Cannot modify header information – headers already sent

If you see this error message, then you probably have a BOM problem.
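
A quick way to confirm a BOM problem is to look at the first three bytes of the file, which for a UTF-8 BOM are EF BB BF. The helpers below are hypothetical names invented for this sketch, not built-in PHP functions.

```php
<?php
// Hypothetical helper: true if the given bytes start with the UTF-8 BOM (EF BB BF).
function has_utf8_bom(string $bytes): bool
{
    return substr($bytes, 0, 3) === "\xEF\xBB\xBF";
}

// Hypothetical helper: returns the bytes with a leading BOM removed, if there was one.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

var_dump(has_utf8_bom("\xEF\xBB\xBFhello")); // starts with a BOM
var_dump(has_utf8_bom("hello"));             // no BOM
```

To check a real file, pass in its contents, for example has_utf8_bom(file_get_contents($path)), where $path is whichever script you suspect.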

The question mark appears because your viewer (browser) is trying to display a character that is not supported in the current character set. Why this happens when accessing the first character with $name[0], I'm not sure.

Based on the post here: PHP: Convert specific-Bosnian characters to non-bosnian (utf8 standard chars)

I tried the following:

$result = iconv("UTF-8", "ASCII//TRANSLIT", $test);

$result now contains Aland Islands; the special characters are converted to their closest ASCII equivalents. (The exact output of //TRANSLIT depends on the iconv implementation and locale.)

$result[0] should now contain A.

Please set character encoding for file (stored code) and output

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow