Question

I'm looking for a little help in understanding how charsets work. This question is a continuation of "Anything wrong with using windows-1252 instead of UTF-8".

I have a test ColdFusion site using...

<CFHEADER NAME="Content-Type" value="text/html; charset=windows-1252">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />

and a test Oracle DB using...

NLS_CHARACTERSET: WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16

According to the windows-1252 charset there is no square root symbol (Alt+251): √. But I can type it into a field on a webpage form, save it to the DB, query it and show it on the screen again just fine. In the DB it's stored as: &#8730;. How can I enter, store, query and show that character if it's not even part of the charset? According to the charset, decimal 251 is this: Hex:FB | û | 00FB | LATIN SMALL LETTER U WITH CIRCUMFLEX


Solution

You're not really using characters outside of the page and database's charset.

Because the page is windows-1252 encoded, if you enter Alt+251 into a form field and then post the data, the browser says:

"Hey, this char is not a part of windows-1252 and I need to only send back data
 which is in windows-1252, so I will do the best I can and send back the
 HTML character code of the char: &#8730;  -- oh well, I wish I could send back
 1 character; since I cannot, I will send back 7."

Notice that &#8730; is 7 separate characters (&, #, 8, 7, 3, 0 and ;), all of which are in the windows-1252 charset.
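As a rough illustration (not the browser's actual code), Python's `xmlcharrefreplace` error handler produces the same kind of fallback. The variable names here are made up for the example, and `cp1252` is Python's name for windows-1252.

```python
# Rough imitation of what the browser does when a form on a
# windows-1252 page contains a character the encoding cannot represent.
typed = "\u221a"  # the square root sign entered in the form field

# A strict encode fails because windows-1252 has no square root sign ...
try:
    typed.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

# ... so fall back to a numeric character reference instead.
submitted = typed.encode("cp1252", errors="xmlcharrefreplace")

print(representable)   # False
print(submitted)       # b'&#8730;'
print(len(submitted))  # 7
```

Every byte in `submitted` is plain ASCII, which is why it fits into a WE8MSWIN1252 column without any trouble.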

Had the page been encoded with a Unicode encoding such as UTF-8, the browser would have sent back the square root symbol itself: several bytes, but considered 1 character.
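For comparison, a small sketch of the UTF-8 case (assuming the page were served as UTF-8, which is not the setup in the question):

```python
# The same character sent from a UTF-8 page: one character, three bytes.
typed = "\u221a"  # the square root sign
utf8_bytes = typed.encode("utf-8")

print(utf8_bytes)                   # b'\xe2\x88\x9a'
print(len(typed), len(utf8_bytes))  # 1 3
```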

So how can you query it?

 select * from tab where field like '%&#8730;%'

What you have stored is the HTML numeric character reference for the square root symbol, not the symbol itself: https://www.google.com/#q=html+character+codes
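The browser turns that reference back into the symbol when it renders the page. A minimal sketch of the same decoding step, using Python's standard `html` module in place of the browser:

```python
import html

# What the database holds: seven windows-1252 characters.
stored = "&#8730;"

# Decoding the numeric character reference yields the real symbol.
decoded = html.unescape(stored)

print(decoded)                  # √
print(f"U+{ord(decoded):04X}")  # U+221A
```

The number in the reference, 8730, is simply the decimal form of the code point 0x221A.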

Update:

Here is a very good article explaining what is happening: http://htmlpurifier.org/docs/enduser-utf8.html

 "...once you start adding characters outside of your encoding... 
 [the browser might] replace the character with a character entity reference...."

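Also, when you enter Alt+251 on a Windows machine, it inserts the square root symbol, which in Unicode is U+221A.

Pressing Alt+251 is just like a keyboard macro that inserts U+221A. An Alt code typed without a leading zero is looked up in the old OEM code page (code page 437 on a typical US system), where 251 is √, not in windows-1252. That is why it does not match the û you found at decimal 251 in the windows-1252 table; Alt+0251 would give you that û.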
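To see why the number 251 maps to two different characters, here is a small sketch that decodes the same byte value under both code pages. It assumes the Alt code is resolved against code page 437, which is typical for US Windows installs but depends on the system's OEM code page.

```python
# The same number, 251, interpreted under two different code pages.
alt_code = 251

oem = bytes([alt_code]).decode("cp437")    # OEM code page (assumed 437)
ansi = bytes([alt_code]).decode("cp1252")  # windows-1252

print(oem, f"U+{ord(oem):04X}")    # √ U+221A
print(ansi, f"U+{ord(ansi):04X}")  # û U+00FB
```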

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow