What's the text encoding used for header values on HTTP requests?
-
04-06-2021 - |
문제
I have a Ruby on Rails application that is a server for Java and .Net apps. I have a custom header I'm using to send some data, but when this data reaches the Ruby on Rails app, Rails reads the value as UTF-8 then says the value is not a valid UTF-8 string.
For instance, if I send JÜRGENELITE-HP I get:
#<ActiveRecord::StatementInvalid: PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xdc52
: SELECT * FROM "replicas" WHERE ("replicas"."identification" = 'J?RGENELITE-HP') AND ("replicas".user_id = 121) LIMIT 1>
The Java HTTP Client library clearly prints the data correctly in the console:
DEBUG [main] (DefaultClientConnection.java:268) - >> POST /ze/api/files.json HTTP/1.1
DEBUG [main] (DefaultClientConnection.java:271) - >> X-Replica: JÜRGENELITE-HP
DEBUG [main] (DefaultClientConnection.java:271) - >> Authorization: Basic bWxpbmhhcmVzOjEyMzQ1Njc4
DEBUG [main] (DefaultClientConnection.java:271) - >> Content-Length: 0
DEBUG [main] (DefaultClientConnection.java:271) - >> Host: localhost:3000
DEBUG [main] (DefaultClientConnection.java:271) - >> Connection: Keep-Alive
DEBUG [main] (DefaultClientConnection.java:271) - >> User-Agent: Apache-HttpClient/4.1.2 (java 1.5)
But when it reaches Rails it breaks. What encoding does HTTP uses to encode header values?
해결책
US-ASCII
If you look at section 2.2 of RFC2616:
2.2 Basic Rules
The following rules are used throughout this specification to
describe basic parsing constructs. The US-ASCII coded character set
is defined by ANSI X3.4-1986 [21].OCTET = <any 8-bit sequence of data> CHAR = <any US-ASCII character (octets 0 - 127)> UPALPHA = <any US-ASCII uppercase letter "A".."Z"> LOALPHA = <any US-ASCII lowercase letter "a".."z"> ALPHA = UPALPHA | LOALPHA DIGIT = <any US-ASCII digit "0".."9"> CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)> CR = <US-ASCII CR, carriage return (13)> LF = <US-ASCII LF, linefeed (10)> SP = <US-ASCII SP, space (32)> HT = <US-ASCII HT, horizontal-tab (9)> <"> = <US-ASCII double-quote mark (34)>
The remainder of the section has more specific information about headers and other elements of the protocol.
You have to jump around the spec quite a bit to find all of the right BNF definitions. Section 4.2 contains the definition for headers, though:
message-header = field-name ":" [ field-value ] field-name = token field-value = *( field-content | LWS ) field-content = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string>
TEXT
is defined back in Section 2.2:
TEXT = <any OCTET except CTLs, but including LWS>