What's the text encoding used for header values on HTTP requests?

https://stackoverflow.com/questions/10356216

04-06-2021

Question

I have a Ruby on Rails application that acts as a server for Java and .NET apps. I'm using a custom header to send some data, but when the data reaches the Rails app, Rails reads the value as UTF-8 and then reports that it is not a valid UTF-8 string.

For instance, if I send JÜRGENELITE-HP I get:

#<ActiveRecord::StatementInvalid: PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xdc52
: SELECT * FROM "replicas" WHERE ("replicas"."identification" = 'J?RGENELITE-HP') AND ("replicas".user_id = 121)  LIMIT 1>

The Java HTTP Client library clearly prints the data correctly in the console:

DEBUG [main] (DefaultClientConnection.java:268) - >> POST /ze/api/files.json HTTP/1.1
DEBUG [main] (DefaultClientConnection.java:271) - >> X-Replica: JÜRGENELITE-HP
DEBUG [main] (DefaultClientConnection.java:271) - >> Authorization: Basic bWxpbmhhcmVzOjEyMzQ1Njc4

DEBUG [main] (DefaultClientConnection.java:271) - >> Content-Length: 0
DEBUG [main] (DefaultClientConnection.java:271) - >> Host: localhost:3000
DEBUG [main] (DefaultClientConnection.java:271) - >> Connection: Keep-Alive
DEBUG [main] (DefaultClientConnection.java:271) - >> User-Agent: Apache-HttpClient/4.1.2 (java 1.5)

But when it reaches Rails it breaks. What encoding does HTTP use to encode header values?

Answer

US-ASCII

Look at section 2.2 of RFC 2616:

2.2 Basic Rules

The following rules are used throughout this specification to
describe basic parsing constructs. The US-ASCII coded character set
is defined by ANSI X3.4-1986 [21].

   OCTET          = <any 8-bit sequence of data>
   CHAR           = <any US-ASCII character (octets 0 - 127)>
   UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
   LOALPHA        = <any US-ASCII lowercase letter "a".."z">
   ALPHA          = UPALPHA | LOALPHA
   DIGIT          = <any US-ASCII digit "0".."9">
   CTL            = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>
   CR             = <US-ASCII CR, carriage return (13)>
   LF             = <US-ASCII LF, linefeed (10)>
   SP             = <US-ASCII SP, space (32)>
   HT             = <US-ASCII HT, horizontal-tab (9)>
   <">            = <US-ASCII double-quote mark (34)>

The remainder of the section has more specific information about headers and other elements of the protocol.
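
As a rough illustration of the CHAR and CTL rules quoted above, here is a small Ruby sketch that checks whether a header value stays within US-ASCII. The helper names (`us_ascii_char?`, `ctl?`, `all_us_ascii?`) are made up for this example and are not part of Rails or any library.

```ruby
# Hypothetical helpers mirroring the RFC 2616 section 2.2 rules; not a library API.

# CHAR = any US-ASCII character (octets 0 - 127)
def us_ascii_char?(octet)
  octet.between?(0, 127)
end

# CTL = any US-ASCII control character (octets 0 - 31) and DEL (127)
def ctl?(octet)
  octet.between?(0, 31) || octet == 127
end

# True when every octet of the value is a US-ASCII CHAR.
def all_us_ascii?(value)
  value.bytes.all? { |octet| us_ascii_char?(octet) }
end

plain  = "JURGENELITE-HP"
latin1 = "J\xDCRGENELITE-HP".b # 0xDC is U-umlaut in ISO-8859-1, outside US-ASCII

puts all_us_ascii?(plain)  # prints true
puts all_us_ascii?(latin1) # prints false
```

A check like this only tells you whether a value is safe to treat as plain ASCII; it says nothing about which encoding a non-ASCII value was sent in.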

You have to jump around the spec quite a bit to find all of the right BNF definitions. Section 4.2 contains the definition for headers, though:

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *( field-content | LWS )
   field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>

TEXT is defined back in Section 2.2:

   TEXT           = <any OCTET except CTLs,
                    but including LWS>
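
Note that TEXT admits octets above 127, so a byte like 0xDC can legally appear in a field value. The same section of RFC 2616 says that words of *TEXT may contain characters from character sets other than ISO-8859-1 only when encoded according to RFC 2047. The `0xdc52` in the error is consistent with that: 0xDC is Ü in ISO-8859-1, followed by 0x52 for R. Below is a minimal sketch of handling this on the Ruby side, assuming the client really did send ISO-8859-1 (the error bytes suggest it, but the question does not confirm it). The function name `header_to_utf8` is hypothetical.

```ruby
# Hypothetical helper: turn raw header octets into a valid UTF-8 string.
# Assumption: any value that is not already valid UTF-8 was sent as ISO-8859-1.
def header_to_utf8(value)
  as_utf8 = value.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Reinterpret the same bytes as ISO-8859-1, then transcode them to UTF-8.
  value.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# The octets as they would arrive if the client encoded the header as ISO-8859-1.
raw = "J\xDCRGENELITE-HP".b

fixed = header_to_utf8(raw)
puts fixed.encoding        # prints UTF-8
puts fixed.valid_encoding? # prints true
puts fixed.bytes.length    # prints 15 (the U-umlaut now takes two bytes)
```

Trying UTF-8 first means values that are already valid pass through unchanged; only the invalid ones are reinterpreted. If clients may send other encodings, this guess is not safe, and restricting the header to US-ASCII (for example by encoding the value before sending) is the more robust choice.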

License: CC-BY-SA with attribution

Not affiliated with StackOverflow