Question

The W3C HTML 4 specification states the following for forms with enctype=application/x-www-form-urlencoded:

This is the default content type. Forms submitted with this content type must be encoded as follows:

1) Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').

2) The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
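
To make the two quoted steps concrete, here is a minimal sketch in Python. The helper name encode_form and the sample fields are made up for illustration. It leans on urllib.parse.quote_plus, which leaves a few punctuation characters (such as `-', `_', `.' and `~') unescaped, so it only approximates the "non-alphanumeric characters are replaced" wording of the spec.

```python
from urllib.parse import quote_plus

def encode_form(fields):
    """Encode (name, value) pairs, given in document order, as a form body.

    Hypothetical helper: each name and value is escaped (space -> '+',
    reserved characters -> %HH), joined with '=', and pairs joined with '&'.
    """
    return "&".join(f"{quote_plus(name)}={quote_plus(value)}"
                    for name, value in fields)

# Sample data is invented; the second value already contains a CR LF pair.
body = encode_form([("user name", "Ann & Bob"), ("note", "line1\r\nline2")])
print(body)  # user+name=Ann+%26+Bob&note=line1%0D%0Aline2
```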

There are a few kinds of line terminators in Unicode. Namely:

 LF:    Line Feed, U+000A
 VT:    Vertical Tab, U+000B
 FF:    Form Feed, U+000C
 CR:    Carriage Return, U+000D
 CR+LF: CR (U+000D) followed by LF (U+000A)
 NEL:   Next Line, U+0085
 LS:    Line Separator, U+2028
 PS:    Paragraph Separator, U+2029

Are all of these converted to CR LF (\r\n)?

Solution

Are all of these converted to CR LF (\r\n)?

Nope. The HTML4 spec is unclear on what counts as a line break, but what browsers do, and what HTML5 has gone on to standardise, is that only CR and LF are involved:

replace every occurrence of a "CR" (U+000D) character not followed by a "LF" (U+000A) character, and every occurrence of a "LF" (U+000A) character not preceded by a "CR" (U+000D) character, by a two-character string consisting of a "CRLF" (U+000D U+000A) character pair

(IE doesn't quite conform to this, as it treats LFCR as a single newline. But it's close enough.)
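
As a rough sketch of that rule (not any browser's actual implementation), the normalisation can be written as a single regular-expression substitution in Python. The function name normalize_newlines is made up here, and the percent-encoding step assumes the form is submitted as UTF-8, which is what urllib.parse.quote_plus uses by default.

```python
import re
from urllib.parse import quote_plus

def normalize_newlines(value: str) -> str:
    """Turn every lone CR or lone LF into CRLF; leave existing CRLF pairs alone.

    The alternation tries "\r\n" first, so an existing pair is matched as a
    unit and replaced by itself rather than being doubled.
    """
    return re.sub(r"\r\n|\r|\n", "\r\n", value)

# Lone LF, lone CR and CRLF all end up as %0D%0A once encoded.
for raw in ("a\nb", "a\rb", "a\r\nb"):
    print(quote_plus(normalize_newlines(raw)))  # a%0D%0Ab each time

# Other Unicode line terminators are not touched by the normalisation;
# they are simply percent-encoded as ordinary characters.
print(quote_plus(normalize_newlines("a\u2028b")))  # a%E2%80%A8b
print(quote_plus(normalize_newlines("a\x0bb")))    # a%0Bb
```

Note that under this rule LF followed by CR becomes two line breaks ("\r\n\r\n"), which is exactly where the IE behaviour mentioned above differs.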

Licensed under: CC-BY-SA with attribution