
...

I started work on this in a local branch. Patches for the changes made there can be found here:
http://people.apache.org/~jboynes/patches/
Of these, patches 01 to 12 have been applied.

There is substantial refactoring in there to simplify the current implementation. Actual changes are:

...

The value (which is a UTF-16 Java String) will be encoded using UTF-8 when being added to the header. The application impact is that non-ASCII characters will no longer cause an IllegalArgumentException (IAE). For V0 cookies, this is an extension to RFC6265 required to support HTML5. V1 cookies already allow 8-bit characters if quoted, and quoting is likely to be needed to avoid an IAE because the value would still be validated; it would be the application's responsibility to quote the value.
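As a rough illustration of the encoding step described above, the following sketch converts a Java String to the UTF-8 octets that would be placed in the header. The class and method names are hypothetical and are not part of Tomcat; only the standard `String.getBytes(Charset)` call is assumed.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not Tomcat code. */
final class CookieValueEncodingSketch {

    /** Encodes a cookie value as UTF-8 octets, as proposed for header generation. */
    static byte[] toHeaderBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true if any octet is in the 0x80 to 0xFF range. */
    static boolean hasEightBitOctets(byte[] octets) {
        for (byte b : octets) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }
}
```

A pure ASCII value produces the same octets as before, so only values containing non-ASCII characters would be affected by the change.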

...

_kkolinko_: Using UTF-8 in HTTP headers is not allowed by RFC 2616. On page 32 it says:


message-header = field-name ":" [ field-value ]

field-value = *( field-content | LWS )

field-content = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string>

The tokens are US-ASCII (0-127 minus CTLs or separators) (pages 16-17).

TEXT is defined on page 16, where it says: "Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14]."

The quoted-string is TEXT in double quotes (page 16).

_kkolinko_: The Javadoc for the HttpServletResponse.setHeader() method also mentions that the value of a header should be encoded according to RFC 2047. http://www.ietf.org/rfc/rfc2047.txt
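For comparison, the RFC 2047 form referred to in the comments above wraps the encoded octets in an "encoded-word". The sketch below shows only the Base64 ("B") variant with a UTF-8 charset label. The class name is hypothetical, and the sketch ignores the RFC 2047 limit of 75 characters per encoded-word, so it should be read as an illustration of the format rather than a complete encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration of an RFC 2047 "B" encoded-word; not Tomcat code. */
final class Rfc2047Sketch {

    /**
     * Wraps the UTF-8 octets of the text in =?charset?encoding?data?= form.
     * Assumption: the result is short enough that no splitting is required.
     */
    static String encodeWord(String text) {
        byte[] octets = text.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(octets) + "?=";
    }
}
```

The output consists of US-ASCII characters only, which is why this form satisfies the TEXT rule quoted from RFC 2616.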

G5 Validate domain per RFC6265

The domain will now be validated per RFC1034 rather than simply as a value. Application impact is that an invalid domain will now raise an IAE rather than be rejected by the browser. No semantic validation (e.g. number of dots) will be performed. A valid domain name is a "token" and so no quotation would be needed.
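A minimal sketch of purely syntactic validation along these lines is shown below. It assumes the preferred name syntax from RFC 1034 section 3.5 (labels start with a letter, end with a letter or digit, contain only letters, digits and hyphens, and are at most 63 characters long). The class is hypothetical, and how a leading dot, a trailing dot, or the relaxed RFC 1123 rules would be treated is not stated in the proposal, so all of those are simply rejected here.

```java
/** Hypothetical syntactic check based on RFC 1034 section 3.5; not Tomcat code. */
final class DomainSyntaxSketch {

    /** Throws IllegalArgumentException (IAE) if the domain is not syntactically valid. */
    static void validate(String domain) {
        if (domain == null || domain.isEmpty() || domain.length() > 255) {
            throw new IllegalArgumentException("Invalid domain length");
        }
        // A limit of -1 keeps empty trailing labels, so "example." and "a..b" are rejected.
        for (String label : domain.split("\\.", -1)) {
            if (!isValidLabel(label)) {
                throw new IllegalArgumentException("Invalid label in domain: " + domain);
            }
        }
    }

    /** Checks one label: letter first, letter or digit last, letters/digits/hyphens between. */
    static boolean isValidLabel(String label) {
        int len = label.length();
        if (len == 0 || len > 63) {
            return false;
        }
        if (!isLetter(label.charAt(0)) || !isLetterOrDigit(label.charAt(len - 1))) {
            return false;
        }
        for (int i = 1; i < len - 1; i++) {
            char c = label.charAt(i);
            if (!isLetterOrDigit(c) && c != '-') {
                return false;
            }
        }
        return true;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isLetterOrDigit(char c) {
        return isLetter(c) || (c >= '0' && c <= '9');
    }
}
```

Note that the check says nothing about whether the domain is meaningful for the request (for example, the number of dots), which matches the "no semantic validation" statement above.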

...

Stop modifying the header in-situ as part of the de-escaping process ([Bug 57896|https://bz.apache.org/bugzilla/show_bug.cgi?id=57896]) so that an application can elect to perform its own parsing by calling getHeader("Cookie"). Eliminate the need for the PRESERVE_COOKIE_HEADER property that currently controls whether a copy of the header is made if modifications are needed. Perform de-escaping during the copy needed to convert the MessageBytes to the String in Cookie#value, possibly during any conversion process needed to handle UTF-8.
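The idea of de-escaping while copying, rather than in place, can be sketched as follows. A plain byte array stands in for Tomcat's MessageBytes, and the class and method are hypothetical. The sketch assumes that the caller has already located the value between the surrounding quotes and that a backslash escapes the single octet that follows it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of de-escaping during the copy; not Tomcat code. */
final class DeEscapeOnCopySketch {

    /**
     * Copies header[start, end) into a new String, dropping each escaping
     * backslash on the way. The source array is only read, never modified.
     */
    static String copyAndUnescape(byte[] header, int start, int end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(end - start);
        for (int i = start; i < end; i++) {
            byte b = header[i];
            if (b == '\\' && i + 1 < end) {
                // Skip the backslash and copy the escaped octet instead.
                b = header[++i];
            }
            out.write(b);
        }
        // Assumption: UTF-8 conversion happens in the same step. Note that this
        // constructor substitutes U+FFFD for malformed input instead of failing.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because the original octets are left untouched, a later call that returns the raw Cookie header would still see the value exactly as it was received.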

Impact of proposal on existing issues

||Issue||Impact||
|Bug 55917|Parsing will no longer cause an IAE. 8-bit values will be interpreted as UTF-8 and the cookie would be dropped if they are not a valid encoding.|
|Bug 55918|The cookie would be dropped rather than accepted.|
|Bug 55920|Valid values would be round-tripped, including quotes supplied by the application. Attempts to set invalid values would result in an IAE from addCookie. Invalid values sent by the browser would result in the cookie being ignored.|
|Bug 55921|Attempts to set a cookie containing raw JSON would result in an IAE due to the DQUOTE characters. A cookie sent from the browser containing JSON would be accepted, although any semicolons in the data would result in early termination (note: browsers other than Safari do not allow semicolons in values anyway).|
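The Bug 55917 behaviour above (interpret 8-bit values as UTF-8 and drop the cookie if the octets are not a valid encoding) could be approximated with a strict decoder, as in this sketch. The class is hypothetical; an empty Optional stands in for "the cookie is dropped".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical illustration of strict UTF-8 decoding; not Tomcat code. */
final class StrictUtf8ValueSketch {

    /** Returns the decoded value, or empty if the octets are not valid UTF-8. */
    static Optional<String> decode(byte[] octets) {
        try {
            String value = StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder fail instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets))
                    .toString();
            return Optional.of(value);
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }
}
```

With this approach a lone ISO-8859-1 octet such as 0xE9 is rejected, while the two-octet UTF-8 sequence 0xC3 0xA9 decodes to a single character.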

...

||Issue||Current behaviour (8.0.0-RC10/7.0.50)||Proposed new behaviour||Servlet + Netscape + RFC2109||Servlet + RFC 6265||
|0x80 to 0xFF in cookie value (Bug 55917)|IAE|TBD|Netscape yes. RFC2109 requires quotes.|RFC 6265 never allowed.|
|CTL allowed in quoted cookie values (Bug 55918)|Allowed|TBD|Not allowed.|Not allowed.|
|Quoted values in V0 cookies (Bug 55920)|Quotes removed.|TBD|Netscape - quotes are part of value.|Quotes are not part of value.|
|Raw JSON in cookie values (Bug 55921)|TBD|TBD|TBD|TBD|
|Allow equals in value|Not by default. Allowed if property set.|TBD|Netscape is ambiguous. RFC2109 requires quoting.|Allowed.|
|Allow separators in V0 names and values|Not by default. Allowed if property set.|TBD|Yes except semi-colon, comma and whitespace.|Never in names. Yes in values except semi-colon, comma and whitespace, double-quote and backslash.|
|Always add expires|Enabled by default. Disabled by property.|TBD|Netscape uses expires. RFC2109 uses Max-Age.|Allows either, none or both.|
|/ is separator|Enabled by default. Disabled by property.|TBD|Netscape allowed in names and values. RFC2109 allowed in values if quoted.|Allowed in values.|
|Strict naming (as per Servlet spec)|Enabled by default. Disabled by property.|TBD|Netscape allows names the Servlet spec does not. RFC2109 is consistent with the Servlet spec.|Consistent with the Servlet spec.|
|Allow name only|Disabled by default. Enabled by property.|TBD|Netscape allowed and equals sign expected before empty value. RFC2109 not allowed.|Allowed but equals sign required before empty value.|

Issues to add to the table above

  • Bug 55951 regarding UTF-8 encoded values from HTML5
  • Any further issues raised on mailing lists

...