I’m writing some code that sets cookies and I’m wondering about the exact semantics of the Set-Cookie
header. Imagine the following HTTP header line:
Set-Cookie: name=value; Path=/%20
For which path does this set the cookie: slash-space (‘/ ’) or the literal ‘/%20’? In escaped form, that is ‘/%20’ or ‘/%2520’.
The reason I’m asking is that I need to support non-ASCII paths. Since HTTP headers must only contain ASCII, my plan was to URL-escape the path value, but the HTTP specification is not as clear as I’d hoped.
Edit
I know what Path is supposed to do. My question is: is the value interpreted as percent-encoded or not?
The standard for IRIs states that the character encoding for such URLs must be UTF-8. Section 4.1.2.4 of HTTP State Management Mechanism also states that the path is that of the directory part of the URI (i.e. the encoded IRI). E.g. the directory ‘/å/’ has cookie path ‘/%C3%A5/’ even when the response header states that the content type is ISO-8859-1.
%20 is an encoded space in the UTF-8 charset, thus the cookie path ‘/%20’ is slash-space.
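As a rough illustration of the encoding described above, here is a minimal Python sketch that percent-encodes a path as UTF-8 with the standard library's urllib.parse.quote. The helper name encode_cookie_path is hypothetical, and the sketch only shows what the encoded strings look like, not how any particular browser matches them.

```python
from urllib.parse import quote, unquote


def encode_cookie_path(path: str) -> str:
    """Percent-encode a path as UTF-8, keeping '/' separators intact.

    Hypothetical helper for illustration only.
    """
    # quote() encodes non-ASCII text as UTF-8 by default; safe="/" keeps
    # the path separators unescaped. A literal '%' becomes '%25'.
    return quote(path, safe="/")


print(encode_cookie_path("/å/"))   # /%C3%A5/
print(encode_cookie_path("/ "))    # /%20
print(encode_cookie_path("/%20"))  # /%2520
print(unquote("/%20") == "/ ")     # True
```

Note that encoding an already encoded path escapes the percent sign again, which is how ‘/%20’ turns into ‘/%2520’.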
Syntax
setcookie(name,value,expire,path,domain,secure)
path
Optional. Specifies the server path of the cookie
If set to “/”, the cookie will be available within the entire domain. If set to “/test/”, the cookie will only be available within the test directory and all sub-directories of test. The default value is the current directory that the cookie is being set in.
Source: PHP setcookie() Function
There is also a great existing SO answer about allowed characters in cookies (it talks about ASCII characters):
What that document doesn’t remember to say, because Netscape were terrible at writing specs, was that control characters (\x00 to \x1F plus \x7F) aren’t allowed, and support for non-ASCII characters is left unspecified.
What browsers do:
- in Opera and Google Chrome, non-ASCII characters are encoded into cookies with UTF-8;
- in IE, the machine’s default code page is used (locale-specific and never UTF-8);
- Firefox (and other Mozilla-based browsers) use the low byte of each UTF-16 code point on its own (so ISO-8859-1 is OK but anything else is mangled);
- Safari simply refuses to send any cookie containing non-ASCII characters.
So in practice you cannot use non-ASCII characters in cookies at all. If you want to use Unicode, control codes or other arbitrary byte sequences you must use an ad-hoc encoding scheme of your own choosing. Most popular is UTF-8-inside-URL-encoding (as produced by JavaScript’s encodeURIComponent).
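The UTF-8-inside-URL-encoding scheme mentioned above can be sketched in Python as follows. This is only an approximation of JavaScript’s encodeURIComponent built on urllib.parse.quote; the function names encode_cookie_value and decode_cookie_value are hypothetical, and the set of characters left unescaped is an assumption based on the documented behavior of encodeURIComponent.

```python
from urllib.parse import quote, unquote

# Characters that encodeURIComponent leaves unescaped besides letters and
# digits. quote() already keeps "-", "_" and "." and the rest are listed here.
_JS_UNESCAPED = "!~*'()"


def encode_cookie_value(text: str) -> str:
    """Encode arbitrary text as ASCII: UTF-8 bytes, then percent-escapes."""
    return quote(text, safe=_JS_UNESCAPED)


def decode_cookie_value(raw: str) -> str:
    """Reverse encode_cookie_value (percent-decoding, then UTF-8)."""
    return unquote(raw)


encoded = encode_cookie_value("å b;c")
print(encoded)                       # %C3%A5%20b%3Bc
print(decode_cookie_value(encoded))  # å b;c
```

The encoded form contains only ASCII and none of the separators (space, semicolon, comma) that have special meaning in a cookie header, so the browser differences listed above no longer come into play.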
My intuition says that the path is a URL-encoded value (since it is a URL).
Why not test it? I created the following nginx config:
add_header Set-Cookie "k=v; path=/";
add_header Set-Cookie "f%2520o=ba%2520r; path=/qa%20x";
add_header Set-Cookie "f%2520o=ba%2520r; path=/qb%25x";
add_header Set-Cookie "Z%2520Z=Z%2520Z; path=/";
location ~ /q.* {
    try_files $uri /cookie.html;
}
The index page:
<iframe src="/qa%20x"></iframe>
<iframe src="/qa x"></iframe>
<iframe src="/qb%25x"></iframe>
<iframe src="/qb%2520x"></iframe>
<iframe src="/qb%20x"></iframe>
The cookie page essentially writes out document.cookie. Conclusion: the path is interpreted as a percent-encoded sequence (with automatic conversion of a space to %20 as needed). I tested this in Firefox and Chromium; both browsers show the same behavior. Guess which browser is the exception to this… IE8 did not show the cookies from the percent-encoded path.
I did more tests, and it turned out that IE ignored cookies with path /foo for a literal request to /foo. A request for /foo/ (with Set-Cookie: key=val; path=/foo) works though, even with percent-encoded values.
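For comparison, RFC 6265 (section 5.1.4) defines path matching as a plain comparison of the strings as sent, with no percent-decoding. Under that rule a cookie with path /foo does apply to a request for /foo, so the IE behavior described above appears to deviate from it. A minimal Python sketch of the rule follows; the function name path_matches is hypothetical, and real browsers may normalize paths before comparing.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Sketch of the RFC 6265 section 5.1.4 path-match rule.

    Both arguments are compared as raw (still percent-encoded) strings.
    """
    # Rule 1: the two paths are identical.
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Rule 2: the cookie path is a prefix that ends in "/".
        if cookie_path.endswith("/"):
            return True
        # Rule 3: the first character after the prefix is "/".
        # The index is valid because the paths differ in length here.
        if request_path[len(cookie_path)] == "/":
            return True
    return False


print(path_matches("/foo", "/foo"))        # True
print(path_matches("/foo/", "/foo"))       # True
print(path_matches("/foobar", "/foo"))     # False
print(path_matches("/qa%20x", "/qa%20x"))  # True
```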