I came up against a problem today whilst trying to submit a request to a remote API. The header needed to contain:
'Content-MD5' : "md5here"
But the urllib2 Request() forces capitalize() on all header names and transformed it into "Content-Md5", which in turn made the remote web server ignore the header and break the request (the remote side is case sensitive, and we have no control over it).
I attempted to get smart by using the following patch:
class _str(str):
    def capitalize(s):
        print s
        return s

_headers = {_str("Content-MD5") : 'md5here'}
But this failed to work:
---HEADERS---
{'Content-MD5': 'nts0yj7AdzJALyNOxafDyA=='}
---URLLIB2 DEBUG---
send: 'POST /api/v1 m HTTP/1.1\r\nContent-Md5: nts0yj7AdzJALyNOxafDyA==\r\n\r\n\r\n'
Upon inspecting the urllib2.py source, I found 3 references to capitalize() which seem to cause this problem, but it seems impossible to monkey patch or fix without forking.
Therefore, I'd like to +1 a feature request to have an extra option at the time of the request being opened, to bypass the capitalize() on header names (maybe, header_keep_original = True or something).
And, if anyone could suggest a possible monkey patch (which doesn't involve forking huge chunks of code), that'd be good too :)
Thanks
Cal

Well, three occurrences means you only have three methods to patch (and two of them are trivial). But I agree that copying the non-trivial method doesn't look fun from a maintenance perspective.
You could also try using an object that is not a subclass of str. The problem with subclassing str is that some (most?) string methods do not do a subclass check but directly call the C implementation of the method. I think there's an issue in the tracker somewhere about that.
The problem with not subclassing string, of course, is that you may end up implementing a lot of methods on your object to get it to play nicely with urllib2's assumption that it *is* a string.
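To illustrate the point about str subclasses, here is a minimal sketch (written for Python 3; CapOnly is a made-up name, not anything from urllib2). It only shows that overriding capitalize() has no effect on other inherited string methods such as title(), which keep using the built-in implementation and hand back a plain str:

```python
# Sketch only: CapOnly is a hypothetical name used for illustration.
class CapOnly(str):
    def capitalize(self):
        # Return the string unchanged instead of lower-casing the tail.
        return self


name = CapOnly("Content-MD5")

# The override is used when capitalize() is called directly.
print(name.capitalize())

# title() is not overridden, so the inherited str.title() runs
# and never consults the capitalize() override.
print(name.title())
print(type(name.title()))
```

So any code path that normalizes the name with a method other than the one you overrode will still change the case.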

Sorry, I should clarify.. The str() patch worked, but it failed to work within the realm of urllib2:
s = _str("Content-MD5")
print "Builtin:"
print "plain: %s" % ( s )
print "capitalized: %s" % ( s.capitalize() )
s = str("Content-MD5")
print "Builtin:"
print "plain: %s" % ( s )
print "capitalized: %s" % ( s.capitalize() )
Builtin:
plain: Content-MD5
capitalized: Content-MD5
Builtin:
plain: Content-MD5
capitalized: Content-md5
Why it works in the unit test, and not within urllib2, is totally beyond me. Especially since I put a debug call on the method, and it does get called.. yet urllib2 debug still shows it sending the wrong value.
---
capitalize() bypassed: sending value: Content-MD5
send: 'POST /api/url\r\nContent-Md5: nts0yj7AdzJALyNOxafDyA==\r\n\r\n'
---
I have a feeling that the problem may lie somewhere after the opener (like HTTPConnection or AbstractHTTPHandler), rather than the urllib2 calls to capitalize(), but not having much luck monkey patching those :X

So @r.david.murray, it would appear you were right :D Really, I should have looped through each method on str(), and wrapped them all to see which were being called, but lesson learned I guess.
Sooo, I guess now the question is, can we possibly get a vote on having a feature which disables this functionality from the opener level. Something like:
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1, keep_original_header_case=True))
But obviously a less tedious attribute name :)
In the meantime, if anyone else comes up against this problem, the code I pasted above will work fine for now.
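As a hedged sketch of that kind of workaround (this is an assumption about the approach, written for Python 3's urllib.request, and KeepCase is a hypothetical name), a str subclass can override both capitalize() and title() so that neither normalization changes the header name:

```python
import urllib.request


# Sketch only: KeepCase is a hypothetical helper, not part of urllib.
class KeepCase(str):
    def capitalize(self):
        # Leave the header name exactly as written.
        return self

    def title(self):
        # Same for the second normalization mentioned in this thread.
        return self


request = urllib.request.Request(
    "http://example.com/",
    headers={KeepCase("Content-MD5"): "md5here"},
)

# No network access happens here; this only inspects the stored headers.
print(request.header_items())
```

This only checks what the Request object stores. Whether the name reaches the wire unchanged depends on the rest of the stack, so it is worth confirming with debuglevel=1 against your own server.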
Cal

Quoting http://tools.ietf.org/html/rfc2068#section-4.2:
Field names are case-insensitive.
Which is only logical, since they are modeled on email headers, and email header names are case insensitive. So, the server in question is broken, yes, but that doesn't mean we can't provide a facility to allow Python to inter-operate with it. Email, for example, preserves the case of the field names it parses or receives from the application program, but otherwise treats them case-insensitively. However, since the current code coerces to title case, we have to provide this feature as a switchable facility defaulting to the current behavior, for backward compatibility reasons.
And someone needs to write a patch....
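The email behaviour described above can be seen with the standard library's email.message.Message; a small sketch (the header value is just a placeholder):

```python
from email.message import Message

msg = Message()
msg["Content-MD5"] = "md5here"

# The name is kept exactly as the application supplied it...
print(msg.keys())

# ...but lookups ignore case.
print(msg["content-md5"])
print("CONTENT-MD5" in msg)
```

That is the "preserve case, compare case-insensitively" model a switchable urllib option could follow.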

That's fully understandable that the default won't change. I'll put this on my todo list to write a patch in a week or two.

The comment about urllib.request forcing .title() is consistent with 'Content-Length' and 'Content-Type' in the docs but puzzling and inconsistent given that in 3.3, header names are printed .capitalize()'ed and not .title()'ed and that has_header and get_header *require* the .capitalize() form and reject the .title() form.
import urllib.request

opener = urllib.request.build_opener()
request = urllib.request.Request(
    "http://example.com/",
    headers={"Content-Type": "application/x-www-form-urlencoded"})
opener.open(request, "1".encode("us-ascii"))
print(request.header_items(),
      request.has_header("Content-Type"),
      request.has_header("Content-type"),
      request.get_header("Content-Type"),
      request.get_header("Content-type"), sep='\n')
>>>
[('Content-type', 'application/x-www-form-urlencoded'), ('Content-length', '1'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'example.com')]
False
True
None
application/x-www-form-urlencoded
Did .title in 2.7 urllib2 request get changed to .capitalize in 3.x urllib.request (without the examples in the doc being changed) or is request inconsistent within itself?
Cal did not post the 2.7 code exhibiting the problem, but when I add this code in 3.3, the output starts as shown.
request.add_header('Content-MD5', 'xxx')
print(request.header_items())
#
[('Content-md5', 'xxx'), ...
So is 3.3 sending 'Content-Md5' or 'Content-md5'?
My guess is the former, as urllib.request has the same single use of .title in .do_open as Cal quoted. The two files also have the same three uses of .capitalize in .add_header, .add_unredirected_header, and .do_request. So it seems that header names are normalized to .capitalize on entry and .title on sending, or something like that. Ugh. Is there any good justification for this?
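To make the suspected two-step normalization concrete, here is a small model of it using plain string methods. The function names stored_name and sent_name are hypothetical and nothing here is copied from urllib.request; it only mimics ".capitalize on entry, .title on sending" as guessed above:

```python
# Sketch only: models the two normalization steps guessed at in this comment.
def stored_name(name):
    # What add_header() appears to keep: first letter upper, rest lower.
    return name.capitalize()


def sent_name(name):
    # What the sending step appears to apply on top of the stored form.
    return stored_name(name).title()


for original in ("Content-MD5", "Content-Type", "X-API-Key"):
    print(original, "->", stored_name(original), "->", sent_name(original))
```

Under this model the stored and sent spellings differ from each other, and both can differ from what the caller wrote.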
I do not see anything in the doc about headers names being normalized either way or about the requirements of has_/get_header. If the behavior were consistent and the same since forever, then I would say the current docs should be improved and a change would be an enhancement request. Since the behavior seems inconsistent, I am more inclined to think there is a bug.
I realize that this message expands the scope of the issue, but it is all about the handling of header names in requests.

Note that HTTP header fields are case-insensitive.
See http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging#section-3.2
Each HTTP header field consists of a case-insensitive field name
followed by a colon (":"), optional whitespace, and the field value.
Basically the author of a request can set them to whatever he/she wants. But we should, IMHO, respect the author's intent. It might happen that someone will choose a specific combination of casing to deal with broken servers and/or proxies. So a cycle of set/get/send should not modify the headers at all.

Mark,
I'm happy to follow up.
I would be in favor of removing any capitalization and not changing headers, whatever they are, because it doesn't matter per the spec. Browsers do not care about the capitalization, and I haven't identified Web Compatibility issues regarding the capitalization.
That said, it seems that Cal (msg139512) had an issue. I would love to know which server/API had this behavior, to file a bug at http://webcompat.com/
So…
Where do we stand? Feature or removing anything which modifies the capitalization of headers?