Description

The WSGI spec requires the standard environment values to be str, not unicode types, but the WSGIRequest object updates the environ (via self.META which is a reference), setting PATH_INFO and SCRIPT_NAME to unicode objects. These unicode values led to some issues with WebTest which checks to ensure that the environ only contains str values.
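A minimal sketch of the kind of check that trips on this, written for Python 3, where the native string type is str and the offending type would be bytes (on Python 2 it would be unicode). The helper name non_native_environ_keys is hypothetical, and skipping every "wsgi." key is a simplification; this is not WebTest's actual code.

```python
def non_native_environ_keys(environ):
    """Return the CGI-style keys whose values are not native str.

    Hypothetical helper: keys starting with "wsgi." are skipped because
    the spec allows non-string values there (tuples, streams, booleans).
    """
    return sorted(
        key
        for key, value in environ.items()
        if not key.startswith("wsgi.") and type(value) is not str
    )


environ = {
    "REQUEST_METHOD": "GET",
    "SCRIPT_NAME": "/app",
    "PATH_INFO": b"/caf\xc3\xa9",  # wrong type: not the native str
    "wsgi.version": (1, 0),        # non-string, but allowed
}

offending = non_native_environ_keys(environ)
print(offending)
```

Because self.META is a reference to the environ, any value a request object rewrites with the wrong type would show up in a check like this.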

Change History (14)

FWIW, it looks like WSGI for Python 3.0, if the specification is ever actually updated, will have WSGI environment variables be unicode, i.e., Python 3.0 strings and not byte strings, which would have been the proper equivalent to Python 2.X strings.

Yes, but only by decoding as "latin-1", which means it's still 1 character per byte. Django could re-decode these values as utf-8, but it would not be valid to put them into the WSGI environ as such, since they could contain code points above U+00FF, which cannot be encoded as latin-1.
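To illustrate the point about latin-1 and utf-8, here is a small Python 3 sketch. The sample path and the helper name fits_latin1 are made up for the example.

```python
# UTF-8 bytes for "/café/☃", as a server might receive them on the wire.
raw = "/caf\u00e9/\u2603".encode("utf-8")

# Decoding as latin-1 maps every byte to exactly one character.
environ_value = raw.decode("latin-1")
assert len(environ_value) == len(raw)

# An application can recover the intended text by reversing that step
# and decoding the bytes as utf-8 instead.
text = environ_value.encode("latin-1").decode("utf-8")


def fits_latin1(value):
    """Return True if every code point in value is at most U+00FF."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1(environ_value))
print(fits_latin1(text))
```

The latin-1 form always fits, while the re-decoded text contains U+2603 and so cannot go back into the environ unchanged.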

I don't use Django or even know much about Django code internals, so I wouldn't like to be saying it is okay or not. In other words, I might be able to comment on WSGI and web hosting mechanisms, but far from being qualified to comment about Django itself. :-)

Ok, fair enough. I'll make the call that Django should be following the WSGI spec regarding unicode vs. str. Malcolm would probably know if there's some reason why it shouldn't, so I'm ccing him in for an opinion.

The WSGI spec says all strings should be of the str type and be encoded as ISO-8859-1 (latin-1) or be MIME-encoded according to RFC 2047. When dealing with the PATH_INFO and SCRIPT_NAME environ variables, I think it's safe to say we don't want to put MIME-encoded data in them, which leaves us with the latin-1 encoding. We can always explicitly encode those variables as latin-1 to ensure we follow the spec. Doing so should also handle the case of being handed unicode data: we'll simply re-encode it, with undefined results (which is also in the WSGI spec). I'll attach a patch that demonstrates this.
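As a rough illustration of the idea (not the attached patch), the following Python 3 sketch forces a value into a native str whose code points all fit in latin-1. The helper name ensure_wsgi_str and the choice of errors="replace" are assumptions made for the example; the spec itself leaves the result for non-latin-1 data undefined.

```python
def ensure_wsgi_str(value):
    """Return a native str containing only code points up to U+00FF.

    Hypothetical helper. Bytes are decoded as latin-1 (one character
    per byte). Text is pushed through latin-1, and anything that does
    not fit is replaced with "?", which is one possible reading of
    "undefined results".
    """
    if isinstance(value, bytes):
        return value.decode("latin-1")
    return value.encode("latin-1", errors="replace").decode("latin-1")


environ = {
    "SCRIPT_NAME": b"/app",
    "PATH_INFO": "/caf\u00e9/\u2603",
}

for key in ("SCRIPT_NAME", "PATH_INFO"):
    environ[key] = ensure_wsgi_str(environ[key])

print(environ)
```

A request object that wants the fully decoded path could keep it on its own attribute and leave the environ entries in this normalized form.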