Some comments on cgi_diff_20110109.txt, especially on FieldStorage
constructor.
Le dimanche 09 janvier 2011 13:26:24, vous avez écrit :
> Here is the diff file for the revised version of cgi.py
+ import msvcrt
+ msvcrt.setmode (0, os.O_BINARY) # stdin = 0
+ msvcrt.setmode (1, os.O_BINARY) # stdout = 1
+ msvcrt.setmode (2, os.O_BINARY) # stderr = 2
Why do you change stdout and stderr mode? Is it needed? Instead of 0, you
should use sys.stdin.fileno() with a try/except on .fileno() because stdin can
be a StringIO object:
>>> o=io.StringIO()
>>> o.fileno()
io.UnsupportedOperation: fileno
I suppose that it's better to do nothing if sys.stdin has no .fileno() method.
More generally, I don't think that the cgi module should touch sys.stdin mode:
it impacts the whole process, not only the cgi module. Eg. change sys.stdin
mode in Python 3.1 will break the interperter because the Python parser in
Pytohn 3.1 doesn't know how to handle \r\n end of line. If you need binary
stdin, I should backport my patch for #10841 (for std*, FileIO and the
parser).
----
def __init__(self, fp=None, headers=None, outerboundary="",
environ=os.environ, keep_blank_values=0, strict_parsing=0,
limit=None):
...
if 'QUERY_STRING' in environ:
qs = environ['QUERY_STRING']
elif sys.argv[1:]:
qs = sys.argv[1]
else:
qs = ""
fp = BytesIO(qs.encode('ascii')) # bytes
----
With Python 3.2, you should use environ=environ.os.environb by default to
avoid unnecessary conversion (os.environb --decode--> os.environ --encode-->
qs). To decode sys.argv, ASCII is not the right encoding: you should use
qs.encode(locale.getpreferredencoding(), 'surrogateescape') because Python
decodes the environment and the command line arguments from
locale.getpreferredencoding()+'surrogateescape', so it is the exact reverse
operation and you get the original raw bytes.
For Python 3.1, use also qs.encode(locale.getpreferredencoding(),
'surrogateescape') to encode the environment variable.
So for Python 3.2, it becomes something like:
----
def __init__(self, fp=None, headers=None, outerboundary="",
environ=os.environb, keep_blank_values=0, strict_parsing=0,
limit=None):
...
if 'QUERY_STRING' in environ:
qs = environ[b'QUERY_STRING']
elif sys.argv[1:]:
qs = sys.argv[1]
else:
qs = b""
if isinstance(qs, str):
encoding = locale.getpreferredencoding()
qs = qs.encode(encoding, 'surrogateescape'))
fp = BytesIO(qs)
----
If you would like to support byte *and* Unicode environment (eg.
environ=os.environ and environ=os.environb), you should do something a little
bit more complex: see os.get_exec_path(). I can work on a patch if you would
like to. A generic function should maybe be added to the os module, function
with an optional environ argument (as os.get_exec_path()).
---
if fp is None:
fp = sys.stdin
if fp is sys.stdin:
...
---
you should use sys.stdin.buffer if fp is None, and accept sys.stdin.buffer in
the second test. Something like:
---
stdin = sys.stdin
if isinstance(fp,TextIOBase):
stdin_buffer = stdin.buffer
else:
stdin_buffer = stdin
if fp is None:
fp = stdin_buffer
if fp is stdin or fp is stdin_buffer:
...
---
Don't you think that a warning would be appropriate if sys.stdin is passed
here?
---
# self.fp.read() must return bytes
if isinstance(fp,TextIOBase):
self.fp = fp.buffer
else:
self.fp = fp
---
Maybe a DeprecationWarning if we would like to drop support of TextIOWrapper
later :-)
For the else case: you should maybe add a strict test on the type, eg. check
for RawIOBase or BufferedIOBase subclass, isinstance(fp, (io.RawIOBase,
io.BufferedIOBase)). It would avoid to check that fp.read() returns a bytes
object (or get an ugly error later).
Set sys.stdin.buffer.encoding attribute is not a good idea. Why do you modify
fp, instead of using a separated attribute on FieldStorage (eg.
self.fp_encoding)?
---
# field keys and values (except for files) are returned as strings
# an encoding is required to decode the bytes read from self.fp
if hasattr(fp,'encoding'):
self.fp.encoding = fp.encoding
else:
self.fp.encoding = 'latin-1' # ?
---
I only read the constructor code.