Description

I have a view whose response is 825157 bytes without gzipping and 35751 bytes gzipped as an HttpResponse, but 1010920 bytes gzipped as a StreamingHttpResponse. The output of the script given below with some noddy data is:

Noddy content perhaps, but in actual use I very much want to use StreamingHttpResponse for very large JSON responses: with iterables throughout it uses about 200 MB of memory, as opposed to about 2 GB with more standard code and HttpResponse. The Python json package emits a separate chunk for each key, each value, and each piece of punctuation in between. Because the gzip middleware flushes after every chunk it is given, the gzipped output ends up much larger than the ungzipped output, as the figures at the top show. It would seem that many uses of StreamingHttpResponse will similarly produce small chunks at the content level. #7581 does mention "some penalty in compression performance", but producing a worse-than-none result seems a bit much :)
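To make the effect concrete, here is a rough sketch that gzips the chunks from json's iterencode() twice: once flushing after every chunk, once leaving the buffering to zlib. The helper name gzip_chunks and the sample data are made up for illustration; this is not the middleware code, only the same pattern in isolation.

```python
import gzip
import io
import json


def gzip_chunks(chunks, flush_each):
    """Gzip an iterable of byte chunks, optionally flushing after each one."""
    buf = io.BytesIO()
    with gzip.GzipFile(mode="wb", compresslevel=6, fileobj=buf) as zfile:
        for chunk in chunks:
            zfile.write(chunk)
            if flush_each:
                # A sync flush forces out whatever zlib is holding and
                # appends an empty stored block, costing a few bytes each time.
                zfile.flush()
    return buf.getvalue()


# Hypothetical sample data; iterencode() yields many very small strings.
data = [{"id": i, "name": "item %d" % i} for i in range(2000)]
chunks = [c.encode("utf-8") for c in json.JSONEncoder().iterencode(data)]

raw_size = sum(len(c) for c in chunks)
flushed = gzip_chunks(chunks, flush_each=True)
unflushed = gzip_chunks(chunks, flush_each=False)

print("chunks:   ", len(chunks))
print("raw:      ", raw_size)
print("flushed:  ", len(flushed))
print("unflushed:", len(unflushed))
```

With chunks this small, the per-flush overhead outweighs the bytes saved by compression, so the flushed output should come out larger than the raw data, while the unflushed output is a small fraction of it.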

Should compress_sequence bunch up flushes to provide at least some level of compression? Or if it's a StreamingHttpResponse, should it not bother gzipping?

Change History (6)

With the flush() removed, the output does appear to 'bunch' itself into groups of about 17k, and it ends up the same size as if it had been gzipped as a single string. I have made this change and added a test using some JSON output at https://github.com/django/django/pull/4010, hope that's of interest.

The function no longer flushes zfile after each write as doing so can
lead to the gzipped streamed content being larger than the original
content; each flush adds a 5/6 byte type 0 block. Removing this means
buf.read() may return nothing, so only yield if that has some data.
Testing shows without the flush() the buffer is being flushed every 17k
or so and compresses the same as if it had been done as a whole string.
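The change described above could look roughly like the following. This is a sketch under assumptions, not the code from the pull request: StreamingBuffer here is a minimal stand-in written for this example, and the compression level of 6 is an arbitrary choice.

```python
import gzip
import json


class StreamingBuffer:
    """Stand-in write-only buffer (an assumption for this sketch)."""

    def __init__(self):
        self.vals = []

    def write(self, val):
        self.vals.append(val)
        return len(val)

    def read(self):
        # Hand back everything written since the last read.
        data = b"".join(self.vals)
        self.vals = []
        return data

    def flush(self):
        pass


def compress_sequence(sequence):
    buf = StreamingBuffer()
    with gzip.GzipFile(mode="wb", compresslevel=6, fileobj=buf) as zfile:
        # The gzip header is written when the file is opened.
        yield buf.read()
        for item in sequence:
            zfile.write(item)
            # No zfile.flush() here: zlib decides when to emit output,
            # so buf.read() is often empty and nothing is yielded.
            data = buf.read()
            if data:
                yield data
    # Closing the GzipFile writes the remaining data and the trailer.
    yield buf.read()


# Usage with hypothetical sample data.
sample = [{"id": i, "name": "item %d" % i} for i in range(2000)]
chunks = [c.encode("utf-8") for c in json.JSONEncoder().iterencode(sample)]
raw = b"".join(chunks)
pieces = list(compress_sequence(chunks))
print(len(chunks), "chunks in,", len(pieces), "pieces out")
print(len(raw), "bytes raw,", sum(len(p) for p in pieces), "bytes gzipped")
```

The trade-off is latency: a consumer may wait longer for the first compressed bytes, since nothing is yielded until zlib has enough input to emit a block.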
