Security

(public)

User Story

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
We're using a progress bar which relies on the XMLHttpRequest event 'progress'.
When loading gzip'd files, the "loaded" property of the event is at times greater than the "total" property. From what I gathered, the total is the number of bytes as advertised by the server (so the size of the gzip'd file), but the number of loaded bytes we get comes from the uncompressed data.
That's an assumption, but it seems the most likely problem.
Here's some example code:
var url = "path/to/gzipdFile.jz";
var xhr = new XMLHttpRequest();
xhr.onload = function(e) {
success(e.target.responseText);
};
xhr.addEventListener('progress', onProgress, false);
xhr.open('GET', url, true);
xhr.send(null);
function onProgress(e) {
if(e.lengthComputable) {
if(e.loaded > e.total) {
throw "How have we loaded more bytes ("+e.loaded+") than the total amount of bytes ("+e.total+") available?";
}
}
}
Reproducible: Sometimes
Steps to Reproduce:
1. Run the above example code with a path to an actual gzip'd file.
2. It may not generate enough progress events if the server is too fast or the file really small.
Actual Results:
It throws exceptions as the amount of loaded bytes is greater than the total amount of bytes supposedly available.
Expected Results:
The amount of loaded bytes should be upper bounded to the total amount of available bytes. The loaded property should return the number of bytes received from the server before decompression, or there should be some way to retrieve that.
I run my webapp from an embedded server, so its particularly slow; files above 10kb generate enough progress events for this to happen consistently.

> The loaded property should return the number of bytes received
> from the server before decompression
That information is not available from necko (nor necessarily sensible in the presence of chunked encodings, etc in any case).
On the other hand, in the presence of content encodings the total size of the decoded data is not really available until you're done loading either; all you have is the Content-Length, and not even that for chunked encodings.
Jonas, what does the spec call for here? Should we just always set total to unknown for content-encoded stuff and be done with it?

Would love to know which exact headers the server is sending here.
The spec says that 'total' should reflect the contents of the Content-Length header, or 0 if the Content-Length is not known.
Does http say that Content-Length represents the encoded or decoded length in the case of a zip-encoded transfer? If it's the encoded length then maybe setting 'total' to 0 is the right thing to do for zip-encoded transfers and the spec needs to be updated.
Does necko expose encoded number of bytes received?

> Does http say that Content-Length represents the encoded or decoded length
Encoded. Contnt-Length is bytes on the wire (insofar as there's a "wire"; if SSL is involved this is the number of post-decryption bytes, e.g.). The actual length of the data after decoding may be longer or shorter than that.
> Does necko expose encoded number of bytes received?
Not really, no. See comment 2.

I sent an inquiry to the list. Personally I think the most useful behavior would be to make .total reflect the contents of the Content-Length header, and .loaded the number compressed bytes received. Until we can dig that information out of necko it seems like setting .total to 0 is the best option.

That depends on what we want to do and what we think the spec should say. One completely sane behavior would be to report all values as decompressed values and set lengthComputable to false for gzipped requests.

There was some discussion about this at some point, iirc, which sort of led to the spec saying what it says right now (that all the values for progress events should be octet values from the network layer). Worth looking up to see whether there was real discussion or Anne just deciding something.

I ran into this on AWSY - we gzip the large amounts of JSON data, and FF returns a .loaded that is larger than .total.
We should probably set lengthComputable to false if we're not going to be providing a computable length.

This bug makes preloader progress indicators impossible to implement on FF (Chrome/IE just work fine).
".loaded" should indicate the number of bytes having been transferred over the network (% of Content-Length), since that's what matters in a large network request. Number of decompressed bytes does not relate to the _progress_ of the request, since 20% of total (compressed) network bytes can equal 90% of total uncompressed data bytes.

Is the spec clear now? I had a quick look, and I only find this text in the XHR spec:
Initialize the loaded attribute to the length of response's body.
If the length of response's body is known through the Content-Length header, set the lengthComputable attribute value to true and the total attribute value to the length.
and nothing I could find in the Fetch spec defines "body" differently if there is some compression involved (but I'm less familiar with Fetch so I may have missed it).

BTW, here's the question sicking posted to the list back then: http://lists.w3.org/Archives/Public/public-webapps/2010OctDec/0757.html - it caused some discussion but not as far as I can tell any conclusions.
FWIW, I agree with the reporter of this bug that "the loaded property should return the number of bytes received from the server before decompression". XHR can be used to load binary content (including gzipped binary content), so there's no point thinking of text encodings here. The main use case for progress events is to show progress, the main purpose of .loaded is to report how much of .total has come over the wire already. So yes, we need a Necko bug blocking this one on reporting this information somewhere.

I filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=25587 on Fetch / XMLHttpRequest.
Hallvord, I don't think anyone was talking about text encodings. A content encoding (such as gzip) is something handled transparently by HTTP and when used would not reach an API (the API would have the decoded content).
It is of course still possible for the HTTP layer to report the before-decoding-value to the API layer, which I think is what would be most useful for developers wishing to implement a progress bar.

(In reply to Anne (:annevk) from comment #14)
> I filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=25587 on Fetch /
> XMLHttpRequest.
Thanks Anne!
> Hallvord, I don't think anyone was talking about text encodings.
Sorry, I was thinking aloud based on stuff Björn Höhrmann wrote in that public-webapps thread. (And sicking saying it might be a problem if .loaded != .responseText.length). In this bug comment thread, those thoughts had no context indeed, so I was just adding noise.. :-]

necko provides nsIProgressEventSink and sends 'progress' and 'progressMax' both for comprassed data. These values are available in XHR.
progress uncompressed data, the number of bytes *after* decompression, are send in OnDataAvailable.
in nsXMLHttpRequest.cpp, function DispatchProgressEvent sends progress data for decompressed data and progressMax for compress data.

Hmm. Does nsIProgressEventSink provide numbers before or after undoing chunked transfer encoding? I seem to recall that it's before, so it's not the number we want. In any case, let's discuss this in bug 1007020.