Comment on attachment 45914[details]
the patch
I think it woudl be better to write two new functions. One which does this loop and calls didRecievData on a passed in subresourceloader, and the other which does the loop and calls decode on a passed in decoder and returns the final string. That saves us from any copy/paste code.

Comment on attachment 45914[details]
the patch
> - String sheetText = m_decoder->decode(m_data->data(), m_data->size());
> + String sheetText;
> + const char* segment;
> + unsigned pos = 0;
Could you avoid the abbreviation "pos" here? I like "offset" for this, myself.
> + while (unsigned length = m_data->getSomeData(segment, pos)) {
It's unfortunate this operation is O(n log n) in the number of segments, having to find the data segment each time based on the passed-in offset, walking the list of segments. I wish the API of SharedBuffer didn't force this, although it's probably not a real concern.
Rather than repeating this decoding idiom over and over again, could you put a helper function somewhere that encapsulates the process of decoding a SharedBuffer into a String.
Appending to a String is a slow operation that requires reallocating the buffer each time. Instead we should append to a Vector<UChar> and convert to a String only at the end. Making a single function for this will help us make sure it works in a way that's optimal.
Another good optimization would be estimating the decoded size based on the size of the SharedBuffer to keep the number of reallocations to a minimum. This should be straightforward. We could start by assuming that each byte encodes an average of about one character.
> - loader->didReceiveData(data->data(), data->size(), data->size(), true);
> + const char* segment;
> + unsigned pos = 0;
> + int received = 0;
> + while (unsigned length = data->getSomeData(segment, pos)) {
> + pos += length;
> + loader->didReceiveData(segment, length, pos, false);
> + }
After this patch, is there any code anywhere passing true for the last argument to didReceiveData? If not, then we should remove that argument and the dead code.

Comment on attachment 45965[details]
Make TextResourceDecoder read from segmented SharedBuffer
> +String TextResourceDecoder::decode(const SharedBuffer& data)
> +{
> + String result;
> + const char* segment;
> + unsigned offset = 0;
> + while (unsigned length = data.getSomeData(segment, offset)) {
> + result += decode(segment, length);
> + offset += length;
> + }
> + return result;
> +}
It's great factoring to make use a function like this instead of spreading the code around.
I also note that all the clients who use this function also do the flushing, so in the future might want to make this helper function include the flush and have some name like "decodeAll" rather than just being cleanly parallel to the virtual decode function.
This is still a quite-inefficient algorithm for creating a string out of multiple decoded chunks; any loop that appends to a String is suboptimal because making a new String by appending two strings *always* involves allocating a fresh buffer each time and copying all the data; this inefficiency might make some real world cases pathologically worse if we find ourselves with multiple segments. It might actually be more efficient to do the dumb thing the old code was doing, and concatenate them all, just because the decoded data would be dealt with all at once even though the encoded data ends up being copied.
I think at some point we will want to change this. Even now I think we could do better with Vector<UChar>.
Maybe we want to get rid of the use of String in the interface to decoders eventually?
I am conflicted about the patch. It seems like a good idea in theory, but in practice it seems that this patch claiming to speed things up might always end up slowing things down instead. And might cause us to use more memory too.
> + String decode(const SharedBuffer& data);
I don't think the argument name is needed for clarity here so normally we would leave it out.
I am going to say r=me but I do have serious reservations. I'd like to hear more concrete evidence about why this is a helpful change to make.

(In reply to comment #12)
> (From update of attachment 45966[details])
> > + didReceiveData(segment, static_cast<int>(length), offset, false);
>
> Why is this cast needed? I think it's not helpful with any compiler.
>
> r=me
MVSC can give warning for unsigned->int conversion. But seems they have turned this warning off by default. don't know if any other compiler will complain this.

(In reply to comment #11)
>
> This is still a quite-inefficient algorithm for creating a string out of
> multiple decoded chunks; any loop that appends to a String is suboptimal
> because making a new String by appending two strings *always* involves
> allocating a fresh buffer each time and copying all the data; this inefficiency
> might make some real world cases pathologically worse if we find ourselves with
> multiple segments. It might actually be more efficient to do the dumb thing the
> old code was doing, and concatenate them all, just because the decoded data
> would be dealt with all at once even though the encoded data ends up being
> copied.
>
> I think at some point we will want to change this. Even now I think we could do
> better with Vector<UChar>.
>
> Maybe we want to get rid of the use of String in the interface to decoders
> eventually?
>
> I am conflicted about the patch. It seems like a good idea in theory, but in
> practice it seems that this patch claiming to speed things up might always end
> up slowing things down instead. And might cause us to use more memory too.
>
> > + String decode(const SharedBuffer& data);
>
> I don't think the argument name is needed for clarity here so normally we would
> leave it out.
>
> I am going to say r=me but I do have serious reservations. I'd like to hear
> more concrete evidence about why this is a helpful change to make.
You're right, this patch is probably not helpful at all. I'll leave it here. I don't like the way that webkit uses String to store and parse scripts. SegmentedString is a good concept, but only html parser is using it.

My second patch (for ResourceLoader) is buggy for sure. The reason is didReceiveData() can cancel the request. We could fix it by checking if the job is cancelled in each loop. But I think it is probably better to let SharedBuffer merge all segments into one in this case. For the similar reason, the first patch is also not needed.