Extensive testing of libwww's handling of multipart/byteranges content from
HTTP/1.1 servers revealed multiple logical flaws and bugs in Library/src/HTBound.c.
This module parses multipart/byteranges MIME content for its component items,
and is responsible for establishing and tearing down the internal libwww streams
that handle the MIME components.
I'm fairly certain that a hostile HTTP server could exploit one of these bugs
to cause an illegal memory access, and a segfault, in
HTBoundary_put_block(). All libwww clients are vulnerable, including the LWP
Perl module.
Normally, multipart/byteranges MIME content is sent in response to a partial
content request from an HTTP/1.1 client. Apache sends multipart/byteranges MIME
content only in response to a request for two or more partial ranges. Apache
does not send multipart/byteranges MIME content if only one partial range is
requested, which is probably why nobody noticed this for so long. They probably
didn't realize that token usage of libwww's HTRequest_addRange() to specify a
single range will _not_ cause a multipart/byteranges response from Apache. Two
ranges must be added to force a multipart/byteranges MIME response, and observe
all the following problems with HTBound.c.
Note that libwww will attempt to process multipart/byteranges content even if it
did not request partial ranges from the server.
HTBoundary_put_block() receives raw multipart/byteranges content, in piecemeal
fashion. Each invocation passes a chunk of data in 'b', with its byte count
given in 'l'. This function is supposed to maintain internal state, look for
known MIME boundary delimiters in the stream and handle them accordingly.
The most critical fault in the code can be observed when the input buffer passed
to HTBoundary_put_block() terminates in the middle of a potential MIME boundary
delimiter. The following while() loop on line 52 will terminate due to input
being exhausted, with 'l' left at 0:
while (l>0 && *me->bpos && *me->bpos==*b) l--, me->bpos++, b++;
At that point 'b' points one byte past the end of the input buffer passed to
this function. Then, the if() clause spanning lines 64-69 will make multiple
attempts to access a byte past the end of the input buffer.
A band-aid solution would be to wrap that entire if() statement inside "if (l>0)
{ ... }". This is going to eliminate the off-by-one exploit. It's not going to
fix anything else, because the entire logic in this function is utterly broken
in multiple ways, as I've sadly discovered.
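
To make the failure concrete, here is a minimal sketch of the same loop outside
of libwww. The Matcher struct, match_prefix() and peek_byte() are hypothetical
names invented for this illustration; only the loop condition is taken from the
line quoted above. It shows 'l' reaching 0 with 'b' one past the end of the
chunk, and the kind of "l>0" guard that the band-aid relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the matcher state kept by HTBound.c. */
typedef struct {
    const char *bpos;   /* next delimiter byte that still has to match */
} Matcher;

/*
 * Same condition as the quoted while() loop: consume input for as long as
 * it keeps matching the delimiter. Returns the number of unread bytes.
 */
static int match_prefix(Matcher *me, const char **b, int l)
{
    assert(me && me->bpos && b && *b && l >= 0);
    while (l > 0 && *me->bpos && *me->bpos == **b) {
        l--; me->bpos++; (*b)++;
    }
    return l;
}

/*
 * Guarded look at the next input byte. Returns -1 instead of reading
 * when the chunk is exhausted, which is what "if (l>0)" buys you.
 */
static int peek_byte(const char *b, int l)
{
    return l > 0 ? (unsigned char)*b : -1;
}
```

Feeding this a chunk that ends in the middle of the delimiter (say the four
bytes "--BO" against "--BOUNDARY") leaves 0 bytes unread, so any unguarded
"*b" after the loop reads outside the buffer.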
Just to give one example: at line 91 we've finally parsed a delimiter boundary,
so the code destroys the protocol stack that received data from the previous
MIME entity:
if (me->target) FREE_TARGET;
Then, it creates a new stack for the new MIME entity:
me->target = HTStreamStack(WWW_MIME,me->format,
                           HTMerge(me->orig_target, 2),
                           me->request, YES);
Then, belatedly it checks if there was any data it buffered up while scanning
its input looking for the boundary delimiter, and, if so, pushes the data down
the protocol stack:
if (end > start) {
    if ((status = PUTBLOCK(start, end-start)) != HT_OK)
        return status;
}
But, guess what? This data was from the previous MIME entity and it should've
been sent down the old protocol stack. But it's not, and it's going to go down
the new stack. This if() statement needs to be moved up before FREE_TARGET.
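
The intended ordering can be sketched with a mock in place of the real stream
stack. Sink, sink_put() and on_boundary() are hypothetical names for this
illustration, not libwww APIs, and this is not the replacement code mentioned
in this report; it only shows the flush happening before the target is
switched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a libwww stream: it records what it receives. */
typedef struct {
    char   data[64];
    size_t len;
} Sink;

static void sink_put(Sink *s, const char *p, size_t n)
{
    assert(s->len + n < sizeof s->data);
    memcpy(s->data + s->len, p, n);
    s->len += n;
    s->data[s->len] = '\0';
}

/*
 * Called once a complete boundary delimiter has been recognised.
 * [start, end) is data buffered while scanning for the delimiter, so it
 * belongs to the entity that just ended. Returns the sink for the next one.
 */
static Sink *on_boundary(Sink *current, Sink *next,
                         const char *start, const char *end)
{
    /* 1. Flush buffered bytes to the OLD target first. */
    if (current && end > start)
        sink_put(current, start, (size_t)(end - start));

    /* 2. The old target would be torn down here (FREE_TARGET). */

    /* 3. Only now switch to a fresh target for the new entity. */
    next->len = 0;
    next->data[0] = '\0';
    return next;
}
```

When there is no current target (the first boundary in the body), the buffered
bytes are dropped in this sketch, on the assumption that they are MIME preamble.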
Furthermore, HTBoundary_put_block() breaks if the partial content returned by
the web server contains the byte sequence "<LF>-<CR><LF>--DELIMITER" (the
partial content ends with "<LF>-", which is then followed by the MIME boundary
delimiter marking the end of the MIME entity and the beginning of the next
one). It will completely miss this occurrence of the delimiter. Explaining the
ugly logic that's responsible for this will just take too much time. This
entire unmaintainable mess of a function needs to be scrapped and replaced by
clean code.
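
For comparison, one well-known way to find a delimiter in a chunked stream
without missing it after a false start is a Knuth-Morris-Pratt matcher whose
state persists between calls. The sketch below is only an illustration of that
technique, not the replacement code mentioned in this report. Scanner,
scanner_init() and scanner_feed() are hypothetical names, and it assumes the
delimiter being searched for is "<CR><LF>--" followed by the boundary string.
It only counts delimiters; forwarding the surrounding data is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAT 80   /* assumed to be enough for CRLF, "--" and a boundary */

typedef struct {
    char   pat[MAX_PAT];    /* delimiter being searched for */
    size_t fail[MAX_PAT];   /* KMP failure table */
    size_t len;             /* delimiter length */
    size_t matched;         /* delimiter bytes matched so far */
} Scanner;

static void scanner_init(Scanner *s, const char *pat)
{
    size_t i, k = 0;
    s->len = strlen(pat);
    assert(s->len > 0 && s->len < MAX_PAT);
    memcpy(s->pat, pat, s->len + 1);
    s->fail[0] = 0;
    for (i = 1; i < s->len; i++) {
        while (k > 0 && s->pat[i] != s->pat[k]) k = s->fail[k - 1];
        if (s->pat[i] == s->pat[k]) k++;
        s->fail[i] = k;
    }
    s->matched = 0;
}

/* Feed one chunk; returns how many complete delimiters ended inside it. */
static int scanner_feed(Scanner *s, const char *b, size_t l)
{
    int found = 0;
    size_t i;
    for (i = 0; i < l; i++) {
        /* On a mismatch, fall back instead of discarding the current byte. */
        while (s->matched > 0 && b[i] != s->pat[s->matched])
            s->matched = s->fail[s->matched - 1];
        if (b[i] == s->pat[s->matched]) s->matched++;
        if (s->matched == s->len) {
            found++;
            s->matched = s->fail[s->len - 1];
        }
    }
    return found;
}
```

Because 'matched' survives between calls, a delimiter split across two or more
invocations is still found, and the byte that ends a false start is re-examined
as a possible start of the real delimiter.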
I'm trying to contact someone who might still have access to libwww's CVS
repository (W3C appears to have stopped maintaining this software three years
ago), to lobby them to accept the replacement code I've developed and tested,
which fixes at least five major bugs in this single function.
Until then, I suggest that the exploitable bug at least be fixed as an erratum,
by wrapping lines 64-69 inside an "if (l>0) { ... }". That's only going to
prevent the out-of-bounds memory access. The rest of the code remains broken,
and it won't be able to reliably handle multipart/byteranges MIME content. But
at least it won't be exploitable.

Since this bug was reported, I have also identified multiple other defects in
libwww's original HTBound.c. Its fundamental logic is inherently broken. I've
dumped HTBound.c, rewritten it from scratch, and now I'm maintaining my own
source tree, for my own purposes.
I could not make contact with anyone who claims to be maintaining libwww @ W3C,
to contribute my revised module. It does not appear to be actively maintained
any more. libwww has been dropped from Fedora, which is probably for the best.

> All libwww clients are vulnerable, including the LWP
> Perl module.
I'm the author of LWP and I don't believe the statement quoted above
to be true. LWP does not rely on the w3c-libwww code. Its only
parsing of multipart messages is in the _parts() function of
HTTP::Message, and that method is pure Perl code with no buffer
overflow issues.