If this isn't possible already, it would be great to be able to supply an array of allowed content types, or some way to limit requests to text/html only, so that if the URL points to a PDF, a zip file, etc., it doesn't try to download it.

It would also be good to be able to specify a maximum content length, so it doesn't try to download a 500 MB file, for example.
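To make the request concrete, here is a minimal sketch of such a check, assuming the response headers (from a HEAD request, say) are already available as an array with lowercased keys. The function name `example_should_download()` and its parameters are hypothetical and not part of httprl or linkchecker.

```php
<?php
/**
 * Hypothetical helper: decide from response headers whether a body is worth
 * downloading. Assumes header names are the lowercased keys of $headers.
 */
function example_should_download(array $headers, array $allowed_types, $max_bytes) {
  // Strip parameters such as "; charset=utf-8" before comparing.
  $type = isset($headers['content-type']) ? $headers['content-type'] : '';
  $parts = explode(';', $type);
  $type = strtolower(trim($parts[0]));
  if (!in_array($type, $allowed_types, TRUE)) {
    return FALSE;
  }
  // Content-Length is optional (it is absent with chunked transfer encoding),
  // so only reject when the server actually announces a size over the limit.
  if (isset($headers['content-length']) && (int) $headers['content-length'] > $max_bytes) {
    return FALSE;
  }
  return TRUE;
}
```

Note that a check like this cannot stop a large download when the server sends no Content-Length at all; a byte limit while reading would still be needed for that case.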

I added 'Range' support to linkchecker some weeks ago. I mostly use HEAD (no range required), but users are able to use GET, and then I force downloading only the first bytes (0-1024)... This works great. This is just a hint at how you could solve it for every content type, on any server that supports ranges (which should be the norm today).

Just keep in mind that you will not get "200 OK"; it's "206 Partial Content".
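As a rough illustration of what that means for the calling code, the sketch below classifies the status code of a response to a ranged GET. The function name and the return values are made up for this example; they are not taken from linkchecker or httprl.

```php
<?php
/**
 * Hypothetical helper: interpret the status code of a response to a GET
 * request that carried a Range header.
 */
function example_classify_range_response($code) {
  switch ((int) $code) {
    case 206:
      // The server honored the Range header and sent only the requested part.
      return 'partial';

    case 200:
      // The server ignored the Range header and is sending the whole body.
      return 'full';

    default:
      // Anything else is not treated as a successful check in this sketch.
      return 'error';
  }
}
```

A link checker would treat both 'partial' and 'full' as a working link; the difference only matters for how much data ends up being transferred.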

I have not understood the details behind your chunks logic. When I implemented it I was not aware of httprl; I made it for core, to prevent 500 MB or 5 GB downloads with the GET method :-). But without a range limit httprl must download everything, which is entirely correct.

As for limiting the number of bytes downloaded: that can be done. Now that I think about the requirements for a link checker, this would be a nice feature. I can limit the total bytes transferred, just not the message size, because of chunked transfer encoding.
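A minimal sketch of such a cap on total bytes, assuming a plain blocking stream resource; this is not httprl's actual read loop, and `example_read_limited()` is a hypothetical name. The limit applies to the raw bytes read, so on a chunked response it would also count the chunk framing, which is why it bounds the transfer rather than the message size.

```php
<?php
/**
 * Hypothetical helper: read from a stream until EOF or until $max_bytes
 * have been read, whichever comes first.
 */
function example_read_limited($stream, $max_bytes, $chunk_size = 1024) {
  $data = '';
  while (!feof($stream) && strlen($data) < $max_bytes) {
    // Never ask for more than what is left under the limit.
    $remaining = $max_bytes - strlen($data);
    $chunk = fread($stream, min($chunk_size, $remaining));
    if ($chunk === FALSE || $chunk === '') {
      // Simplification: treat a failed or empty read as the end of the data.
      break;
    }
    $data .= $chunk;
  }
  return $data;
}
```

The caller would close the connection right after this returns, so the rest of a large file is never transferred.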

Nothing is required from you... The 'Range' header is the way all link checking modules should go if they need to limit transferred bytes. It's the standard way web servers work. Why should we add anything else? :-)

<?php
// Range: Only request the first 1024 bytes from remote server. This is
// required to prevent timeouts on URLs that are large downloads.
if ($link->method == 'GET') {
  $headers['Range'] = 'bytes=0-1024';
}
?>

I will be creating a patch for servers that do not accept the Range header. This is useful because anything downloaded by httprl gets loaded into memory, so requesting a URL that returns a lot of data could cause PHP to run out of memory. With the patch, if the Range header is sent and the server replies with a 200 instead of a 206, httprl will download only up to the last byte needed to fulfill the range request and then close the connection, turning the 200 into a 206.
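The idea can be sketched roughly as follows. This is only an illustration of the described behavior, not the patch itself: the function name, the array keys, and the assumption that the data read so far is passed in as a string are all made up for the example. Byte positions are inclusive, as in the Range header.

```php
<?php
/**
 * Hypothetical helper: if a server answered a ranged request with a full
 * 200 response, cut the data down to the requested range and report 206.
 */
function example_emulate_range($code, $data, $first_byte, $last_byte) {
  if ((int) $code !== 200) {
    // A 206 (or an error) is passed through untouched.
    return array('code' => (int) $code, 'data' => $data);
  }
  // Both ends of the range are inclusive, hence the "+ 1".
  $length = $last_byte - $first_byte + 1;
  // Cast to string because substr() may return FALSE on older PHP versions
  // when the start lies beyond the end of the data.
  $slice = (string) substr($data, $first_byte, $length);
  return array('code' => 206, 'data' => $slice);
}
```

For example, `example_emulate_range(200, 'abcdefghij', 0, 3)` would report code 206 with the data `abcd`.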