I convert data for a living and have not dealt with browsers or the internet directly with Perl. However, a client recently asked us to download their data directly from their secure website. This was something new (and exciting!) that I had not done, so I went off, did some research, and wrote a program that has worked fairly well.

Recently I ran the program to download the data and received a certificate error, which I had never seen before. OK, so I researched that and added code, and now the program bypasses it. However, there is another challenge I have not been able to resolve: the program no longer downloads the entire page of data. It gets maybe 90% - 95% of the page, then stops and moves on to the next page of data. The only difference I can think of is that I upgraded from ActiveState Perl 5.10 to 5.16; I wouldn't think that would make a difference, but it might. If I use the URL directly in my browser (any page of data), the entire page of data downloads just fine. I'm not sure what you guys might need to help out, but I need to be conscious of proprietary information.

Here is the major piece of code doing the work, with names changed to protect the innocent. :)

Because there is more than one page of data and I do not know which page is the last, I download each page to a temp.xml file and then check that file to see if it has data. If it does, I copy it to another location, delete temp.xml, grab the next page of data, and loop until no more page data is available. A rough outline of that loop is below.
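In outline, the loop looks something like this. This is a simplified sketch: the URL, the output file names, and the data element I search for are placeholders, not the real ones.

Code

use strict;
use warnings;
use LWP::UserAgent;
use File::Copy qw(copy);

# Placeholder URL; the real site and query are proprietary.
my $base_url = 'https://example.com/export?page=';

my $ua = LWP::UserAgent->new;

my $page = 1;
while (1) {
    # Download the current page straight into temp.xml.
    my $res = $ua->get( $base_url . $page, ':content_file' => 'temp.xml' );
    die "Page $page failed: ", $res->status_line unless $res->is_success;

    # Stop at the first page that contains no real data.
    last unless xml_has_data('temp.xml');

    copy( 'temp.xml', "page_$page.xml" ) or die "Copy failed: $!";
    unlink 'temp.xml';
    $page++;
}

# Returns true if the file contains the data element we expect.
sub xml_has_data {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    local $/;                      # slurp the whole file
    my $xml = <$fh>;
    return $xml =~ /<record\b/;    # '<record>' is a stand-in element name
}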

Here is one of the header responses; the only difference between files is the Content-Length. I've checked all of the downloaded files, and none of them contain the complete XML content. The last 5% or so is cut off, as I stated above.

Quote

Get/set the size limit for response content. The default is undef, which means that there is no limit. If the returned response content is only partial, because the size limit was exceeded, then a "Client-Aborted" header will be added to the response. The content might end up longer than max_size as we abort once appending a chunk of data makes the length exceed the limit. The "Content-Length" header, if present, will indicate the length of the full content and will normally not be the same as length($res->content).
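Per that documentation, a response truncated by max_size can be detected directly. A minimal sketch ($url here is a placeholder):

Code

use strict;
use warnings;
use LWP::UserAgent;

my $url = 'https://example.com/export?page=1';   # placeholder
my $ua  = LWP::UserAgent->new;

my $res = $ua->get($url);

# LWP adds this header when it stopped early because of max_size.
if ( $res->header('Client-Aborted') ) {
    warn "Response truncated: ", $res->header('Client-Aborted'), "\n";
}

# Compare what the server promised with what actually arrived.
my $expected = $res->header('Content-Length');
my $received = length $res->content;
warn "Short read: $received of $expected bytes\n"
    if defined $expected && $received < $expected;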

Since I do not know the content_length, it seems setting an arbitrary number may or may not get me the results I need; my examples are only a dozen of about 50 files I will be downloading, all with varying content and sizes. I'm doing two "get" calls: one for a temp file to check whether I've reached the last page, and one to actually download the file to the appropriate location and file name. Is it possible, after the temp call, to retrieve the content_length and then apply that as max_size for the real download?

I don't think you should do that. I have not done any testing, but I suspect that you may need extra room for header overhead. If you set max_size to undef (which is supposed to be the default), then it should accept a download of any size (short of a browser or server timeout).
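For what it's worth, you can also clear the limit explicitly rather than rely on the default; max_size is a plain get/set accessor, so this should work, though again I haven't tested it:

Code

$ua->max_size(undef);   # no limit: accept responses of any length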

Because I do not know when I've downloaded the last page of data. Unfortunately, the client has XX pages to download. So I download a page, check it for content, and if there is content, download it again to the appropriate place. I could just "copy" the downloaded temp file; there are definitely different ways to do it. Regardless, the temp.xml file from the first call doesn't contain all the data either, so both calls are problematic.

OK, doing it your way it never stops; it never gives me a failure, so it just continues and continues and continues. This is why I changed the code so I could read the content and verify whether the file had data. I have 12 pages that actually contain data; I had to kill the program at page 21.

When called, their server serves up an XML file regardless of whether it contains actual data. So when I make the call to download a page of data, it always returns an XML page with header information, but nothing below that if there is no data. That is a legitimate XML file which has no data. So I created my routine to read the file after downloading, and if it doesn't contain actual data (I search for a certain data element), then I'm done and move on to the next batch. I had forgotten why I did that, so please, moving forward, don't assume I'm writing bad code. Let's tackle my problem of not getting all the data within a downloaded XML file, so I'm not wasting my company's time rewriting code I really don't need to rewrite. Mucho appreciated :)

The code below is what I changed per your request. The output was the same as it always has been: partially downloaded XML files.

Now, having said all that, I have included the above logic where it makes sense within the confines of the logic I need to achieve my goals. Here is the snippet of code, after removing the above and adding the success check, so that the code is "more correct" and in line with what you'd like to see.
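In essence, the relevant part now reads like this ($ua and $url set up as before; names simplified):

Code

my $res = $ua->get( $url, ':content_file' => 'temp.xml' );
die 'Download failed: ', $res->status_line unless $res->is_success;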

16k is considerably less than the expected size, and also considerably less than the 90 to 95% of the expected size that you previously reported. So, was your original estimate completely wrong, or are there other details that you've left out?

If all of the files are the exact same 16k size, then that leads to the next obvious question. Do they all have the same contents?

Quote

moving forward, don't assume I'm writing bad code

I never said that you were writing bad code; however, you do have lots of questionable code. For example, this statement:

Code

if ($more = &check_xml)

1) It's already been pointed out that you shouldn't use & when executing the sub.

2) The conditional is not comparing the two values to see if they're the same. It's assigning the return value of the sub to $more and then evaluating that variable in boolean context. Since you previously assigned $more the string 'yes', I can only assume that the sub returns either 'yes' or 'no'. In boolean context, both of those strings evaluate as true (only the empty string and '0' are false), which is probably not what you intended.
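If you want to keep the string return values, compare them explicitly; better still, have the sub return 1 or 0 and test that directly. A sketch (check_xml and download_page are placeholders here):

Code

# Explicit string comparison instead of assignment:
if ( check_xml() eq 'yes' ) {
    download_page();
}

# Cleaner: have check_xml() return 1 or 0 and test it directly:
if ( check_xml() ) {
    download_page();
}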

There are at least a dozen other examples in your code, some of which we've already pointed out. We point these out so that you can correct them, which will make your code more readable, more maintainable, easier to troubleshoot, and less prone to bugs.