Retrieving Log Files for Remote Hosted Web Sites

The previous blog entry entitled ‘Remote Hosted Sites and ISP Policies‘ within this series [Best Practices for Log File Management] discussed the challenges with respect to ISP policies and how they can impact your ability to get good data. The simple answer is to ensure that the ISP provides you access to your log files and that you warehouse them within part of your IT processes. As log files can provide you with an abundance of information, and we discussed earlier how they can be a source of ‘intellectual property’, a simple script can save you a lot of problems with respect to data.

In order to get your logs on a regular basis, you’ll need 3 things:

An FTP address (i.e. ftp.yourWeb site.com)

A username for the FTP account

A password for the FTP account

Once you’ve got this information, just a few lines in a DOS-based batch file and a scheduler on a server, you can download the log from ‘yesterday’ on a nightly basis. In short form, the following roughly represents what would be included in a simple batch file to get a log file from today’s date minus 1 day.

This simple batch file can be ran automatically by task schedulers or cron jobs on a nightly basis.

Now, depending on how your ISP inventories these log files, matters can become complicated in the sense that the date is often within the file name. In order to dynamically access the log from “today’s date minus 1 day”, additional scripting is required. While I consider this to require slightly more advanced knowledge of scripting and Dos, there is a program called ‘doff’ that I found online which enables you to calculate this variation in DOS; a good IT person can manage this aspect of the process for you.

In order to accomplish this, I find the simplest solution is to create a batch file which outputs a secondary batch file. The primary batch file can be automatically run at 1:00am for example and the secondary file (which is actually the output from the primary batch file) can be run at 1:05am. The secondary batch file has a dynamically modified reference in the ‘get’ command for the name of the log file.

Sample code for the primary batch file might look something like the following in order to generate the batch file as outlined above.

Finally, as there are many components and factors involved in this simple task, I recommend that the nightly process downloads that last 3 days, 5 days, or even 10 days of logs depending on their size. This is a failsafe way to avoid lost data due to internet failure, server failure, or numerous other factors.

In short, it is critical to download and warehouse your logs on a regular basis and it can very easily be automated so it does not become a burden among your list of many things to do each day or week.