FAQ: Days Are Missing from the Log Data

When I look at my statistics, I see that some days are missing. I know I had traffic on those days. Why
aren't they shown?

Short Answer

Your ISP may be regularly deleting or rotating your log data. Ask them to keep all of your log data, or to rotate it over a longer interval. It's also possible that your log data does not contain those days for another reason.

Long Answer

To save disk space, many ISPs regularly delete or "rotate" (rename and/or compress)
the server log data. For instance, instead of letting the log file grow forever, they
may rename it every day, start a new one, and compress the old one; then, every week,
they may delete the logs older than seven days. In other, more dramatic cases, they may
simply delete the log file every month or week, and start a new one.

Though this does save disk space on the server, it presents serious problems for
log analysis. When you rebuild the database with Sawmill, it processes all
the existing log data, and creates a new database from it. If some of the old log
data has been deleted, that data will no longer be available in the statistics.
So if the ISP deletes the logs every month, and you rebuild your database, your
statistics will go back one month at the most.

Similarly, when you update the database, Sawmill adds any new data in the
existing log data to the database. So if the ISP deletes log files every month,
and you only update your database every month on the 15th, then all the data from
the 15th to the end of each month will be missing, because it was not added through
an update, and it was deleted on the 1st of the month.
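
The effect of the two schedules can be worked out with a small simulation. The
sketch below is only an illustration in Python, not part of Sawmill; the function
name lost_days and its parameters are hypothetical. It assumes a simplified model
in which deletion and update both run at the start of a day, so an update on the
15th picks up the days before the 15th.

```python
from datetime import date, timedelta

def lost_days(first_day, last_day, is_deletion_day, is_update_day):
    """Return the days that were deleted before any update added them.

    Simplified model (an assumption, not Sawmill behavior): on each day the
    ISP's deletion runs first, then the database update, then that day's
    traffic is written to the log.
    """
    in_log = set()       # days currently present in the server's log
    in_database = set()  # days already added to the database
    seen = set()         # every day in the simulated period
    day = first_day
    while day <= last_day:
        if is_deletion_day(day):
            in_log.clear()           # the ISP deletes the log
        if is_update_day(day):
            in_database |= in_log    # the update adds whatever the log holds
        in_log.add(day)
        seen.add(day)
        day += timedelta(days=1)
    # Days still in the log are not lost yet; a later update can add them.
    return sorted(seen - in_database - in_log)

# Logs deleted on the 1st of each month, database updated on the 15th.
lost = lost_days(date(2024, 1, 1), date(2024, 3, 31),
                 is_deletion_day=lambda d: d.day == 1,
                 is_update_day=lambda d: d.day == 15)
print(len(lost), lost[0], lost[-1])  # 32 2024-01-15 2024-02-29
```

Under this model, updating just before each deletion (for example on the last day
of the month) shrinks the gap to the final day of each month.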

The best solution is to convince your ISP to keep all of your log data, and never
delete any of it. If you can do that, then there will be no problem: you'll
always be able to rebuild or update your database and get all of the statistics.
Since this will require more of your ISP's disk space, however, they may not be
willing to do this, especially if you have a very large site, or they may
charge extra for the service. Of course, if you own and manage your own server,
you can do this yourself.

The second best solution, if you can't convince the ISP to keep all log data, is
to store your back log files on your own system. If your ISP rotates the data
through several logs before deleting the oldest one, this is easy: just regularly
download the logs you don't already have (you may be able to automate this using
an FTP client). If they only keep one copy, and delete it and restart it regularly,
then you'll need to download that file as close to the reset time as possible,
to get as much data as possible before it is deleted. This is not a reasonable
way for ISPs to rotate logs, and you should try to convince them to rotate
through several files before deleting the oldest one, but some of them do it
this way anyway. You'll never get all of your log data if they use
this technique, because the very last entries before deletion will always be
lost, but if you time it right you can get pretty close.
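
One way to automate the regular download is a short script run from a scheduler.
The following is a minimal sketch using Python's standard ftplib module, not a
feature of Sawmill. The host, credentials, directory names, and the active log
name "access.log" are all placeholders, and it assumes the ISP gives each rotated
file a unique name (for example, a date stamp) that is never reused. If rotated
files are renumbered instead (access.log.1 becoming access.log.2), comparing
names is not enough.

```python
from ftplib import FTP
from pathlib import Path

def files_to_fetch(remote_names, local_dir, active_log="access.log"):
    """Return rotated log names on the server that are not archived locally.

    The active log (a placeholder name) is skipped because it is still growing.
    """
    local = Path(local_dir)
    have = {p.name for p in local.iterdir()} if local.is_dir() else set()
    return sorted(n for n in remote_names if n != active_log and n not in have)

def archive_new_logs(host, user, password, remote_dir, local_dir):
    """Download every rotated log that the local archive lacks."""
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        # Assumes the server's NLST reply contains bare file names.
        for name in files_to_fetch(ftp.nlst(), local):
            partial = local / (name + ".part")
            with open(partial, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
            # Rename only after a complete transfer, so an interrupted
            # download is retried on the next run.
            partial.rename(local / name)
```

Calling archive_new_logs("ftp.example.com", "user", "secret", "logs", "backlogs")
once a day (all of these values are placeholders) would keep the local archive
current as long as the ISP keeps rotated files for more than a day.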

Once you have the logs on your system, you can analyze them at your leisure,
without worrying about them being deleted. In this situation, you'll
probably want to run Sawmill on the system where you keep the back logs.

If log rotation is not the issue, your log data may be missing those days for
another reason. Maybe the server was down for a period, or the log data was lost
in a disk failure, or it was corrupted. Look at the log
data yourself, using a text editor, to make sure that it really does contain the
days that you expected it to contain. If the data isn't in your logs,
Sawmill cannot report statistics on it.
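
For large logs, checking by eye in a text editor can be slow. The sketch below is
one possible way to list the days that have no entries at all. It is not part of
Sawmill, and it assumes the log uses Common Log Format timestamps such as
[10/Oct/2000:13:55:36 -0700]; other formats would need a different pattern. The
function names are hypothetical.

```python
import re
from datetime import date, timedelta

# Matches the date part of a Common Log Format timestamp (an assumed format).
TIMESTAMP = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):")
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def days_in_log(lines):
    """Return the set of calendar days that appear in the log lines."""
    days = set()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match and match.group(2) in MONTHS:
            day, month, year = match.groups()
            days.add(date(int(year), MONTHS[month], int(day)))
    return days

def gaps(days):
    """Return the days between the first and last logged day with no entries."""
    if not days:
        return []
    first, last = min(days), max(days)
    span = (first + timedelta(days=i) for i in range((last - first).days + 1))
    return [d for d in span if d not in days]

sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '127.0.0.1 - - [12/Oct/2000:08:01:12 -0700] "GET / HTTP/1.0" 200 2326',
]
print(gaps(days_in_log(sample)))  # [datetime.date(2000, 10, 11)]
```

To check a real file, pass an open file object instead of the sample list, since
days_in_log accepts any iterable of lines. Days reported here are absent from the
log itself, so no rebuild or update can bring them back.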