Large Log Files

Hi, has anyone out there had much success with analog running on large log files? Currently I have been trying to analyse combined log files that are generally 3-4G each, but analog appears to ignore their contents totally. I can obtain a subset of the analysis by copying out a chunk of each file and running analog on that, but above a certain size it appears to give up totally on the logs. The system is a Red Hat Linux 2.4.3-12smp system with Apache/1.3.19, Analog 5.03, 0.5G RAM and 1G swap.

> Hi,
> has anyone out there had much success with analog running on large log
> files. Currently I have been trying to analysis combined log files that
> are generally 3-4G each, but analog appears to ignore their contents
> totally, I can obtain a subset of the analysis by copying out a chunk of
> each file and running analog on that, but above a certain size it
> appears to give up totally on the logs. The system is a Red Hat Linux
> 2.4.3-12smp system with Apache/1.3.19 and Analog 5.03 and 0.5G RAM and
> 1G swap.

Did it give a warning message before giving up? Are you using the LOWMEM commands?

> has anyone out there had much success with analog running on
> large log files. Currently I have been trying to analysis combined
> log files that are generally 3-4G each, but analog appears to ignore
> their contents totally, I can obtain a subset of the analysis by
> copying out a chunk of each file and running analog on that, but
> above a certain size it appears to give up totally on the logs. The
> system is a Red Hat Linux 2.4.3-12smp system with Apache/1.3.19 and
> Analog 5.03 and 0.5G RAM and 1G swap.

Are those logfiles actually on the Red Hat system? Doesn't Red Hat have a 2GB file limit?

We have 1Gb of memory and 2 PII 350s and are having a problem like this, even if we use LOWMEM level 3 (host, ref) and level 2 (vhost, user, browser, file). I have a question about this (we run out of memory with 800 x 6MB of log files). We've tested many configurations of LOWMEM, cache files, zipped/non-zipped input files etc, without success. Until we find a better solution, our boss will keep receiving the reports from the webalizer software.
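For anyone searching the archives later: the commands being referred to here are Analog's LOWMEM family. A configuration sketch, based on the command names in Analog 5's documentation (levels run from 0, full memory, to 3, lowest memory; the levels chosen below just mirror the ones the poster describes):

```
# Reduce memory use for the hungriest reports first
HOSTLOWMEM 3
REFLOWMEM 3
VHOSTLOWMEM 2
USERLOWMEM 2
BROWLOWMEM 2
FILELOWMEM 2
```

Higher levels use less memory at the cost of slower runs and less detail, so it's usual to raise them only for the tables (hosts, referrers) that actually blow the memory budget.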


The file/filesystem size limits depend on the exact block size you are using. On that particular partition I am using a 4K block size, which gives me a maximum file size of 2TB and a filesystem limit of 16TB. (I don't think the 4G log files are in danger of breaking that limit anytime soon.)

--- Alex

Jeremy Wadsack wrote:
>
> Alexander Cohen (A.Cohen [at] latrobe):
>
> > has anyone out there had much success with analog running on
> > large log files. Currently I have been trying to analysis combined
> > log files that are generally 3-4G each, but analog appears to ignore
> > their contents totally, I can obtain a subset of the analysis by
> > copying out a chunk of each file and running analog on that, but
> > above a certain size it appears to give up totally on the logs. The
> > system is a Red Hat Linux 2.4.3-12smp system with Apache/1.3.19 and
> > Analog 5.03 and 0.5G RAM and 1G swap.
>
> Are those logfiles actually on the Red Hat system? Doesn't Red Hat
> have a 2GB file limit?
>
> --
> Jeremy Wadsack
> Wadsack-Allen Digital Group
>
> +------------------------------------------------------------------------
> | This is the analog-help mailing list. To unsubscribe from this
> | mailing list, go to
> | http://lists.isite.net/listgate/analog-help/unsubscribe.html
> |
> | List archives are available at
> | http://www.mail-archive.com/analog-help [at] lists/
> | http://lists.isite.net/listgate/analog-help/archives/
> | http://www.tallylist.com/archives/index.cfm/mlist.7
> +------------------------------------------------------------------------

Hi, I just tried running analog from the command line and got this line among its usual output:

analog: Warning F: Failed to open logfile /web/logs/access_log.2001-10: ignoring it

(that's one of the large files that DOES exist)

--- Alex

Stephen Turner wrote:
>
> On Wed, 31 Oct 2001, Alexander Cohen wrote:
>
> > Hi,
> > has anyone out there had much success with analog running on large log
> > files. Currently I have been trying to analysis combined log files that
> > are generally 3-4G each, but analog appears to ignore their contents
> > totally, I can obtain a subset of the analysis by copying out a chunk of
> > each file and running analog on that, but above a certain size it
> > appears to give up totally on the logs. The system is a Red Hat Linux
> > 2.4.3-12smp system with Apache/1.3.19 and Analog 5.03 and 0.5G RAM and
> > 1G swap.
>
> Did it give a warning message before giving up? Are you using the LOWMEM
> commands?
>
> --
> Stephen Turner, Cambridge, UK  http://homepage.ntlworld.com/adelie/stephen/
> "This is Henman's 8th Wimbledon, and he's only lost 7 matches." BBC, 2/Jul/01

Permissions are fine, and I can head/tail/cat the logfiles quite happily; I can run a wc on them (17 million lines) too. It's only analog that seems to have any issues with those files.

--- Alex

Jeremy Wadsack wrote:
>
> Alexander Cohen (A.Cohen [at] latrobe):
>
> > I just tried running analog from the command line and got this line
> > among its usual output:
>
> > analog: Warning F: Failed to open logfile /web/logs/access_log.2001-10:
> > ignoring it
>
> > (thats one of the large files that DOES exist)
>
> As the same user that Analog is running as, can you do this
>
>     head /web/logs/access_log.2001-10
>
> or this
>
>     cat /web/logs/access_log.2001-10
>
> (Not that you really want to do the last, but in case the first
> succeeds.)
>
> Is there a permissions or other problem reading the file or is it only
> Analog that balks at it?
>
> --
> Jeremy Wadsack
> Wadsack-Allen Digital Group

> The file/filesystem size limits depends on the exact block
> size you are using. On that particular parititon I am using
> a 4K block size which gives me a maximum file size of 2TB
> and a filesystem limit of 16TB. (I dont think the 4G log
> files are in danger of breaking that limit anytime soon..)

As I understand it, there's a little bit more to it than that. The 2.4 kernel has support for files > 2G, but your application also needs to be compiled with libraries that support files of that size. From what I've read, you'll need to have glibc 2.2 compiled against kernel 2.4, and then compile Analog against that version of glibc.

But I'm only telling you what I've read - I haven't done any of this myself.

Hi, I updated glibc, but it didn't help much. I can't update the kernel, however, as it's a system I should keep easily maintainable and there's no newer SMP kernel available for it. However, what I did find is that I can add the following flags for GCC to the Makefile:

-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE

These force the larger file offsets to be used, and it seems to be working now. It takes a while to generate a report, of course, due to the huge amount of data it has to parse, but it's going. Thanks for your help everyone.
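For the record, those macros just need to reach the compiler when Analog is built; in the Makefile that amounts to something like the fragment below (a sketch only - the exact CFLAGS line in Analog's Makefile may carry different options):

```
# Hypothetical Makefile fragment: ask glibc for 64-bit file offsets,
# so fopen()/fseeko() can address files past the 2GB mark
CFLAGS = -O2 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
```

After editing the flags, a `make clean` followed by `make` is needed so every object file is rebuilt with the same off_t size.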

--- Alex

Aengus wrote:
>
> From: "Alexander Cohen" <A.Cohen [at] latrobe>
>
> > The file/filesystem size limits depends on the exact block
> > size you are using. On that particular parititon I am using
> > a 4K block size which gives me a maximum file size of 2TB
> > and a filesystem limit of 16TB. (I dont think the 4G log
> > files are in danger of breaking that limit anytime soon..)
>
> As I understand it, there's a little bit more to it than that. The 2.4
> kernel has support for files > 2G, but your application also needs to be
> compiled with libraries that support files of that size. From what I've
> read, you'll need to have glibc 2.2 compiled against kernel 2.4, and then
> compile Analog against that version of glibc.
>
> But I'm only telling you what I've read - I haven't done any of this
> myself.
>
> Aengus

Yes, that is the type of log file it is. I was asking to see if Analog will process that type of file. What do you mean it won't work out of the box, and that it's not a natural fit? Can they be converted?

Kush <aethiops007 [at] yahoo> wrote:
> My site recieves a whopping 4GB's per day!!! What can I do?? My host
> are staying they can't help me. Can you process transfer.log files?

Transfer.log files often refer to FTP server type logs, which often have multiple entries for each transaction. If that's the type of log file that you're talking about, then you may be able to use Analog to coax some information from the data in the log files, but it's not a natural fit, and it won't work out of the box.

If that's not what you mean by transfer.logs, can you expand on your question?

Kush <aethiops007 [at] yahoo> wrote:
> Yes, that is the type of log file its is. I was asking to see if
> Analog will process that type of file. What do you mean it won't work
> out of the box, and that its not a natural fit?

Analog is designed to work with web server log files where every single line represents a complete transaction and consists of a date, time, IP address, request, status code and some other optional fields such as the bytes transferred, user name, referrer and Browser Agent string. If your logfiles look like that, then Analog will work very well with those logs.

I have no idea what exactly you mean by a "transfer.log". But most FTP server logs tend not to have all the information that Analog expects, or else have multiple log entries for a single "transaction" (for example an entry when a file transfer starts, and another one when it finishes). Analog isn't designed to parse that type of information, though if you know what you're doing, you can use Analog to extract some information from that type of log.

> Can they be
> converted?

Beats me, as you haven't told us what you mean by "transfer.log".

Analog is designed to analyse and report on Web server access logs. If you're not dealing with standard web server log files, and don't have time to read the documentation, then Analog isn't for you.


On 7/16/2008 7:32 PM, Kush wrote:
>
> transfer.log is what I read when I log into to my server by way of my
> FTP and look into the log folder. Thats all. Can Analog process past log
> info?

That tells me nothing about what the data in the logfile looks like. Post the first 8 lines from one of your logfiles and someone on the list will tell you whether it's something that Analog will handle.

> I've read that Analog can process large files like no other program.
> Will my 4GB per day log files be an issue??

Analog can handle 4G of logfiles. You might encounter problems if you try to analyse a couple of weeks' worth of such logs in a single report. If you need to do something like that, Analog can "summarise" the information in log files into a cache file that will allow it to generate reports on very large amounts of data.
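For those wondering what that looks like in practice, the cache mechanism is driven by a pair of configuration commands, roughly as below (a sketch based on Analog's documented CACHEOUTFILE and CACHEFILE commands; the filenames are made up for illustration):

```
# Daily run: summarise the raw log into a compact cache file
LOGFILE      transfer.log.2008-07-16
CACHEOUTFILE cache.2008-07-16

# Monthly run: build the big report from the cache files
# instead of re-reading every raw logfile
CACHEFILE cache.2008-07-15
CACHEFILE cache.2008-07-16
```

The cache files are far smaller than the raw logs, so a long-range report built from them needs a fraction of the time and memory.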


On 7/16/2008 10:45 PM, Kush wrote:
>
> Can I remove the files from the server and process them from my computer
> using Analog? Do you provide set-up services?

This is a "self-help" mailing list for users of Analog. Analog is a freeware application that you download and install yourself. There are no "set up services" - just read the documentation at http://www.analog.cx

> How many request equals an actual visit. I was told by my host that I
> recieve 2,000 request per second but that doesn't mean visits. Are
> there other report display options?


Thanks Sean. I've reviewed the documentation, but it really didn't clear things up, thus bringing more questions. Is this an industry issue? What is one to tell advertisers as to how many "visits" you've received?

Kush wrote:
> Thanks Sean. I've reviewed the documentation but it really didn't clear
> things up thus, bringing more questions. Is this an industry issue? What
> is one to tell advertisers as how many "visits" you've recieved?

Yes, the issue is just due to how the WWW works.

Analog is _superb_ at giving you access to the figures that _can_ be determined, e.g. unique hosts, pageviews, requests (AKA 'hits') etc., but it does not even attempt to tell you anything about the things that cannot be deduced from the server log files, e.g. visits, unique visitors, 'time spent on site' etc.

Certain other analysis packages <cough> Webtrends </cough> will _claim_ to be able to tell you this information, but it is inevitably arrived at by placing assumptions on the available data, cannot be viewed as reliable, and additionally costs a packet for the privilege of obtaining this dubious info.

To determine figures like this additional steps (sessions, cookies etc) need to be taken to collect the necessary information as the server logs alone are unable to provide it.

There is no way to accurately determine visits or visitors. It doesn't help to take the additional steps (sessions, cookies etc), that just gives you different results, not better results.

This is a fundamental issue in how the web works. There is no solution, only approximations. There isn't even a standard for how to do the approximations. Various groups use different approaches to approximating the results.

All kinds of trouble comes from this. Everyone always wants to know the number of visits/visitors. Since we can't know either of those, some approximation is made and used as if it is true. However, every approach to making an approximation gives different results, so when someone challenges your approximation they will always be able to come up with some "authoritative" source that gives conflicting results, no matter where you got your numbers to begin with.

Jason

Sean at Imaginet wrote:
>
> Certain other analysis packages <cough> Webtrends </cough> will _claim_
> to be able to tell you this information but it is inevitably arrived at
> by placing assumptions on the data available and cannot be viewed as
> being reliable and additionally costs a packet for the privilege of
> obtaining this dubious info.
>
> To determine figures like this additional steps (sessions, cookies etc)
> need to be taken to collect the necessary information as the server logs
> alone are unable to provide it.


The numbers in ( ) are usually for the last 7 days. Near the top of your report there is a line that'll tell you.

IP addresses can give you some idea - but some IP addresses are just firewalls or NAT boxes etc. that are masking groups of computers from the internet. Some ISPs give each access its own IP, so you may have 1 webpage (1 x html, 9 images) counted as '10' visits, when it's really just a single person fetching a page.

Unfortunately a lot of people want 'exact' and can't/won't understand that it isn't possible, even when presented with claims that 'x' piece of software does it. Remember the quote - Lies, Damned Lies and Statistics. Analog reports what it can - it doesn't attempt to 'guess'. Cookies and sessions help but, as Jason says, you can't change how the web works. It's fundamental, but it also works very well.
