I need to parse a Tomcat error log and generate a readable report of repeated error messages.

Originally, I used "grep -A1 ERROR <tomcat.error.log>" to print each line containing the keyword ERROR plus the line after it.
But some Tomcat error logs have two or more consecutive lines that contain the keyword "ERROR", such as:
---------------
2008-08-08 16:32:22,649 [ERROR] http-8443-Processor24 -- Not flushing: pub has not been authenticated
2008-08-08 18:36:25,847 [ERROR] http-8443-Processor24 -- InputParser error: ServiceException:<status><code>5</code><msg>Servers are busy. Please resend the content</msg></status>
2008-08-08 18:36:25,847 [ERROR] http-8443-Processor24 -- [REQ_264]ServiceException:<status><code>5</code><msg>Servers are busy. Please resend the content</msg></status>
at com.opinmind.inputservice.InputParser.endElement(InputParser.java:283)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:633)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1241)
-------------------
So I need a script that can:
1. Parse the original tomcat.error.log and grab each line containing the keyword "ERROR", plus the following line if that line does not contain "ERROR".
2. Error lines with the keyword "ERROR" have the format "<sentence1> -- <sentence2>". <sentence2> is the main error message I need, and it can be repeated in the log file. I need to turn the log into the following format:
=============================
First happen : 2008-08-08 16:32:22
Last Error message:
2008-08-08 19:32:22,649 [ERROR] http-8443-Processor24 -- Not flushing: pub has not been authenticated
Count (20)
-----------------------------
First happen: 2008-08-08 16:33:22
Last Error message:
2008-08-08 22:36:25,847 [ERROR] http-8443-Processor24 -- [REQ_264]ServiceException:<status><code>5</code><msg>Servers are busy. Please resend the content</msg></status>
at com.opinmind.inputservice.InputParser.endElement(InputParser.java:283)
Count (21)
-----------------------------
...
========================

Basically, the script should report the following for each unique error message:
1. The time it first occurred in the log
2. The last occurrence in the log: the line containing ERROR, plus the following line if that line doesn't contain the keyword "ERROR"
3. The count of that type of error message
4. Each type of error message separated by "--------------------" or "==================" for easy reading.

Instead of using grep, I would pipe the whole log file to Perl (or open the file in Perl) and process it line by line.
The algorithm is simple:

# inside a loop that reads the log into $line, one line at a time
if ($line =~ /ERROR/) {
push(@errors, $line); # start of a new error
}
elsif ($line =~ /DEBUG/) { # add other non-error levels here as needed
next; # previous error had only one line, or there was no error in the previous line
}
else {
push(@errors, $line); # append the rest of the previous error
}

For 1, there is no need to read the entire file into memory - the file can be processed one line at a time (if I'm understanding the requirements correctly), which is much more efficient, especially for large files. Here's a version that works that way. Call it like:
script.pl log_file
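
The script itself is not reproduced in this thread, so here is a minimal sketch of the line-at-a-time approach rather than the original code. It assumes each error line looks like the samples above (timestamp, "[ERROR]", thread name, " -- ", message) and keeps only one following line, as in the requested report. The sub names summarise and report are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a log from the filehandle $fh one line at a time.
# Returns (\%stats, \@order): stats keyed by the text after " -- ",
# plus the keys in order of first appearance.
sub summarise {
    my ($fh) = @_;
    my (%stats, @order, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),\d+ \[ERROR\] .*? -- (.*)$/) {
            my ($time, $msg) = ($1, $2);
            if (!$stats{$msg}) {
                $stats{$msg} = { first => $time, count => 0 };
                push @order, $msg;
            }
            $current = $stats{$msg};
            $current->{count}++;
            $current->{last} = [$line];         # newest occurrence replaces the old one
        }
        elsif ($line =~ /^\d{4}-\d\d-\d\d /) {
            undef $current;                     # some other timestamped entry, e.g. DEBUG
        }
        elsif ($current && @{ $current->{last} } < 2) {
            push @{ $current->{last} }, $line;  # keep one following line
        }
    }
    return (\%stats, \@order);
}

# Format the stats in the layout requested in the question.
sub report {
    my ($stats, $order) = @_;
    my $out = '';
    for my $msg (@$order) {
        my $s = $stats->{$msg};
        $out .= "First happen: $s->{first}\n";
        $out .= "Last Error message:\n";
        $out .= "$_\n" for @{ $s->{last} };
        $out .= "Count ($s->{count})\n";
        $out .= "-----------------------------\n";
    }
    return $out;
}

# Only runs when a log file is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print report(summarise($fh));
}
```

Note that the whole text after " -- " is used as the key, so messages that embed a per-request tag such as [REQ_264] count as different errors unless the tag is stripped first.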

Thanks, Adam314. I found that the $last{$_} output is displayed in the wrong order:
-------------------
First happen: 2008-09-10 18:09:26
Last Error message:
2008-09-10 18:09:26,928 [ERROR] http-8080-Processor19 at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1269)
-- [REQ_13669]org.xml.sax.SAXParseException: Premature end of file.
Count (1)
-------------------

It should look like:
--------------------
First happen: 2008-09-10 18:09:26
Last Error message:
2008-09-10 18:09:26,928 [ERROR] http-8080-Processor19 -- [REQ_13669]org.xml.sax.SAXParseException: Premature end of file.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1269)

I also tested on another Tomcat log file, and it seems that repeated errors are not collapsed into a single entry.

I have a quick question about your second script.
1. How do I modify your second script to show the two lines that follow each line containing the keyword ERROR or FATAL?
2. How do I sort the report by count, so that the error messages with the highest counts are displayed at the top?

The attached file is another Tomcat error log in which repeated error messages in the $3 portion are counted separately.

1. If the line contains ERROR, it is assumed to be the start of a new error. The previous error line (which may be several lines) is processed, and this line is saved as the error line. If the line does not contain ERROR, this line is assumed to be a continuation of the previous error line.
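
To make that rule concrete, here is a small sketch of just the grouping step, separate from any counting or reporting. The sub name group_errors is made up for this example, and it takes the lines as a list only to keep the example short; the same logic works inside a while loop over a filehandle.

```perl
use strict;
use warnings;

# Group lines into records: each record starts at a line containing ERROR
# and collects the following lines until the next ERROR line.
sub group_errors {
    my (@lines) = @_;
    my (@records, @pending);
    for my $line (@lines) {
        if ($line =~ /ERROR/) {
            push @records, [@pending] if @pending;  # process the previous error
            @pending = ($line);                     # save this line as the new error
        }
        elsif (@pending) {
            push @pending, $line;                   # continuation of the previous error
        }
    }
    push @records, [@pending] if @pending;          # the final error is still pending
    return @records;
}
```

The last push matters: the previous error is only processed when the next ERROR line shows up, so the final error in the file has to be flushed after the loop.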

>>I also tested on another Tomcat log file, and it seems that repeated errors are not collapsed into a single entry.
I'm not sure what you mean by this. Is the output not correct? If not, provide details about what it should be.

> 1. How do I modify your second script to show the two lines that follow each line containing the keyword ERROR or FATAL?
I think the question should be how to modify both script1 and script2 (mainly script1) to print the 2 lines after the keyword ERROR or FATAL.
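
One possible way to do both, as a sketch only and not the actual script1 or script2: keep a configurable number of continuation lines per error, and sort the message keys by count before printing. The names $KEEP, collect and by_count are invented for this example, and it assumes the same " -- " line format as the logs above.

```perl
use strict;
use warnings;

my $KEEP = 2;   # hypothetical setting: continuation lines to keep per error

# Build per-message stats from a list of log lines.
sub collect {
    my (@lines) = @_;
    my (%stats, $current);
    for my $line (@lines) {
        if ($line =~ /\[(?:ERROR|FATAL)\].*? -- (.*)$/) {
            $current = $stats{$1} ||= { count => 0 };
            $current->{count}++;
            $current->{last} = [$line];          # error line itself comes first
        }
        elsif ($line =~ /^\d{4}-\d\d-\d\d /) {
            undef $current;                      # another timestamped entry, e.g. DEBUG
        }
        elsif ($current && @{ $current->{last} } <= $KEEP) {
            push @{ $current->{last} }, $line;   # up to $KEEP following lines
        }
    }
    return %stats;
}

# Message keys ordered by descending count; ties are broken alphabetically.
sub by_count {
    my (%stats) = @_;
    return sort { $stats{$b}{count} <=> $stats{$a}{count} or $a cmp $b } keys %stats;
}
```

The report loop would then iterate over by_count(%stats) instead of over the keys in their original order.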

The first log you posted does contain FATAL messages, but for each of them there is a more recent ERROR message with the same text, so only the ERROR is shown.
e.g.:
line 71 has FATAL with this error: org.xml.sax.SAXParseException: Premature end of file.
line 10412 has ERROR with the same error message
The First Happen comes from the FATAL on line 71: 2008-08-08 16:32:22
The Last Error message comes from the ERROR on line 10412:
2008-09-10 18:09:26,928 [ERROR] http-8080-Processor19 -- org.xml.sax.SAXParseException: Premature end of file.
