Description:

On my project, I have to parse many corporate reports (more than 50 of them) and put the extracted data into a database.

What is the easiest way to parse reports? I'm hoping someone has already come up with a generic report-parsing engine, where you specify the layout somehow along with how to put the data into a database.

Has anyone solved the general problem already? Is there such an animal, or are we inventing something entirely new? People have been creating computer reports since 1965; you'd think someone would have invented a parsing engine by now. Any hints? Any ideas? I'm open to anything that will cut down our workload.

What you're looking for is a parser generator. You specify a template (called a "grammar") to the parser generator and it spits out Perl code to parse text conforming to that template. Two well-known parser generators in the Perl world are Parse::Yapp and Parse::RecDescent.
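To show what a grammar buys you, here is a hand-written recursive-descent parser, in Python, for a tiny made-up report grammar (a REPORT header, one or more records, a TOTAL line). It is not output from Parse::Yapp or Parse::RecDescent; it simply codes one method per grammar rule by hand, which is the kind of parser Parse::RecDescent builds for you from a grammar. The report layout, regexes and names (ReportParser, ParseError) are hypothetical.

```python
import re
from decimal import Decimal

# Hypothetical grammar for a tiny report (illustration only):
#   report  : header record+ trailer
#   header  : "REPORT" NAME
#   record  : WORD AMOUNT        (any word except TOTAL)
#   trailer : "TOTAL" AMOUNT
HEADER = re.compile(r"^REPORT\s+(\S+)$")
RECORD = re.compile(r"^(?!TOTAL\b)(\w+)\s+(\d+\.\d\d)$")
TRAILER = re.compile(r"^TOTAL\s+(\d+\.\d\d)$")


class ParseError(Exception):
    pass


class ReportParser:
    """Recursive-descent style: one method per grammar rule."""

    def __init__(self, text):
        # Ignore blank lines and trailing whitespace.
        self.lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
        self.pos = 0

    def _match(self, pattern):
        """Match the current line against pattern; advance only on success."""
        if self.pos < len(self.lines):
            m = pattern.match(self.lines[self.pos])
            if m:
                self.pos += 1
                return m
        return None

    def report(self):
        header = self._match(HEADER)
        if header is None:
            raise ParseError("expected REPORT header")
        records = self.records()
        trailer = self._match(TRAILER)
        if trailer is None:
            raise ParseError("expected TOTAL line")
        if self.pos != len(self.lines):
            raise ParseError("unexpected input after TOTAL")
        total = Decimal(trailer.group(1))
        # A semantic check on top of the syntax: the total must add up.
        if sum(amount for _, amount in records) != total:
            raise ParseError("TOTAL does not match the records")
        return {"name": header.group(1), "records": records, "total": total}

    def records(self):
        """record+ : one or more record lines."""
        found = []
        while (m := self._match(RECORD)) is not None:
            found.append((m.group(1), Decimal(m.group(2))))
        if not found:
            raise ParseError("expected at least one record")
        return found
```

With a real parser generator you would write only the grammar rules and attach actions to them, instead of coding each method yourself.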

My solution: define each type of report line as an unpack TEMPLATE.
(You could use my little piece of code to help you with this -- Fixed length file layout - cut2fmt 2).
Once you've got your templates, use regexes to identify each line's type, then unpack the line to get its fields.
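A minimal sketch of that approach, written in Python rather than Perl: each line type gets a regex that recognises it and a list of (name, width) pairs that plays the role of the unpack TEMPLATE (in Perl, something like unpack("A2 A10 A8", $line) would do the slicing in one call). The line types, field widths, sample lines and the sqlite3 table are all hypothetical.

```python
import re
import sqlite3

# Hypothetical layouts: a regex that identifies the line type, plus
# (field name, width) pairs standing in for a Perl unpack TEMPLATE.
# In Perl the detail layout would be roughly unpack("A2 A10 A8", $line).
LAYOUTS = {
    "header": {"pattern": re.compile(r"^H "),
               "fields": [("type", 2), ("report_date", 10)]},
    "detail": {"pattern": re.compile(r"^D "),
               "fields": [("type", 2), ("account", 10), ("amount", 8)]},
}


def classify(line):
    """Return the first layout whose regex matches, or None."""
    for kind, layout in LAYOUTS.items():
        if layout["pattern"].match(line):
            return kind
    return None


def unpack_fixed(line, fields):
    """Cut a line into named fixed-width fields. Unlike Perl's 'A' format,
    which only strips trailing blanks, this trims both ends."""
    record, pos = {}, 0
    for name, width in fields:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


def load(lines, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS detail "
                 "(report_date TEXT, account TEXT, amount REAL)")
    report_date = None
    for line in lines:
        kind = classify(line)
        if kind is None:
            continue  # page headings, blank lines and other noise
        rec = unpack_fixed(line, LAYOUTS[kind]["fields"])
        if kind == "header":
            report_date = rec["report_date"]  # applies to following details
        else:
            conn.execute("INSERT INTO detail VALUES (?, ?, ?)",
                         (report_date, rec["account"], float(rec["amount"])))
    conn.commit()


conn = sqlite3.connect(":memory:")
sample = [
    "H 2024-01-31",
    "D ACC0000001  125.00",
    "D ACC0000002   17.5 ",
    "Page 1",
]
load(sample, conn)
print(conn.execute("SELECT * FROM detail ORDER BY account").fetchall())
# [('2024-01-31', 'ACC0000001', 125.0), ('2024-01-31', 'ACC0000002', 17.5)]
```

The point of keeping the layouts as data is that supporting another report mostly means adding entries to LAYOUTS rather than writing new parsing code.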