A lot of times the code gets stuck because it takes time to extract the excerpt, as some files are over 10MB. I want to use a timeout function to move to the next element if it takes more than two minutes. I have looked up alarm but I don't know how to incorporate it into my code. It would be great if you could help me use a timeout in my function in case it gets stuck.

I think kcott has a fine post on how to use alarm(). However, it would seem to me that a better solution would be to figure out why this thing is so darn slow and fix that so that you get a result all of the time without having to "give up".
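For reference, here is a minimal sketch of the usual eval/alarm idiom, not kcott's code and not your code. The helper name with_timeout and the two stand-in jobs are made up for illustration; in your program the code ref would wrap whatever does the excerpt extraction, and the limit would be 120 instead of 1.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code->() but give up after $seconds.
# Returns (1, $result) on success and (0, undef) on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # The trailing "\n" keeps Perl from appending "at ... line N".
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;            # finished in time: cancel the pending alarm
        1;
    };
    alarm 0;                # also cancel if $code died for another reason
    return (1, $result) if $ok;
    die $@ unless $@ eq "timeout\n";    # re-throw unrelated errors
    return (0, undef);
}

# Stand-in jobs (assumptions): one fast, one that blocks for 3 seconds.
my @jobs = (
    sub { 'fast result' },
    sub { select(undef, undef, undef, 3); 'slow result' },
);

for my $i (0 .. $#jobs) {
    my ($done, $value) = with_timeout(1, $jobs[$i]);
    print $done ? "job $i: $value\n" : "job $i: timed out, moving on\n";
}
```

One caveat: since Perl 5.8 signals are "safe" (deferred), so the handler only runs between opcodes. A single regex match that grinds away for minutes may not be interrupted until that match finishes (see perlipc), which is one more reason to make the matching itself fast.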

I looked at the first couple of regexes (see below). When you use the /x modifier, you can space a regex out over multiple lines, and this can improve the readability a lot. You can also add comments to the lines, though there are some limitations on what can go in a #comment (see the perlre doc for details), and you cannot put a space inside a two-character token like the ?: in (?: ..the non-capture..). Still, this #comment stuff can be useful.
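To make the /x layout concrete, here is a small made-up example. The date pattern is only an illustration and is not one of your regexes. Note that the comments avoid the pattern delimiter and any $ or @, since those are among the things that can trip up a #comment.

```perl
use strict;
use warnings;

# Hypothetical pattern: an ISO-style date with an optional time part.
my $date_re = qr/
    \b
    (\d{4})                 # year
    -
    (\d{2})                 # month
    -
    (\d{2})                 # day
    (?: T \d{2} : \d{2} )?  # optional time, non-capturing group
    \b
/x;

my ($y, $m, $d) = "Filed on 2014-03-07T10:15." =~ $date_re;
print "$y $m $d\n";
```

Whitespace inside the pattern is ignored under /x, so "T \d{2} : \d{2}" matches "T10:15". The one place a space is not allowed is inside the "(?:" token itself.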

I see some strange things (there appear to be terms that have no purpose). Also, $data (maybe 10MB) is slurped into memory as a single variable, and many regexes are applied serially to this humongous thing. Parsing, re-parsing and re-parsing something big is often not a good idea performance-wise.

Often, parsing something very large is best done line by line and ONLY once. Read a line, deal with it, throw it away because we are done....

I suspect that if you shared some more details about the file format, and why one of these things is 10MB, far more efficient algorithms could be devised. Your regexes appear to do very similar things. A single pass that figures everything out in one go would be faster. It could even be that an algorithm that just stops reading the file once we've got what we need is appropriate.
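Here is a rough sketch of what I mean by a single pass that stops early. I don't know your file format, so the TITLE: and DATE: lines and both patterns are pure assumptions for the sake of the example, as is the name scan_once. It reads from an in-memory string so that it runs as-is.

```perl
use strict;
use warnings;

# Hypothetical patterns: the real ones depend on the file format.
my %want = (
    title => qr/^TITLE:\s*(.+)/,
    date  => qr/^DATE:\s*(\d{4}-\d{2}-\d{2})/,
);

# Read line by line, try each still-missing pattern once per line,
# and stop as soon as every field has been found.
sub scan_once {
    my ($fh) = @_;
    my %found;
    my $lines_read = 0;
    while (my $line = <$fh>) {
        $lines_read++;
        for my $name (keys %want) {
            next if exists $found{$name};
            $found{$name} = $1 if $line =~ $want{$name};
        }
        last if keys(%found) == keys(%want);   # got everything: stop reading
    }
    return (\%found, $lines_read);
}

# Sample data (an assumption): two header lines, then lots of filler.
my $sample = "TITLE: Annual report\nDATE: 2014-03-07\n" . ("filler line\n" x 1000);
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($found, $lines_read) = scan_once($fh);
close $fh;

print "$found->{title} / $found->{date} after $lines_read lines\n";
```

With this layout each regex only ever sees one short line, and the 1000 filler lines are never read at all. Whether your data allows this depends on whether what you are extracting can span lines, which is exactly the detail we don't know yet.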

While I was playing with this, I spaced your regexes out (that is what the /x allows). I also show how to use the Regex::Explain function, which is sometimes useful.

Some alternate ways to space out the regexes to increase readability are shown below.

I do suspect that the "real solution" is to make this so fast that there is never any need for a 2 minute timeout! But there are some things about your application that I and others just don't understand. It would be most helpful if you could clarify further!

It's gonna search the whole 10MB to figure out that this match does not exist. I suspect that there is a far faster way to do this job. Maybe that's not possible, but I doubt it. I think you should be asking the Monks how to make your algorithm run so darn fast that this 2 minute timeout is irrelevant. I would not be surprised if the total time to get all results came out 5-10x faster. Without knowing more I certainly can't guarantee that, but if I was in Vegas, I would put some money down on that proposition. But you have to explain more; not enough information is known.
