Need help to fine-tune Perl script to make it faster (currently taking more than 30 minutes)

Hi,

I need your expert advice on fine-tuning the Perl script below, which is currently taking more than 30 minutes to extract information from a raw file of nearly 2 million records. Thanks in advance for your help!

Below are the requirements met by the script:
1) The script should scan through a log file and output the final result into a file called "test.txt".
2) It replaces some strings in the log file with more generic terms.
3) It finds the record number of the affected row in the log file and uses this record number to query the main source file for that particular record.
4) It generates a string that contains information from the log file and combines it with the affected row from the source file.
5) The result is written to the output file test.txt.

Re: [anujajoseph] Need help to fine-tune Perl script to make it faster (currently taking more than 30 minutes)

Hi,

There are many things to say about your code not following the good practices that are generally accepted today. The main one is that you should always use the "use strict;" and "use warnings;" pragmas in any Perl script longer than 2 or 3 lines. This will force you to declare your variables in the appropriate scope, and it will help you find bugs or possible misconstructions. Having said that, none of these good practices is likely to speed up your program in any significant way.
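As a minimal illustration (a hypothetical snippet, not taken from the original script), this is what a script with both pragmas looks like. Under strict, every variable must be declared with my (or our), ideally in the narrowest scope where it is used:

```perl
#!/usr/bin/perl
use strict;    # variables must be declared (my/our) before use
use warnings;  # reports suspicious constructs, e.g. use of undefined values

# Hypothetical data, only to show lexical scoping under strict.
my $count = 0;                        # visible for the rest of the file
for my $word ("alpha", "beta", "gamma") {
    my $length = length $word;        # visible only inside the loop body
    $count += $length;
}
print "Total length: $count\n";       # prints "Total length: 14"
```

A typo such as $cuont += $length would now stop the script at compile time instead of silently creating a new global variable.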

If you are running a (very) old version of Perl, it could be that replacing:

Code

# inside the loop over the log file ($_ holds the current line)
$_ =~ s/$str1/$str2/g;
$_ =~ s/$str3/$str4/g;

by

Code

# /o: compile each pattern only once and reuse it for every line
$_ =~ s/$str1/$str2/go;
$_ =~ s/$str3/$str4/go;

will noticeably improve performance because, on old versions of Perl (before Perl 5.6), the interpreter would recompile the regex for each line of input, whereas the /o option tells the interpreter to compile the regex only once and always reuse the compiled form. (The flip side is that, with /o, any later change to $str1 or $str3 is silently ignored.)

If you are running a recent version of Perl, this will most probably not make any difference. By the way, if you are running a recent version of Perl, you should have a look at the qr// operator for precompiling regular expressions.
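A minimal sketch of the qr// approach, assuming the search and replacement strings are plain literals (the values below are invented placeholders for $str1 to $str4): each pattern is compiled once, before the loop, and reused on every line. The \Q...\E escape is an addition of mine that makes the strings match literally even if they contain characters such as . or *.

```perl
use strict;
use warnings;

# Hypothetical placeholders for $str1..$str4 from the original script.
my ($str1, $str2) = ('ERR42', 'generic error');
my ($str3, $str4) = ('WARN7', 'generic warning');

# Compile each pattern once, outside the loop.
# \Q...\E (quotemeta) makes the strings match literally.
my $re1 = qr/\Q$str1\E/;
my $re2 = qr/\Q$str3\E/;

my @lines = ("ERR42 at record 17", "WARN7 then ERR42 again");
for (@lines) {          # $_ is aliased to each element, so edits stick
    s/$re1/$str2/g;
    s/$re2/$str4/g;
}
print "$_\n" for @lines;
# generic error at record 17
# generic warning then generic error again
```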

There is something else that might improve the two lines above. Since you are not really using pattern matching, it is a bit of overkill to use the regex engine to achieve something that can be done with simpler string matching.

So you may try to replace your two expressions quoted above with a combination of the index function (to find the position) and the substr function (to perform the string substitution). This may or may not be faster; only benchmarking the two alternatives will tell you for sure. Actually, if you do it, I would be interested in knowing the results.
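A sketch of the index/substr alternative, together with a comparison using the core Benchmark module. The replace_all helper and the sample line are hypothetical; to get meaningful numbers, feed the benchmark real lines from your log file.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: replace every literal occurrence of $from with $to
# using index (to find the position) and substr (to splice in the new text).
sub replace_all {
    my ($str, $from, $to) = @_;
    return $str if $from eq '';            # avoid an endless loop
    my $pos = 0;
    while (($pos = index($str, $from, $pos)) != -1) {
        substr($str, $pos, length $from, $to);   # 4-argument substr
        $pos += length $to;                      # skip past inserted text
    }
    return $str;
}

# Hypothetical sample line and strings.
my $line = "ERR42 at record 17, ERR42 again";
my ($from, $to) = ('ERR42', 'generic error');

# Sanity check: both approaches must produce the same result.
(my $via_regex = $line) =~ s/\Q$from\E/$to/g;
die "mismatch\n" unless replace_all($line, $from, $to) eq $via_regex;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { (my $s = $line) =~ s/\Q$from\E/$to/g },
    index => sub { my $s = replace_all($line, $from, $to) },
});
```

The sanity check matters: a faster version that produces different output is not an optimization.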

These are just some possible clues, but don't expect a huge improvement from them. Read on, I am getting to the most important point.

The key to improving performance is to profile your code, to figure out where the program is spending most of its time. Often, 20% of the code accounts for 80% of the time spent. If you improve sections of the code that do not belong to those slow 20%, you will not get any significant improvement (and you are probably wasting your time). In addition, it is generally acknowledged that programmers are often bad at guessing where the slow part is. Therefore, using a profiling tool is really the most important thing you can do, as a starting point, to improve your program's performance (of course, it is very important to profile on actual data, otherwise your results may be meaningless).

Most profilers, however, have function-level granularity. Since your script has no functions, they would be useless in your case. You should probably try the CPAN module Devel::NYTProf, which works at the level of individual lines of code (typically run as perl -d:NYTProf yourscript.pl, followed by nytprofhtml to generate an HTML report).

For more details on improving program performance, take a look at the following document: http://search.cpan.org/~dom/perl-5.12.5/pod/perlperf.pod.

Re: [Laurent_R] Need help to fine-tune Perl script to make it faster (currently taking more than 30 minutes)

Thanks for all your valuable suggestions. I found that it's the sed command I'm using that is taking the most time. After removing it and reworking the logic using an array, it's now much faster. But I will still try out your suggestions for amending the code and update you on the results. Thanks again!

Re: [anujajoseph] Need help to fine-tune Perl script to make it faster (currently taking more than 30 minutes)

Interesting. I was wondering why you used this sed command, and I certainly would not have done it this way within a Perl script, since Perl can do everything sed can do, but I did not really suspect that it would be the... suspect.

Re: [anujajoseph] Need help to fine-tune Perl script to make it faster (currently taking more than 30 minutes)

In Reply To

but i will still try out your suggestion to amend the code and update you on the same. thanks again!

You probably should not (unless you just want to try the experiment out of curiosity). There is no point in optimizing code that is already fast enough by a wide margin.

When looking at your code, I did not really try to understand the detailed algorithm, because the formatting (lack of indentation) made it tedious to read. Looking at it again in light of what you said, it is obvious that re-reading an entire file with sed each time through a loop that reads another file is likely to be slow (although this also depends on the relative size of each file). In such cases, you really want to store one of the files in memory, in an array (or a hash, depending on what you want to do with it), insofar as the file size permits, so that each file is read only once.
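A minimal sketch of that approach, under assumptions that are not stated in the thread: the record number is the first comma-separated field of the source file, log lines mention it as "record N", and the output joins the modified log line and the source row with a | character. All data, strings, and formats below are hypothetical. It uses a hash keyed by record number rather than an array, and in-memory filehandles so that it runs standalone; in the real script you would open the actual source, log, and test.txt files instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real files.
my $source_data = "17,Alice,Paris\n42,Bob,Lyon\n";
my $log_data    = "ERR42 on record 42\nWARN7 on record 17\nERR42 on record 99\n";

# Step 1: read the source file ONCE and index it by record number.
my %record_for;
open my $src, '<', \$source_data or die "Cannot open source: $!";
while (my $row = <$src>) {
    chomp $row;
    my ($recno) = split /,/, $row, 2;    # assumed: record number is field 1
    $record_for{$recno} = $row;
}
close $src;

# Step 2: scan the log once; each lookup is a hash access,
# not a re-read of the source file.
my $output = '';
open my $log, '<', \$log_data or die "Cannot open log: $!";
open my $out, '>', \$output   or die "Cannot open output: $!";  # real script: 'test.txt'
while (my $line = <$log>) {
    chomp $line;
    $line =~ s/ERR42/generic error/g;    # hypothetical generic-term substitutions
    $line =~ s/WARN7/generic warning/g;
    my ($recno) = $line =~ /record (\d+)/ or next;   # assumed log format
    my $row = $record_for{$recno};
    print {$out} defined $row ? "$line|$row\n" : "$line|NOT FOUND\n";
}
close $log;
close $out;
print $output;
```

With 2 million log lines, this reads the source file exactly once, whereas calling sed inside the loop re-reads it for every log line.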

Now that you have fixed the 10% of code (actually less than that, just one line) that accounts for more than 99% of the time spent, there is no point in trying to optimize further.