Weinstar has asked for the
wisdom of the Perl Monks concerning the following question:

Hello. For the following I feel I should lead with a quick disclaimer: the code is my first attempt at a Perl script (and at a non-HTML script in general), so it may well read like a first attempt - however, the script technically works.

Furthermore, I don't deal with coding in my daily life or studies, so I don't know whether the code conforms to existing standards - I ran it through Perl::Tidy and used strict and warnings (nothing was reported), but that is about it. Please advise if it is unreadable in some way.

Anyhow, the problem I am facing is one of memory - the script dies with "out of memory". I believe it is related to http://learn.perl.org/faq/perlfaq3.html#How-can-I-make-my-Perl-program-take-less-memory- but I cannot figure out how to apply that advice without changing the results of the run.

The case is as follows:
I need to go over 20 separate Excel files (.xls), grab values based on certain patterns, and then copy them into new .xlsx sheets. The total number of rows is close to 1 million.

The script worked fine when I wrote to .xls, but it hit the limit of available rows, so I changed it to ::XLSX and now it runs out of memory. The strange thing is that the memory does not seem to run out in the Excel-writing part, but around the time the script is done grabbing values, or maybe when I copy one of the returned arrays with @righ = @right (done because the original is destroyed in a splice). The system I work on has 8 GB of RAM and the error happens at around 4 GB of total memory use, but I can see from a test on a single sheet that the final part (when all values are in the arrays) takes up a lot of memory. I have a feeling this memory use is excessive and could be handled in a better way; the script itself is at around 2 GB of memory at the time of the error. Four of the eight GB were installed after the original run, to see if that helped.

My question is: how do I solve this? I understand there may be room for plenty of improvements and I will gladly listen to every kind of response, but my main concern is the apparent memory shortage. I have a feeling I am copying the data into memory several times, but I do not know how to avoid this.

If I have overlooked an already existing answer, a reference would be appreciated and an apology returned - but please understand that any deviation from my case will quickly complicate things for me.

I haven't used XLSX myself, and this might be a naive suggestion, but could your situation be satisfied with CSV files as your output (which you could then convert to XLSX in Excel if you want to send XLSX to others instead of CSV)? Especially with huge amounts of data, you can simply print to a raw text file, or you can use a very streamlined module like Text::CSV.
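
For what it's worth, a minimal sketch of the Text::CSV route might look like this (the output file name, column layout and sample rows are just made-up placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical data: each element is a reference to one output row.
my @rows = ( [ 'sheet', 'pattern', 'value' ], [ 'Sheet1', 'foo', 42 ] );

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

open my $fh, '>:encoding(utf8)', 'output.csv' or die "output.csv: $!";
$csv->print( $fh, $_ ) for @rows;    # each row goes straight to disk
close $fh or die "output.csv: $!";

Since every row is written out as soon as it is produced, nothing has to accumulate in memory the way a whole workbook object does.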

As the anonymous poster before me stated, use subroutines to break your program up into smaller, more manageable chunks, then try to optimize each segment. Not only will this get you closer to solving your problem and more than likely eliminate a lot of the memory usage along the way, it will also let you post a more concentrated chunk of code that is more likely to attract the attention of the monks.

From perlfaq3:

"When it comes to time-space tradeoffs, Perl nearly always prefers to throw memory at a problem. Scalars in Perl use more memory than strings in C, arrays take more than that, and hashes use even more."

You have created an enormous number of temporary scalars that are all named very similarly (i.e. $c1 .. $c39) and used in very repetitive code. To start off, maybe try creating a subroutine to handle those more efficiently.
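
Something along these lines might be a start (a sketch only - I'm assuming the cells are read with Spreadsheet::ParseExcel's get_cell, so adjust it to however you actually pull the values):

# Instead of 39 near-identical scalars ($c1 .. $c39), keep the values in one
# array and let a loop do the repetitive work; pass a reference around rather
# than copying the whole list.
sub grab_row {
    my ( $sheet, $row ) = @_;            # $sheet: a parsed worksheet object
    my @cells;
    for my $col ( 0 .. 38 ) {
        my $cell = $sheet->get_cell( $row, $col );
        push @cells, defined $cell ? $cell->value : undef;
    }
    return \@cells;                      # one reference instead of 39 scalars
}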

This is just an example, but I'm sure if you go through your program you'll be able to find multiple situations where things can be cut down and made faster, simpler, and easier to maintain/read.
It might do you well to clean up your comments and include a sample of the data you plan on working with before making a post like this; it will increase the likelihood that you get help from the monks who know the most. You might try reading How (Not) To Ask A Question. Good luck!

I need to go over 20 separate Excel files (.xls), grab values based on certain patterns, and then copy them into new .xlsx sheets. The total number of rows is close to 1 million.

I think an Excel sheet with one million rows is a clear case for using a relational database (plus a report generator) instead of Excel. If you have no better idea, use MS Access as a front-end to some stripped-down MS SQL Server (the last time I used it, the stripped-down version was called MSDE) or to PostgreSQL.
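
Just to illustrate the shape of it, loading the extracted rows through DBI could look roughly like this (a sketch only - I'm using SQLite purely for brevity; swap the DSN for PostgreSQL or SQL Server, and the table and columns are made up):

use strict;
use warnings;
use DBI;

# Made-up sample data: [ source file, extracted value ]
my @rows = ( [ 'file01.xls', 'some value' ], [ 'file02.xls', 'another value' ] );

# Placeholder DSN; for PostgreSQL it would be e.g. 'dbi:Pg:dbname=extract'.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=extract.db', '', '',
    { RaiseError => 1, AutoCommit => 0 } );

$dbh->do('CREATE TABLE IF NOT EXISTS extracted (source TEXT, value TEXT)');

my $sth = $dbh->prepare('INSERT INTO extracted (source, value) VALUES (?, ?)');
$sth->execute(@$_) for @rows;    # one row at a time, nothing piles up in RAM
$dbh->commit;                    # a single commit keeps the bulk load fast
$dbh->disconnect;

Once the data is in a table, the "grab values based on certain patterns" part becomes a query rather than a pile of arrays held in memory.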

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Hello. Thank you all for your responses. There seems to be a lot to go on, and a general sentiment that this is doable with a lot of optimization, so that is great. It will probably take me some time to work through it all, but I will post a final update as soon as I am done, for future reference. Again, thank you for taking the time to look over my problem.

Hi,
I'm using Spreadsheet::XLSX to parse the .xlsx file. For around 20,000 rows it takes around 800 MB of memory, even though the file itself is only 5 MB. Can anybody help me optimize it?
The code follows: