Script is running for a long time for copy, grep and find operations on a large number of files


I have around 50 subdirectories in the main directory /app/g1adm/. In each subdirectory I have to do the operations below.

1> Exclude some predefined filenames which are present in an array @predefined.
2> Recursively find .rex, .fmd, .pld, .sh, .sql and other files (those which are not of the mentioned types), and then:
   a> copy the .rex files to the /temp/Reports folder and run a shell script named convert.sh which takes that .rex filename as an input parameter (for example, convert.sh aa.rex);
   b> copy the .pld files to /temp/price and run a shell script named price.sh which takes that .pld filename as an input parameter (for example, price.sh ab.pld).

.rex file  ---> the filetype will be "registration report"
.pld file  ---> the filetype will be "population report"
.sql file  ---> the filetype will be "SQL FILE"
.sh file   ---> the filetype will be "SCRIPTS"
other file ---> the filetype will be "OTHERS"
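For reference, here is a minimal sketch of the kind of per-file logic I have (the @predefined entries shown are placeholders, and I am assuming convert.sh and price.sh live in the current directory):

    use strict;
    use warnings;
    use File::Basename;
    use File::Copy;

    # Load the predefined exclusions into a hash for O(1) lookup.
    my @predefined = ('skipme.rex', 'ignore.sql');      # placeholder entries
    my %skip = map { $_ => 1 } @predefined;

    sub process_file {
        my ($path) = @_;
        my $name = basename($path);
        return if $skip{$name};                         # excluded filename

        if ($name =~ /\.rex\z/) {                       # "registration report"
            copy($path, "/temp/Reports/$name") or warn "copy $path: $!";
            system('./convert.sh', $name) == 0 or warn "convert.sh $name failed";
        }
        elsif ($name =~ /\.pld\z/) {                    # "population report"
            copy($path, "/temp/price/$name") or warn "copy $path: $!";
            system('./price.sh', $name) == 0 or warn "price.sh $name failed";
        }
        # .sql -> "SQL FILE", .sh -> "SCRIPTS", anything else -> "OTHERS"
    }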

Just for note, I have around 50,000 files, so whatever logic I have implemented takes a very long time. Could you please suggest how to do this with parallel processing, i.e. for each subdirectory the script would create one process to perform the above steps? Any other suggestion to improve the script's run time is also welcome.

Re: [millan] Script is running for a long time for copy, grep and find operations on a large number of files


I would not parallelize over the subdirectories, because the number of files may vary considerably between subdirectories; instead, parallelize over the files to be processed.

A first shot could go like this:

(1) Collect a list of files to be processed.

(2) Go through the list and, for each file, spawn a child process to process the file, but ensure that you don't have more than a certain number of child processes running at a time (i.e. start a new child only when a "processing slot" becomes available).
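In Perl, a convenient way to get such a bounded pool of children is the Parallel::ForkManager module from CPAN. A minimal sketch, assuming collect_files() and process_file() wrap your own find/copy logic:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    my @files = collect_files();                # step (1): gather the worklist
    my $pm    = Parallel::ForkManager->new(8);  # at most 8 children at once

    for my $file (@files) {                     # step (2)
        $pm->start and next;                    # parent: move on to the next file
        process_file($file);                    # child: do the copy/convert work
        $pm->finish;                            # child exits, freeing its slot
    }
    $pm->wait_all_children;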

This strategy can be improved, though:

- Since the number of files is large compared to the number of processes you will likely run in parallel, you could save the overhead of spawning many children by passing several files to one child process for processing (see the sketch after this list).

- If you feel that the setup time (to collect the files for processing) is already unreasonably long, you could start child processes while still collecting.
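The first improvement bolts straight onto the sketch above: hand each child a chunk of files instead of a single one. Something like this (the chunk size of 500 is arbitrary; tune it to your workload):

    # Continuing from the sketch above: pay the fork overhead once per
    # 500 files instead of once per file.
    my $chunk = 500;
    while (my @batch = splice(@files, 0, $chunk)) {
        $pm->start and next;                    # parent: hand out the next batch
        process_file($_) for @batch;            # child: work through its batch
        $pm->finish;
    }
    $pm->wait_all_children;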

There is yet another possibility, which is outside of Perl: The xargs command line tool should be able to do exactly what you want (at least the Linux version supports a --max-procs switch). In case you are on Windows, you could use the Cygwin version of xargs, or the one from the GnuTools for Windows.
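For example, something along these lines (GNU find and xargs; process_file.pl is a hypothetical wrapper around your per-file logic, and -n 500 batches the arguments just like the chunking above):

    find /app/g1adm -type f -print0 \
        | xargs -0 --max-procs=8 -n 500 perl process_file.pl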