Thus I want to run a single script for the entire file (5 lakh, i.e. 500,000 rows), and I want the code to be effective as far as time is concerned. Right now it is taking around 24 hours for the full file; after fragmenting the file into 10 files, it takes around 3 hours per file.

Can a single input file (however huge) be handled as input, in a single run, with some parallel processing?

If you look at the data, the last column values will be 3, 4, 5, 6 or 7 (these are country_ids for US, CANADA, GREAT BRITAIN, DENMARK and SPAIN). So can we create a process using fork for each ID, so that five processes run in parallel and accomplish the task?
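Something like this is what I have in mind (a rough sketch only; the input file name, the tab-separated layout and process_row() are placeholders, not the real script):

use strict;
use warnings;

my @country_ids = (3, 4, 5, 6, 7);   # US, CANADA, GREAT BRITAIN, DENMARK, SPAIN
my $input = 'input.txt';             # placeholder file name

my @pids;
for my $id (@country_ids) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                 # child: handle only rows for this country
        open my $in, '<', $input or die "cannot open $input: $!";
        while (my $line = <$in>) {
            chomp $line;
            my @cols = split /\t/, $line;   # assuming tab-separated columns
            next unless $cols[-1] == $id;   # last column holds the country_id
            # process_row(@cols);           # placeholder for the per-row work
        }
        close $in;
        exit 0;
    }
    push @pids, $pid;                # parent: remember the child
}
waitpid $_, 0 for @pids;             # wait for all five children to finish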

The highlighted line is slow, but we don't really know what it is actually doing. If, as its name implies, it iterates over an array for each input line, then that may be where the time goes. There may be a better way of doing that, but we don't have the information to help you.

I would really look at that before investigating parallel processing. Now if you really want to use parallel processing, there are a number of ways, including:

- using the shell (assuming you are on some form of Unix) to launch several background Perl processes in parallel;
- forking several processes within your Perl program;
- using light-weight processes or threads.

For splitting the data, I would not recommend using the last column if that column represents a country. It is quite likely that some countries have many more records than others, so you would end up with a poor split: some processes would have a lot of work while others finished much earlier, and towards the end you might have only one or two processes still running while the rest are done, so you would not gain much from parallel processing. I would rather use something like the line number in the file for splitting the data.

Assuming you want to run 10 processes, as an example, there are two basic ways of doing it: have a preliminary process split your file into ten temporary files and then have ten processes each work on one file (or split the data into more files and have each process handle several of them); or have all your processes read the same file, with each processing only its own lines (for example, process 0 could process only lines whose line number ends with 0, and so on, as sketched below).
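Here is a minimal sketch of the second approach, assuming the real per-line work lives in a hypothetical process_line() sub; the file name is a placeholder, and Perl's built-in $. variable holds the current input line number:

use strict;
use warnings;

my $workers = 10;
my $input   = 'input.txt';           # placeholder file name

for my $slot (0 .. $workers - 1) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    next if $pid;                    # parent: go fork the next worker
    # child: scan the whole file, keep only every $workers-th line
    open my $in, '<', $input or die "cannot open $input: $!";
    while (my $line = <$in>) {
        next unless $. % $workers == $slot;   # $. is the current line number
        # process_line($line);               # placeholder for the real work
    }
    close $in;
    exit 0;
}
wait() for 1 .. $workers;            # parent reaps all ten children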

Without knowing what the code line I highlighted above actually does, I can't give more advice than these general guidelines.

Before trying parallel processing, it may be worth first profiling your current script with NYTProf: it gives a good performance overview of your script, including the bottleneck areas you may be able to refactor for better performance.
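For reference, assuming Devel::NYTProf is installed from CPAN, the usual invocation is to run the script under the profiler and then convert the dump into browsable HTML (yourscript.pl is a placeholder):

perl -d:NYTProf yourscript.pl    # writes profiling data to ./nytprof.out
nytprofhtml                      # converts nytprof.out into HTML reports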

Here is the code. I would like to split it into different processes in whatever way is best, so that things complete quickly.

Code

use Date::Format;
use Getopt::Std;
use Company::Admin::Finance::Utils qw(amz_info amz_fatal get_month_start_end_dates encom_date_converter);
use Company::Admin::Finance::DBSession;
use Company::Admin::Finance::LoggingDBSession;
use Company::Admin::Finance::RcslUtilities;
use Company::Admin::Finance::RcslMap;

# open the input file for reading and the output file for writing
open BADDEBT, '<', $BADDEBT or die "Could not open input file: $!\n";
open Tot_Missing_amnt, '>', $Tot_Missing_amnt or die "Unable to open output file: $!\n";