Hi, I have a very large tab-delimited file with around 3000 fields and a header row. It is in the format: ID start end name1 name2 name3 ... name3000. My aim is to parse this file into 3000 separate files. Each file should have the first 3 fields of the source file (ID, start, end) plus one of the name fields, starting at name1; the next file should have ID, start, end, and name2, and so on up to name3000.

I'm new to Perl, but I assume I should write the first 3 columns to each new text file, with the fourth column coming from a variable: the column numbers from 4 to 3000 would be looped over and assigned to the $column4 variable each time a new file is created.

All I have been able to do so far is to read the first column of the text file with the Text::CSV module.
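Something along these lines is what I have in mind, though it's only an untested sketch (the input file name is just a placeholder, and since the real file is plain tab-delimited I've used split rather than Text::CSV):

#!/usr/bin/perl
use strict;
use warnings;

my $infile = 'source.txt';    # placeholder for the real source file
open my $in, '<', $infile or die "Cannot open $infile: $!";

# Read the header row and split it into field names.
chomp( my $header = <$in> );
my @names = split /\t/, $header;    # ID, start, end, name1 .. name3000

# Open one output file per name column (columns 4 onwards).
# Note: this keeps ~3000 filehandles open at once, which may exceed the
# OS open-file limit.
my @out;
for my $col ( 3 .. $#names ) {
    open my $fh, '>', "$names[$col].txt"
        or die "Cannot open $names[$col].txt: $!";
    print {$fh} join( "\t", @names[ 0 .. 2 ], $names[$col] ), "\n";
    push @out, $fh;
}

# For each data row, write ID/start/end plus the matching name column.
while ( my $line = <$in> ) {
    chomp $line;
    my @fields = split /\t/, $line;
    for my $col ( 3 .. $#fields ) {
        print { $out[ $col - 3 ] }
            join( "\t", @fields[ 0 .. 2 ], $fields[$col] ), "\n";
    }
}

close $_ for $in, @out;

If keeping that many filehandles open turns out to be a problem, I suppose I could loop over the source file once per column instead, which would be slower but safer.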

Thanks very much for your help. I have tested your code and it seems to be working.

Sadly, I have no control over the source file; it would be easier if I received the data in a more convenient format. The reason I would like to split it is that I then want to import the pieces into MySQL as tables, with the name from the header as the table name, which will then be queried. This will be much more efficient if the queries join multiple tables on the primary key rather than hitting one huge table built from the current source file. (Only 10-20 tables will be queried at a time, depending on the user request.)

So ideally I would like to take the header name from the fourth field of each file and use it as the new file name. This can be done at the file-split stage or when I import the files into MySQL, which I am more familiar with.

The data under the header looks like this, with id, start, and end as integers and a long decimal value under each name column:

chr22 50163501 50164000 0.0216543833380203 0.161245861239383 0.146899443524876

So for each file I am looking for output like this:

ID Start End name1
chr22 1 500 0.0216543833380203
chr22 251 750 0.0374431201561708
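For the MySQL side, I was picturing something roughly like this with Perl and DBI, one table per name file (the database name, credentials, column types and key are all made up here, and in practice I might use LOAD DATA INFILE instead of row-by-row inserts):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'DBI:mysql:database=mydb', 'user', 'password',
    { RaiseError => 1 } );

# One table per split file, e.g. name1.txt -> table name1.
for my $file ( glob 'name*.txt' ) {
    ( my $table = $file ) =~ s/\.txt$//;

    open my $fh, '<', $file or die "Cannot open $file: $!";
    <$fh>;    # skip the header row

    $dbh->do( "CREATE TABLE IF NOT EXISTS `$table` (
                   id    VARCHAR(32),
                   start INT,
                   end   INT,
                   value DOUBLE,
                   PRIMARY KEY (id, start, end)
               )" );

    my $sth = $dbh->prepare(
        "INSERT INTO `$table` (id, start, end, value) VALUES (?, ?, ?, ?)" );

    while ( my $line = <$fh> ) {
        chomp $line;
        $sth->execute( split /\t/, $line );
    }
    close $fh;
}

$dbh->disconnect;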

I am creating a PHP application where the user will provide a list of up to 20 name fields; these are converted to an array, and those columns are selected from the tables in the MySQL database with a dynamic query. The reason I chose to split the large file by name is that MySQL (or any database) does not generally hold rows longer than around 8000 bytes, and my source file is much wider than that. Also, querying a table with much more than 50 columns starts to get very inefficient; it's much less resource-intensive to create smaller tables and query them using a join on a related key. In order to do this I need to have the source data normalized before I import it.
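The real application will be PHP, but just to illustrate what I mean by a dynamic query in the same language as the rest of this thread, the query building would look roughly like this (the tables and columns follow the made-up schema above, and the list of names stands in for the user's selection):

use strict;
use warnings;
use DBI;

# Stand-in for the list of up to 20 name fields supplied by the user.
# (In the real app these would need validating before going into SQL.)
my @names = qw(name1 name7 name42);

my $dbh = DBI->connect( 'DBI:mysql:database=mydb', 'user', 'password',
    { RaiseError => 1 } );

# Build one SELECT that joins the per-name tables on the shared key.
my $base    = shift @names;
my @selects = ( "$base.id", "$base.start", "$base.end", "$base.value AS $base" );
my $from    = "`$base`";

for my $t (@names) {
    push @selects, "$t.value AS $t";
    $from .= " JOIN `$t` ON $t.id = $base.id"
           . " AND $t.start = $base.start AND $t.end = $base.end";
}

my $sql  = 'SELECT ' . join( ', ', @selects ) . " FROM $from";
my $rows = $dbh->selectall_arrayref($sql);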

I have been trying to play around with your code to print to multiple text files within the while loop, with no success. I'm very new to Perl. The file names are not so important at this stage; they can just be name1 ... name3000.

Saying you're having "no success" is a very poor problem description and doesn't give us any info on what you're doing or the results you're getting.

What have you tried? You need to post your code.

What were the results?

How do the results differ from what you expected?

Did you receive any errors/warnings and if so what were they?

When you say you're trying to print to multiple files within the loop, are you saying you're trying to write to 3000 different files on each iteration of the loop?

Quote

I need to have the source data normalized before I import it.

That does not require creating 3000 separate files, which then need to be parsed when inserting the data into the database.

You normalize the database structure/schema before anything else. Then, as you parse each line of your source data, you either insert each record as needed, or, if the record needs to be built up from multiple lines, you store the data in a proper data structure, such as a hash, and do the insert(s) once the record has all of its data points.
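For example, with a single table keyed on the region plus the measurement name, the whole import is one pass over the source file with no intermediate files at all (rough, untested sketch; the database, table and column names are just for illustration):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'DBI:mysql:database=mydb', 'user', 'password',
    { RaiseError => 1, AutoCommit => 0 } );

# One normalized table instead of 3000 identical ones.
$dbh->do( 'CREATE TABLE IF NOT EXISTS measurement (
               id    VARCHAR(32),
               start INT,
               end   INT,
               name  VARCHAR(64),
               value DOUBLE,
               PRIMARY KEY (id, start, end, name)
           )' );

my $sth = $dbh->prepare(
    'INSERT INTO measurement (id, start, end, name, value)
     VALUES (?, ?, ?, ?, ?)' );

open my $in, '<', 'source.txt' or die "Cannot open source.txt: $!";
chomp( my $header = <$in> );
my @names = split /\t/, $header;    # ID, start, end, name1 .. name3000

while ( my $line = <$in> ) {
    chomp $line;
    my ( $id, $start, $end, @values ) = split /\t/, $line;
    # One row per name column, inserted as the line is parsed.
    $sth->execute( $id, $start, $end, $names[ $_ + 3 ], $values[$_] )
        for 0 .. $#values;
}

$dbh->commit;
close $in;
$dbh->disconnect;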

BTW, creating 3000 tables with the exact same structure is far from being normalized.