I'm being thrown back into Perl for the first time in 9 years, so I'm considering myself a "beginner" again. I have a .csv file which I need to manipulate. Each line starts with three fields. I need to append a new field to the front of each line that is a concatenation of the first three fields, i.e. if each line starts with

$field1 $field2 $field3

I need to add CONCAT($field1, $field2 $field3) to the beginning of each line/record. To be honest, not a clue where to start with that.

Then I also need to sort the entire file in order of occurrence by "init-tran-date", "update-tran-date" and "update-tran-time", which are at scattered locations in each 40-field line/record. Would it be easier to put the fields into an array and sort based on their index locations?

I at least remember some of the basics from the old days! Hope it still holds true -

I need to add CONCAT($field1, $field2 $field3) to the beginning of each line/record. To be honest, not a clue where to start with that.

There are several ways to concatenate the fields. You could simply interpolate the three fields in a double-quoted string, or you could use the join function. Then use the unshift function to add that string to the beginning of the array.
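A minimal sketch of that idea (the sample line and the underscore separator are my own choices, and a plain split assumes no embedded commas):

```perl
# Split one CSV line, join the first three fields into a key,
# and unshift the key onto the front of the field list.
my $line = "1334,2525,1350,rest,of,record";
chomp $line;
my @fields = split /,/, $line;        # naive split; fine only without quoted/embedded commas
my $key    = join '_', @fields[0..2]; # "1334_2525_1350"
unshift @fields, $key;                # key becomes the new first field
my $new_line = join ',', @fields;
```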

Your multi-field sorting requirement can be handled with the sort function. The documentation for sort gives a couple of examples, and a simple Google search will turn up more extensive ones.
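A hedged sketch of such a multi-field sort over an array of array refs. The field positions (3, 7, 8) and the year-first dates are made up for illustration; your real columns and date format will differ:

```perl
my @records = (
    [ 'a', 'b', 'c', '2021-03-01', 'x', 'y', 'z', '2021-04-01', '10:00' ],
    [ 'd', 'e', 'f', '2021-03-01', 'x', 'y', 'z', '2021-03-15', '09:30' ],
    [ 'g', 'h', 'i', '2021-01-01', 'x', 'y', 'z', '2021-02-01', '08:00' ],
);

# sort compares on the first key and falls through to the next
# key only when the earlier comparison ties (returns 0).
my @sorted = sort {
       $a->[3] cmp $b->[3]      # init-tran-date
    || $a->[7] cmp $b->[7]      # update-tran-date
    || $a->[8] cmp $b->[8]      # update-tran-time
} @records;
```

Note that a string cmp only works here because these illustrative dates are year-first; MM/DD/YYYY dates would need rearranging before comparing.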

You may do that to handle each line (for example, to put the three fields at the beginning), but in the end, to sort in accordance with those fields, you'll need each line to be in an array, and therefore the file to be an array of arrays. I would personally go for an array of arrays immediately.

Your comment implies slurping the data into an array before doing anything with it. If the CSV data is simple and doesn't have any complexities such as embedded commas, or some fields quoted and some not, then I'd probably agree. However, if the fields aren't that simple, then I'd use a CSV-parsing module, loop over the data, and build the array.
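For what it's worth, if embedded commas or quoted fields do show up, a real parser beats a bare split. Text::CSV is the usual choice; here is a dependency-free sketch using the core Text::ParseWords module instead:

```perl
use Text::ParseWords;

# parse_line(delimiter, keep_quotes, string) copes with quoted
# fields that contain the delimiter, which split /,/ would break on.
my $line   = q{1334,"25,25",1350,note};
my @fields = parse_line(',', 0, $line);   # quotes stripped, comma kept inside the field
```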

In looking at what I really need to do here (I'm parsing this file with over 2 million records), I've realized I actually need to add the concatenated field to the beginning of each record first (that will become a primary key for a MySQL table), then sort by that new primary key field at the beginning of each record, THEN subsort by init-date, update-date, and update-time, and write it all back to the file.

This file will then be used to load a MySQL table and the program that handles the load needs the records in that sorted order so that it deletes all but the most recent record for each key.

The date fields are in MM/DD/YYYY format so I guess I could use a date sort, but the last key needs to be based on the update-date/update-time combination.
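Since MM/DD/YYYY doesn't compare correctly as a string, one common trick is to rearrange each date into YYYYMMDD before sorting. A sketch (the helper name is mine, not from anything above):

```perl
# Turn MM/DD/YYYY into a lexically sortable YYYYMMDD key;
# empty or malformed dates become '' so they sort first.
sub date_key {
    my ($date) = @_;
    return '' unless defined $date && $date =~ m{^(\d\d)/(\d\d)/(\d{4})$};
    return "$3$1$2";    # year, month, day
}

my $k1 = date_key('12/31/2020');   # 20201231
my $k2 = date_key('01/05/2021');   # 20210105
```

An update-date/update-time compound key can then just be date_key($d) . $t, since HH:MM times already compare correctly as strings.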

A script that someone else has written will be used to load the table - it has a built-in "update" function that overwrites a record if the key already exists, hence the need for the data to be "pre-sorted." Only the most recent record should be loaded into the MySQL table.

I went to your above link, but most of the language there is a bit above me.

The DBD::CSV module allows you to access the CSV file via SQL statements as if it were an actual database. Using an SQL statement should allow you to do the three-field concat and the multi-field sorting in one step.
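A rough sketch of that idea, assuming DBD::CSV is installed and that its SQL engine (SQL::Statement) accepts this syntax; the file name and column names are placeholders of mine:

```perl
use DBI;
use File::Temp qw(tempdir);

# Write a tiny CSV "table" into a temp directory.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/trans.csv" or die $!;
print $fh "f1,f2,f3,init_date\n";
print $fh "2,b,y,20210301\n";
print $fh "1,a,x,20200101\n";
close $fh;

# Treat the directory of CSV files as a database.
my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => $dir, f_ext => '.csv', RaiseError => 1 });

# Concatenate the first three fields and sort, all in one statement.
my $rows = $dbh->selectall_arrayref(
    'SELECT CONCAT(f1, f2, f3), f1, f2, f3 FROM trans ORDER BY init_date');
```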

Trying to just get a simple start and test whether I can open/read a file. I put just the first four lines of the file into a test.csv, then wanted to make sure I could get Perl to open and read the file. I was expecting it to show me just the first field of each record. Instead, it's showing me the entire first line/record and that's it. What am I doing wrong here?

I was expecting it to show me just the first field of each record. Instead, it's showing me the entire first line/record and that's it. What am I doing wrong here?

In theory, your code should work and do what you want.

When you say "the entire first line/record", do you mean the header line which you are processing separately, or the next line?

Since the code seems correct to me (there may be some improvements, but it should work), please show your input file, so that we can try to figure out why: 1. it reads apparently only the first record; 2. it apparently does not split the record into fields.

All of a sudden this morning, it's working. Don't ask me why. My guess is the VI editor I'm having to use may be leaving in characters I'm not seeing. I usually edit the script in a simple text file, then copy it into the VI editor on the server I'm working on. I re-copied it this morning after deleting the script first, so my guess is there was a hidden character I wasn't seeing.

The one thing I wasn't expecting was how it displayed the fields from each record. I expected to see:

1334 2525 1350

But I get

133425251350

What do I need to add to show each record on its own line? I am attaching my test CSV file.

This is at least helping me to get a better idea of how the data is manipulated and I'm hoping will get me closer to how to add the new field to the front of each record.

open my $FH, '<', $filename
    or die "Could not read from $filename <$!>, program halting.";

# Read the header line.
my $line = <$FH>;

# Display the header, just to check things are working.
print $line, $/;

# Read the lines one by one.
while ($line = <$FH>) {
    # Split the fields, concatenate the first three fields,
    # and add the result to the beginning of each line in the file.
    chomp($line);
    my @fields = split(/,/, $line);
    unshift @fields, join '_', @fields[0..2];
    print $fields[0], $/;
}
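On the display question from earlier: to show each record's first three fields on its own line, separated by spaces, a join does the trick. A small self-contained sketch with inlined sample lines standing in for the file:

```perl
# Print the first three fields of each record,
# space separated, one record per line.
my @lines = ("1334,2525,1350,foo\n", "1111,2222,3333,bar\n");
my @out;
for my $line (@lines) {
    chomp $line;
    my @fields = split /,/, $line;
    push @out, join ' ', @fields[0..2];
}
print "$_\n" for @out;
```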

Much of the sorting language is over my head - I understand the cmp commands in the sorts, but the mapping has me lost. I need to sort on init-tran-date, update-tran-date and update-tran-time because the latter two fields could be empty (if the initial record never had an update to it). And the .csv file needs to be sorted because it will also be used with a sqlBatch60 Java program (over which I have no control - the data just needs to already be sorted to work with that already written sqlBatch60 program).
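One small point on those possibly-empty fields: with cmp, an empty string compares before any non-empty value, so records that were never updated will naturally sort first within their init date. A sketch (the column layout is made up):

```perl
my @rows = (
    [ '03/01/2021', '03/15/2021', '09:30' ],   # updated record
    [ '03/01/2021', '',           ''      ],   # never updated
);
my @sorted = sort {
       $a->[0] cmp $b->[0]
    || $a->[1] cmp $b->[1]   # '' sorts before any real date
    || $a->[2] cmp $b->[2]
} @rows;
```

These MM/DD/YYYY strings only tie correctly here because the init dates are equal; real code should convert the dates to a sortable form first.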

I think I'd rather load a MySQL table with what I have now, SELECT * with a sort on those three fields, and output that back into a new .csv file.

    # Split the fields, concatenate the first three fields,
    # and add the result to the beginning of each line in the file.
    chomp($line);
    my @fields = split(/,/, $line);
    unshift @fields, join '_', @fields[0..2];
    push @data, \@fields;
}
close $FH;
print "Unsorted:\n", Dumper(@data), $/;
