Sunday, February 12, 2012

Perl is used for manipulating text files and in this article I will teach how to read a CSV file in Perl? First of all let’s start with

What is a CSV file?

A comma-separated values (CSV) file stores tabular data (numbers and text) in plain-text form. As a result, such a file is easily human-readable (e.g., in a text editor).
CSV is a simple file format that is widely supported by consumer, business, and scientific applications. Among its most common uses is to move tabular data between programs that naturally operate on a more efficient or complete proprietary format. For example: a CSV file might be used to transfer information from a database program to a spreadsheet. -Wikipedia

Using Perl to read a CSV file

Now suppose we have a CSV file which has 4 rows and each row has fields which are separated by comma as given below:

If you want to extract out the data from 3rd column and calculate the sum of all the numbers then the approach to do this will be as follows:

Read the first line of the CSV file and extract out the 3rd column. Assign the extracted value to a scalar variable and then read the next line, extract out the value of 3rd column, add it to the scalar variable which contains the first value and store result in the same variable. Then read the 3rd line and so on…

Notice the 3rd row. The 2nd field of it has a comma “Puppy,Linux”. If we use the same approach of splitting fields by comma separator then it will throw an error because split function will split out Puppy and Linux since they are separated by a comma. Then it will do something like 1+2+Linux+4 which is wrong.

Here the 3rd row contains multi-line fields. When the above code will be used it will treat the CSV file as it contains 5 lines instead of 4.

So we can conclude that if the CSV file is in good format AND if it does not contain field separators within quote AND if there are no multi-line fields in CSV file then our code (which we have written above) will work perfectly and will give correct result. Take use of Text::CSV_XS module which is used for reading and writing CSV files, it will deal with those error prone situations very efficiently.