Hi, I'm a beginner in the programming world. I'm studying Perl to write a script I need to continue my work.

Problem: I have a table; in the third column I have the characters "." or "," (their number and type vary a lot from line to line). I'd like to count how many dots and commas are present and replace them with those counts.

As I understand it:
a) You have a file with 6 space-separated fields.
b) You need to count the number of dots and commas in the 5th field.
c) You need to concatenate those 2 values and put the result in the 6th (last) field.

Well, the first step is obviously to isolate the field you need to work on; only then can you count the characters you need. But as far as I can tell, you haven't given enough information about how to get to the right field. It seems likely that you want to split the input data on spaces, but you haven't actually said so.
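Isolating a field can be sketched with Perl's built-in split, assuming (as guessed above) that the fields are separated by whitespace. The helper name get_field and the sample line are made up for illustration; the index is 0-based, so the 5th field is index 4.

```perl
use strict;
use warnings;

# Hypothetical helper: return the field at a 0-based index from a
# whitespace-separated line. split ' ' is Perl's special "awk mode":
# it splits on runs of whitespace and ignores leading whitespace.
sub get_field {
    my ($line, $index) = @_;
    my @fields = split ' ', $line;
    return $fields[$index];
}

# Made-up example line with six space-separated fields.
my $example = "chr1 136 A 12 ..,.,. 0";
my $fifth   = get_field($example, 4);   # index 4 is the 5th field
print "$fifth\n";                       # prints "..,.,."
```

If the real file turns out to be tab-separated with empty fields, split /\t/ would be the safer choice, since split ' ' collapses consecutive separators.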

Please provide a sample of your input file and tell us what you need from it.

Here is a small extract of my input (the extract covers 136 to 142; my whole file goes from 1 to about 40000). I pasted exactly what I copied from Unix. My intent is only to count the "." and "," characters, because for a statistical test I need the numbers, not the series of . or ,.

Could you add some comments to the code so that I can learn from it? For example, which part is the setup, which part isolates the right column, etc.

You've posted multiple differing samples of input and haven't clearly explained which fields you need to keep, which ones you don't, and which one holds the data you want counted, which makes it difficult for us to provide the proper solution.

The code uses the module Text::CSV_XS and creates a Text::CSV_XS object to parse and print each input line. The object splits each line on a single space for input (sep_char => ' ') and also uses a space to separate the columns for output (print).

```perl
while (my $line = $csv->getline(*DATA))   # $line: array ref of one row's fields (undef at end of input)
```

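Here the program reads one line at a time from the *DATA filehandle; getline returns a reference to an array of that line's fields, or undef when the input is exhausted. Your program would use a filehandle opened on your input file instead.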
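Opening your own input file can be sketched as follows. This minimal version uses only core Perl and the hypothetical helper read_lines; with Text::CSV_XS you would instead keep the loop and pass the opened handle to getline, as in $csv->getline($in). The filename is taken from the command line here purely as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and return its lines
# with the trailing newline removed. '<' means "open for input".
sub read_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open '$path': $!";
    my @lines;
    while (my $line = <$in>) {   # one line per iteration; stops at end of file
        chomp $line;             # drop the trailing newline
        push @lines, $line;
    }
    close $in;
    return @lines;
}

# Usage: perl script.pl your_input_file.txt
if (@ARGV) {
    my @lines = read_lines($ARGV[0]);
    print scalar(@lines), " lines read\n";
}
```

For a file of about 40000 lines, reading everything into memory is fine; processing each line inside the while loop would work just as well and avoids storing them.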

```perl
$cnt{'.'} = $line->[4] =~ tr/././;   # number of periods in column 5
$cnt{','} = $line->[4] =~ tr/,/,/;   # number of commas in column 5
```

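This code uses the transliteration operator (tr) to count the periods and commas in the fifth column, $line->[4]. tr returns the number of characters it matched, and because each character is replaced by itself, the column is left unchanged. The counts are stored in the %cnt hash, keyed by the character.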
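The counting step can be tried on its own with a small sketch like this one. The sub name count_dots_commas and the test string are made up; it uses tr with an empty replacement list, which also leaves the string unchanged and is equivalent to tr/././ here.

```perl
use strict;
use warnings;

# Count dots and commas in a string with tr///.
# tr returns the number of characters it matched; with an empty
# replacement list (and no /d flag) the string is left unchanged.
sub count_dots_commas {
    my ($s) = @_;
    my %cnt;
    $cnt{'.'} = ($s =~ tr/.//);   # '.' is a literal character in tr, not "any char"
    $cnt{','} = ($s =~ tr/,//);
    return \%cnt;
}

my $c = count_dots_commas("..,.,.");
print "dots=", $c->{'.'}, " commas=", $c->{','}, "\n";   # prints "dots=4 commas=2"
```

Note that tr does not use regular expressions, so there is no need to escape the dot as you would in m/\./.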

```perl
$line->[5] = $cnt{'.'} . $cnt{','};   # column 6 = period count followed by comma count
```

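This sets the sixth element of the array, i.e. the last column (index 5, since array indices start at 0), to the period count and the comma count joined together with the "." concatenation operator.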
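A tiny sketch of the indexing and concatenation, using a made-up row and made-up counts:

```perl
use strict;
use warnings;

# Six made-up fields; Perl arrays are indexed from 0,
# so $row->[5] is the sixth (last) column.
my $row = [ 'chr1', 136, 'A', 12, '..,.,.', 'x' ];
my %cnt = ( '.' => 4, ',' => 2 );    # made-up counts

$row->[5] = $cnt{'.'} . $cnt{','};   # '.' concatenates: 4 and 2 give "42"
print join(' ', @$row), "\n";        # prints "chr1 136 A 12 ..,.,. 42"
```

One design point worth checking for a statistical test: plain concatenation is ambiguous once a count reaches two digits ("112" could mean 1 and 12 or 11 and 2). Joining with a separator, for example join(':', $cnt{'.'}, $cnt{','}), would keep the two numbers recoverable; whether that suits your later analysis is an assumption on my part.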

```perl
$csv->print(*STDOUT, $line);   # write the modified row to standard output
```

This code uses the Text::CSV_XS object to print the modified row to the screen (STDOUT). To write to an output file instead, you would replace *STDOUT with your output filehandle.
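If Text::CSV_XS is not installed, the same four steps (read, isolate, count, print) can be sketched with Perl's built-in split and join instead. This is a swapped-in technique, not the code described above, and it assumes whitespace-separated input with the dots and commas in the 5th field; the sub name process and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Core-Perl sketch of the same loop without Text::CSV_XS.
# Reads lines from $in, writes modified rows to $out.
sub process {
    my ($in, $out) = @_;
    while (my $text = <$in>) {
        chomp $text;
        my @line = split ' ', $text;          # isolate the fields
        next unless @line >= 5;               # skip short or blank lines
        my %cnt;
        $cnt{'.'} = ($line[4] =~ tr/.//);     # count periods in column 5
        $cnt{','} = ($line[4] =~ tr/,//);     # count commas in column 5
        $line[5] = $cnt{'.'} . $cnt{','};     # store both counts in column 6
        print {$out} join(' ', @line), "\n";  # write the modified row
    }
}

# Usage with in-memory data (made-up line); use real filehandles instead.
my $sample = "chr1 136 A 12 ..,.,. 0\n";
open my $in_fh, '<', \$sample or die "Cannot open in-memory data: $!";
process($in_fh, \*STDOUT);   # prints "chr1 136 A 12 ..,.,. 42"
```

Passing the filehandles as arguments keeps the loop independent of where the data comes from, so the same sub works for a file, STDIN, or a test string.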

I'm not a biologist, so I can't begin to understand the meaning of your data, but before we spend a lot of time parsing it, you may want to look over the related CPAN modules to see whether they have already solved this parsing problem.

It is a useful link if someone wants an alternative way (without using samtools) to do mapping, alignment, BLAST, etc., but not for my problem, which is not a biological one: I want to change and group data in a very big table, that's all. I would like to understand the code; it is not necessary for someone to give me the result. I need this .pl script, but I am more interested in getting into the Perl mindset so that I can solve my problems by myself.

This opens the file "new.txt" for output (">" means output). The $out variable is the filehandle through which you can write data to this file. The "or die" means that the program will abort if the file cannot be opened, and "$!" gives the reason why opening it failed.
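The statement being explained is not shown in the thread; based on the description, it is presumably something like the following. This is a reconstruction, not the poster's exact code, and the example row written to the file is made up.

```perl
use strict;
use warnings;

# Open "new.txt" for writing ('>' creates the file or truncates it).
# If open fails, die aborts the program and $! holds the reason.
open my $out, '>', 'new.txt' or die "Cannot open new.txt: $!";
print {$out} "chr1 136 A 12 ..,.,. 42\n";   # made-up example row
close $out or die "Cannot close new.txt: $!";
```

With Text::CSV_XS, the same $out handle could be passed to print in place of *STDOUT.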

But I think you should grab a good tutorial, or perhaps better, buy a book on learning Perl, such as "Learning Perl", published by O'Reilly.