$cod{$1} has a subscript of $1, and $1 refers to the first captured group of the last successful regular expression match, i.e. in this case the digits representing the colors. So, you are right.

In fact, looking again at the warning message:

Code

Use of uninitialized value ($count or $change) in concatenation (.) or string at Script.pl line 24, <IN> line XXXX.

the uninitialized value occurs earlier in that line, somewhere in the string "$SNP $count $change\n", which concatenates the three variables. So either $SNP, $count or $change is not initialized (or possibly two of them, or all three), most probably because one of the three prior regex matches did not work as expected (or some part of the data does not match the regex).

So presumably your data is larger than what you showed us and somewhere else down the input file, some part of the data is not consistent with the data segment you gave us.
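To illustrate how a failed match leads to that warning, here is a minimal sketch that tests whether the regex matched before using the capture. The "count=<digits>" line format and the sub name extract_count are assumptions made up for the example, not taken from your script.

```perl
use strict;
use warnings;

# Hypothetical line format: "count=<digits>".
# Returns the captured number, or undef (after a warning that names the
# input line) when the pattern does not match, so a stale $1 is never used.
sub extract_count {
    my ($line, $line_no) = @_;
    if ($line =~ /^count=(\d+)/) {
        return $1;
    }
    warn "line $line_no: no count found in '$line'\n";
    return;
}
```

Because the warning carries the input line number and the offending text, the first bad line can be identified directly instead of being inferred from a later print.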

Your warning message says:

Code

Use of uninitialized value ($count or $change) in concatenation (.) or string at Script.pl line 24, <IN> line XXXX.

The "line XXXX" part (XXXX is presumably a number) says where in the input file the error occurred. Take the first XXXX value reported, go to that line number in the input file, and look carefully at the group of three lines just before it.

If you don't see anything wrong, post these lines; I or someone else might spot the problem.

These two diagnostic pragmas ("use strict;" and "use warnings;") will tell you a lot about possible errors in your program, often even before it runs. They will force you to declare your variables ("my" statement) and to think about where they need to exist.

- Always check the return value of system calls such as opening a file.
- Use the more modern syntax to open your files (see example below).
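Taken in isolation, those points might look like the following minimal sketch. The file name snp_data.txt and the sub count_lines are placeholders chosen for the example; this is not the full rewritten script.

```perl
#!/usr/bin/perl
use strict;      # forces variable declarations with "my"
use warnings;    # reports uninitialized values and other likely mistakes

# Hypothetical file name; substitute your own input file.
my $in_file = 'snp_data.txt';

# Three-argument open with a lexical filehandle. The return value is
# checked, so a missing file is reported instead of silently ignored.
sub count_lines {
    my ($path) = @_;
    open my $in, '<', $path
        or die "Cannot open $path for reading: $!\n";
    my $lines = 0;
    while (my $line = <$in>) {
        $lines++;
    }
    close $in;
    return $lines;
}

# Only touch the file when it actually exists.
print "$in_file has ", count_lines($in_file), " lines\n" if -e $in_file;
```

The lexical filehandle ($in) closes automatically when it goes out of scope, and $! in the die message gives the operating system's reason for the failure.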

So, here is a quick rewrite of your script with this advice in mind (plus a little reformatting for clarity, but you don't need to agree with me on the reformatting):

One final additional point: I don't think the hash at the beginning is very useful, since you are checking $cod{$1} only against "non", which will match only if $1 is 2, so you could write your conditional statement as:

print OUT "$SNP $count $change\n" if $1 == 2; # or : if $1 eq '2';

and could forget altogether about the %cod hashtable (unless of course you showed us only part of your code).

All of your data consists of groups of three records. Your code will fail if this pattern is broken because it will have access to 'stale' data. A better approach is to enforce the order. Report all discrepancies.

Yes, the error message tells you where it encountered the problem in the input file, at the very end of the message (what you quoted as XXXX in your post is actually the line number where the problem occurred). All you need is that line plus the two previous ones.

Assuming your error message refers to line 128, you could print the faulty lines by typing at the prompt:
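The same thing can be done from a short script. This is only a sketch: the file name input.txt and the sub lines_between are assumptions for the example, and 126 to 128 are simply line 128 plus the two previous lines.

```perl
use strict;
use warnings;

# Return the lines numbered $first..$last (1-based) read from a
# filehandle, each prefixed with its line number.
sub lines_between {
    my ($fh, $first, $last) = @_;
    my @wanted;
    while (my $line = <$fh>) {
        next if $. < $first;
        last if $. > $last;
        push @wanted, "$.: $line";
    }
    return @wanted;
}

# Hypothetical input file name; replace with your own.
my $file = 'input.txt';
if (-e $file) {
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    print lines_between($in, 126, 128);
    close $in;
}
```

The special variable $. holds the line number of the last line read from the input file, which is the same number Perl appends to its warnings.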

My problem is that it appears most of the lines return the error (I assume because they become out of sync from a missing or duplicated line). The result is my command line runs for ~30 seconds listing lines where the error is found, and then when it finishes I am limited in how far up I can scroll to find the source of the error.

Thanks for your reply. I ran your script after changing 'warn' to 'die' in this line -

Code

die "Invalid data block near $.\n"

It returned the same result - repeated errors of uninitialised values running for ~30 seconds with the furthest back cell I can see being 69523. The output remained the same - only the 'coordinate' column filled with the other two left blank.

The line number of each colour=2 line should be divisible by three. Your lines are out of sync. The error is reported for this line because it is the first line after the real error that attempted to print and failed. Open your data file in your editor and find the first "colour=" line whose line number is not divisible by three, then find the last "colour=" line before it whose line number is. The error is between those two lines. Remember, you have already narrowed your hunt to 52 lines. Good luck, Bill
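If hunting through the file in an editor is awkward, the search can be sketched in a few lines of Perl. This assumes, as described above, that the text "colour=" appears only on the third line of each group; the sub name find_sync_break is made up for the example.

```perl
use strict;
use warnings;

# Scan a filehandle and return two line numbers: the last "colour=" line
# sitting on a multiple of three, and the first "colour=" line that does
# not. The second value is undef when no misplaced line is found.
sub find_sync_break {
    my ($fh) = @_;
    my $last_good = 0;
    while (my $line = <$fh>) {
        next unless $line =~ /colour=/;
        return ($last_good, $.) if $. % 3 != 0;
        $last_good = $.;
    }
    return ($last_good, undef);
}
```

Called on an open filehandle, it returns the two line numbers that bracket the first problem, so only the lines between them need to be inspected.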

I'm sorry but I am not sure I fully understand. I am on Windows, and I typed exactly what you put into my perl command line (changing the file names to the files that I have) and it returns-

Code

>was unexpected at this time

If I remove the second '>' it returns-

Code

The system cannot find the file specified.

Sorry, I was probably not clear enough.

The "<param if any>" notation was just meant to say "put your params there if your script needs any"; it was not meant to be typed literally.

The idea was to have a command like "perl program > file.txt" if there was no parameter, and "perl program param1 param2 > file.txt" if you had two parameters.

But anyway, you have moved forward in the meantime and, if I understand the last messages, it seems that the first error occurred on line 52 of your input, so you don't have to read too much of your input file to find where it is. And since 52 is not divisible by 3, you probably have an out-of-sync problem, due either to an inconsistency in your input file or to a yet-unnoticed bug in your program leading to, for example, a line being skipped during processing.

If you still cannot find where, please post your whole input up to the line corresponding to the first occurrence of the warning, and we can try to find out.

I ran the code you posted in post #20 against the data that you posted in post #17. It ran without error and output one data record with all three fields. Perhaps something is changing in the cut and paste process. Would you please post the 51-line data file as an attachment?

So far, the helpers have indirectly stated that you first need to correct the issues with the input before you can parse the data correctly. While that might be the ideal approach, it is not always/often possible. Instead, I suggest that you add additional/different error checking to accommodate the problems.

Here's my test script, which has the minimal level of error checking that I think it might need, but in a production-level script, the error checking/handling would be expanded.

In this example I'm putting the input data inside the script, but I also include commented-out lines that read the input data from an external file.

And here are the "error" messages sent to stderr, which I might direct to an "error" file for later review.

Code

format error parsing "colour" at or near line 19 - skipping this record at D:\test\Perl-1.pl line 54, <DATA> line 19.
format error parsing "colour" at or near line 48 - skipping this record at D:\test\Perl-1.pl line 46, <DATA> line 48.
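Sending stderr to a file can be done on the command line with "2> errors.txt", or from inside the script. The following is a minimal sketch of the second option; the sub name with_stderr_to_file and the log file name are assumptions for the example.

```perl
use strict;
use warnings;

# Run a code reference with STDERR temporarily sent to $log_path, so that
# warnings can be reviewed in an editor instead of scrolling past.
sub with_stderr_to_file {
    my ($log_path, $code) = @_;
    open my $saved, '>&', \*STDERR or die "Cannot duplicate STDERR: $!\n";
    open STDERR, '>', $log_path    or die "Cannot open $log_path: $!\n";

    my $ok  = eval { $code->(); 1 };
    my $err = $@;

    # Put the original STDERR back even if the code died.
    close STDERR;
    open STDERR, '>&', $saved or die "Cannot restore STDERR: $!\n";
    die $err unless $ok;
    return;
}

# Usage, with a hypothetical log file name:
# with_stderr_to_file('errors.txt', sub { warn "something odd\n" });
```

With the warnings in a file, the first message is at the top of that file, which avoids the limited scroll-back of the console window.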

I definitely agree that the code should be more robust against inconsistent data, either by being able to cope with some data format variation or by at least checking the input and producing an error message when the format is unexpected.

At the same time, I believe that knowing the data you are dealing with inside out is of paramount importance. I deal daily with masses of data from external sources. There are so many possible errors or format variations that it is not possible to foresee every one of them. But knowing the data very well goes a long way towards successful data munging.

As an additional note, relying on a succession of 3 types of line is not robust enough. For each line, you need to check that it is a line of the type you expect. If it isn't, then reject it with an error message and try to get back in sync by searching for the next first line of a group of three (otherwise you'll have to reject the whole file).
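A minimal sketch of that check-and-resync idea follows. The three line formats ("SNP=", "note=", "colour=") are invented for the example, since I have not seen the full data; the patterns in @expected and the sub name parse_groups would need adapting to the real file.

```perl
use strict;
use warnings;

# Hypothetical line formats, in the order they must appear in each group.
my @expected = (
    [ snp    => qr/^SNP=(\S+)/    ],
    [ note   => qr/^note=(\S+)/   ],
    [ colour => qr/^colour=(\d+)/ ],
);

# Read groups of three lines and return complete records plus a list of
# error messages. On an unexpected line the partial record is dropped and
# reading resumes at the next line that looks like the start of a group.
sub parse_groups {
    my ($fh) = @_;
    my (@records, @errors, %current);
    my $state    = 0;    # index into @expected of the line type due next
    my $skipping = 0;    # true while searching for the start of a group
    while (my $line = <$fh>) {
        chomp $line;
        if ($skipping) {
            next unless $line =~ $expected[0][1];
            $skipping = 0;
        }
        my ($name, $re) = @{ $expected[$state] };
        if (my ($value) = $line =~ $re) {
            $current{$name} = $value;
            if (++$state == @expected) {    # group complete
                push @records, {%current};
                %current = ();
                $state   = 0;
            }
            next;
        }
        push @errors, "line $.: expected '$name' line, got '$line'";
        %current = ();
        $state   = 0;
        # The rejected line may itself open the next group.
        if (my ($value) = $line =~ $expected[0][1]) {
            $current{ $expected[0][0] } = $value;
            $state = 1;
        }
        else {
            $skipping = 1;
        }
    }
    push @errors, "end of input: incomplete group" if $state != 0;
    return (\@records, \@errors);
}
```

Only one error is reported per broken group, and the records after it are still parsed, so a single missing line no longer throws every following group out of sync.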

It did not produce the desired output because the format of these "note" lines differs from what you previously posted. Because of that, the regex failed to match, so it moved on to the next record(s), which also failed for the same reason.