A sample of my file is as follows:

1457 G G SAME
1979 G G SAME
2056 T T SAME
3091 A A SAME
3562 A G DIFF
3778 A A SAME
4124 T T SAME
4229 C T DIFF
4571 A G DIFF
5019 A C DIFF
5114 C C SAME
6291 T T SAME
6414 C C SAME
6553 C C SAME
6941 G G SAME

What I want to do is for each block of either SAME or DIFF, get the positions of the last and first element in that block. It would be great to also get a count of how many "SAME"s or "DIFF"s fall in the range as well.

So for the sample file, I'd like to get an output that would look like this:

Column 1: First position of block (of same type)
Column 2: Last position of block (of same type)
Column 3: Number of occurrences within block (of same type)
Column 4: Type (SAME or DIFF)

Re: [GeneticsGirl] Help outputting first and last positions of blocks of the same type

The first and last blocks can get tricky. For your first attempt, ignore them and write a loop that does everything else correctly. For your second attempt, we can help you with the two special cases. When that is working, you will have a working prototype of your program that we all understand. We can then discuss ways to make it clearer, faster, and more robust.

This is what the main loop of version 0 has to do:

Read and parse a line.

Decide if it is the start of a new block.

If it is not:
    Update last position with new position.
    Increment the record count.

Otherwise:
    Print the data that you have for the previous block.
    Set first_position and last_position to the current position.
    Set type from the new record.
    Set count = 1.

Return to read the next line.

Good Luck,
Bill
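
The first step, reading and parsing a line, might look like the sketch below. It assumes whitespace-separated columns with the position first and SAME/DIFF last, as in the sample file; parse_line is a hypothetical name, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the position (first column) and the
# classification (last column) of one record. The two base columns
# are discarded by the undef placeholders.
sub parse_line {
    my ($line) = @_;
    my ($position, undef, undef, $type) = split ' ', $line;
    return ($position, $type);
}

my ($position, $type) = parse_line("3562 A G DIFF\n");
print "$position $type\n";    # prints "3562 DIFF"
```

Splitting on ' ' (a single space in quotes) skips leading whitespace and treats the trailing newline as whitespace, so no separate chomp is needed here.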

Re: [BillKSmith] Help outputting first and last positions of blocks of the same type

Hi Bill,

Thanks so much for the help and advice! Below is my first crack at the code - I feel I can parse the lines fine, but not sure how to tackle what you say as deciding on new blocks. I know that my loop for each line at the bottom is not working - it prints everything together, not in blocks.

Do you know of a link or info for me to look at on the web that might help explain as you had mentioned, finding the start of new blocks? Also how to set positions?

Re: [GeneticsGirl] Help outputting first and last positions of blocks of the same type

You are overcomplicating the problem. Only four variables need file scope ($first_position, $last_position, $block_size, and $type). Inside the loop, the only significant variables are $new_position and $new_type. There is no need for any arrays. It is probably a good idea to use $line rather than $_.

A new block is starting if $new_type is not equal to $type.

If you add these hints to my previous pseudocode, you will receive a warning about uninitialized variables while processing the first line, and you will not print the last block. We will fix these problems later. Just get the loop working for the normal cases. Laurent's advice on style will help you in the long run.

Good Luck,
Bill
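
For readers following along, here is one possible sketch of the loop using the four variables named above. It is not code from this thread. It goes one step beyond version 0 by adding a defined() test for the first line and a final flush for the last block, so that it runs without warnings. The wrapper summarize_blocks and the space-separated row format are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper so the loop can be tried on a list of lines.
# Returns one "first last count type" string per block.
sub summarize_blocks {
    my @lines = @_;
    my ($first_position, $last_position, $block_size, $type);
    my @rows;

    for my $line (@lines) {
        my ($new_position, undef, undef, $new_type) = split ' ', $line;
        next unless defined $new_type;    # skip blank or short lines

        if (defined $type && $new_type eq $type) {
            # Same block: move its end and count the record.
            $last_position = $new_position;
            $block_size++;
            next;
        }

        # A new block starts here. Emit the previous one, if there is
        # one (the defined() test covers the first-line special case).
        push @rows, "$first_position $last_position $block_size $type"
            if defined $type;
        ($first_position, $last_position) = ($new_position, $new_position);
        $block_size = 1;
        $type       = $new_type;
    }

    # Last-block special case: flush whatever is still pending.
    push @rows, "$first_position $last_position $block_size $type"
        if defined $type;
    return @rows;
}

print "$_\n" for summarize_blocks(
    "1457 G G SAME", "1979 G G SAME", "3562 A G DIFF", "3778 A A SAME",
);
# prints:
# 1457 1979 2 SAME
# 3562 3562 1 DIFF
# 3778 3778 1 SAME
```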

Re: [Laurent_R] Help outputting first and last positions of blocks of the same type

Hi Laurent_R,

Great - thank you for the simplifications - it cleans up my code quite a bit!

I'm working on what you said in your previous post about memorizing the last line to see whether it was the end of a block...again, I'm very new to programming, so I'm working on it! Will probably take me to the weekend to get even this to work!

Re: [GeneticsGirl] Help outputting first and last positions of blocks of the same type

Let me review. A block consists of four values (first position, last position, occurrence count, and classification type.) When we have these four values, we print them and initialize a new block.

I doubt that Google will help you know when the block is complete. You already told us in your first post that a block consists of consecutive lines with the same classification type.

A line contains only two useful values (Chromosome position and classification type.) For each line, if the classification type of the line is the same as the classification type of the current block, this line belongs to that block. Replace the final position in the block with the chromosome position from the line and increment the occurrence count of the block. Otherwise, the block is complete - print it and then initialize a new block. Set both the first and last positions in the block to the new chromosome position, set the occurrence count to one, and the classification type to the value from the line.

You will have a problem with the first line because the block is not initialized and the last line will never get printed. Do not worry, we will fix this later.

In my version, I store the block as a hash with four keys. That hash is the only file-scope variable.

Good Luck,
Bill

Re: [GeneticsGirl] Help outputting first and last positions of blocks of the same type

Here is the code which I have been describing. Although it looks much different, it works the same as Laurent's. (Except that the last position is updated correctly.) Every line (even the first) is assumed to be the last line of a block until proven false.

Data for the current Block is stored in a hash %block. In each section of the code, a single assignment statement (using hash slices) updates the block.

The do block around the initialization is not necessary. Its purpose is to limit the scope of the two temporary variables ($position and $class) used in the initialization. Note that the two 'my' variables with the same names that are used inside the loop are not the same two variables. Their scope is limited to the loop.

Refer to perldoc -f undef for an example of this use of undef.

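The print statements are idiomatic. Hash slice notation is used to force the order of the values. The special variable $LIST_SEPARATOR ($", default: single space) is used implicitly to separate the values when the slice is interpolated into a double-quoted string. ($OUTPUT_FIELD_SEPARATOR, or $, for short, is a different variable: it is undefined by default and applies between the arguments of print.) (Thanks to FishMonger for suggesting this idiom in a recent unrelated post)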

The print statement after the loop is needed to print the final block.
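
The code from this post is not reproduced here, so what follows is only a sketch reconstructed from the description above, not Bill's original. It assumes the first line of input is a valid record. The hash keys in @FIELDS, the sub name print_blocks, and the filehandle arguments are all hypothetical choices made so the sketch can be run on its own.

```perl
use strict;
use warnings;

my @FIELDS = qw(first last count class);    # hypothetical key names

# Reads records from $in and prints one line per block to $out.
sub print_blocks {
    my ($in, $out) = @_;
    my %block;

    # Prime the block from the first line. The do block keeps the
    # temporaries $position and $class out of the enclosing scope.
    do {
        my $line = <$in>;
        return unless defined $line;    # empty input: nothing to print
        my ($position, undef, undef, $class) = split ' ', $line;
        @block{@FIELDS} = ($position, $position, 1, $class);
    };

    while (my $line = <$in>) {
        my ($position, undef, undef, $class) = split ' ', $line;
        next unless defined $class;     # skip blank lines
        if ($class eq $block{class}) {
            # Still in the current block: move its end, bump the count.
            @block{qw(last count)} = ($position, $block{count} + 1);
        }
        else {
            # The current block is complete: print it, start a new one.
            # The interpolated slice is joined with single spaces by default.
            print {$out} "@block{@FIELDS}\n";
            @block{@FIELDS} = ($position, $position, 1, $class);
        }
    }

    # The final block is still pending when the input runs out.
    print {$out} "@block{@FIELDS}\n";
    return;
}

my $sample = "1457 G G SAME\n1979 G G SAME\n3562 A G DIFF\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print_blocks($in, \*STDOUT);
```

Because the block is primed from the first line before the loop starts, the loop never compares against an uninitialized type, and the only remaining special case is the print after the loop.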

Re: [BillKSmith] Help outputting first and last positions of blocks of the same type

Of course the code worked great for me! This is saving me a lot of time in my research - I'm so very grateful to everyone for their posts and suggestions - I'm picking up a lot from the snippets of code that have been put on here. Thank you Bill, Laurent and wickedxter - this has been a wonderful first experience for me entering a programming forum! So glad to have found somewhere I can go for help and advice as I'm trying to apply programs to speed up my research!