The variables $VAR123 and $VAR125 are scalars. Each contains a single character string. These strings appear to contain pipe (|) delimited fields.

The variables $VAR124 and $VAR126 are each a reference to an array of hashes. The array which $VAR124 refers to contains only one element (a reference to a hash). The other array contains three elements.
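To make those shapes concrete, here is a minimal sketch. The variable names, the hash keys (spacer, energy) and all the values are invented for illustration; they stand in for the $VAR labels printed by Data::Dumper, and the real dump may well look different.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the dumped values; keys and contents are assumptions.
my $line   = 'seq1|12|48|CAGT';       # a scalar holding pipe-delimited fields
my @fields = split /\|/, $line;       # the pipe must be escaped in the pattern

# A reference to an array containing a single hash reference.
my $one_hash = [ { spacer => 'CAGT', energy => -5.2 } ];

# A reference to an array containing three hash references.
my $three = [
    { spacer => 'GTT', energy => -3.1 },
    { spacer => 'TTG', energy => -4.0 },
    { spacer => 'GTT', energy => -6.3 },
];

print scalar(@fields), " fields\n";
print scalar(@$one_hash), " element, ", scalar(@$three), " elements\n";
print "second spacer: $three->[1]{spacer}\n";
```

The last line shows the usual access pattern: index into the array reference first, then look up the key in the hash it holds.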

Please refer to:

Code

>perldoc perldsc

for a tutorial on complex data structures. You probably should also read all the documents mentioned in its 'SEE ALSO' section. Good Luck, Bill

Thanks for the link, but now I'm unsure which type of 'multidimensional' data structure I should use. I wish to be able to refer to specific hashes and compare them to equivalent hashes within the same array. And all within loops, I might add.

In my script I have created a (1st) hash of (2nd) arrays of (3rd) hashes. Now, for each key in my (1st) hash, I need to loop through the (2nd) arrays comparing their respective (3rd) hashes, i.e. the spacers of each array, with the other (2nd) arrays for that particular key of the (1st) hash.

Code

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Regexp::Common qw /number/;

Is it to instruct the computer to pick the spacer value from each and every array of hashes? And not just the first one.

The script is supposed to compare spacers such as CAGT, GTT and TTG with one another and detect whether any of them are the same. If they are the same, it is supposed to choose the spacer that is associated with the lowest energy score. It would then, if possible, dispense with the arrays that have the same spacer but higher energy scores, removing them from %HoA_sequences.

After that, in phase 2, entitled 'may the best hairpin win', all the remaining hairpins are compared with each other to determine whether they overlap. If they do not overlap, they are kept in %HoA_sequences. However, if any 2, or any 3 (etc.), hashes overlap then, as in phase 1, the spacer with the lowest energy score is kept and the higher-scored spacers are dispensed with, leaving %HoA_sequences containing only distinct hairpins per sequence. For phase 2, I will need to use the range of each hairpin, i.e. its start and end, to determine overlaps.
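As a rough sketch of how the phase 2 overlap test might look: two ranges overlap when each one starts before the other ends. The field names start, end and energy, the sample values, and the helper name overlaps are all assumptions, since the thread does not show the real hash layout. The selection step below is one possible reading of the requirement (take hairpins in order of increasing energy and keep each one only if it overlaps nothing already kept), not necessarily the intended rule.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two closed ranges share at least one position.
sub overlaps {
    my ($x, $y) = @_;
    return $x->{start} <= $y->{end} && $y->{start} <= $x->{end};
}

# Invented sample data; the real hashes may use different keys.
my @hairpins = (
    { spacer => 'CAGT', start => 10, end => 30, energy => -7.5 },
    { spacer => 'GTT',  start => 25, end => 45, energy => -4.0 },
    { spacer => 'TTG',  start => 60, end => 80, energy => -5.1 },
);

# Visit hairpins from lowest to highest energy; keep one only if it does not
# overlap any hairpin that has already been kept.
my @kept;
for my $h (sort { $a->{energy} <=> $b->{energy} } @hairpins) {
    push @kept, $h unless grep { overlaps($h, $_) } @kept;
}

print join(' ', map { $_->{spacer} } @kept), "\n";
```

Note that with chains of overlaps (A overlaps B, B overlaps C, but A does not overlap C) this greedy pass can keep both A and C; whether that is the desired outcome depends on the specification.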

The values of the HoA are each a reference to an array. The 'values' function returns a list of these references. The 'for' loop assigns these, one-at-a-time, to $array. The term @$array dereferences these references (returns a list of hash references). The function 'map', one-at-a-time, assigns these hash references to $_, evaluates the term $_->{spacer} and returns the list of results. (Note that the arrow operator, in that term, dereferences the hash reference and returns the value corresponding to 'spacer'.) The list of spacers, corresponding to each value of %HoA, is pushed into the array @spacers.
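The code this paragraph walks through does not appear in the thread as reproduced here. A reconstruction consistent with the description might look like the following; the contents of %HoA are invented sample data, and 'energy' is an assumed key.

```perl
use strict;
use warnings;

# Invented sample data; the real %HoA is built by the poster's script.
my %HoA = (
    seq1 => [ { spacer => 'CAGT', energy => -5.2 } ],
    seq2 => [
        { spacer => 'GTT', energy => -3.1 },
        { spacer => 'TTG', energy => -4.0 },
    ],
);

my @spacers;
for my $array (values %HoA) {                      # each value is an array reference
    push @spacers, map { $_->{spacer} } @$array;   # one spacer per hash reference
}

print scalar(@spacers), " spacers collected\n";
```

Because 'values' returns the references in no particular order, the order of @spacers can differ from run to run.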

The special variable $" (named $LIST_SEPARATOR under 'use English') controls how an array is interpolated into a double-quoted string. Here it is used to print every element on a separate line.
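A small sketch of that effect, using the spacer values mentioned in this thread as sample data. Wrapping the assignment in a block with 'local' is a common precaution so that the default separator (a single space) is restored afterwards.

```perl
use strict;
use warnings;

my @spacers = qw(CAGT GTT TTG);   # sample values taken from the thread

my $joined;
{
    local $" = "\n";              # the change is limited to this block
    $joined = "@spacers";         # elements are joined with newlines
    print "$joined\n";
}

print "@spacers\n";               # back to the default: joined with spaces
```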

Should I assume that you want to process each array of the %HoA separately? If not, I still do not understand your requirements. Good Luck, Bill

You now have a reasonable specification for phase I. You have the code for building the data structure, and I already showed you how to dereference the data. Give it a try. Ask when you need help with a specific question. Hint: Use a temporary hash to hold the "best so far" for each value of 'spacer'. Good Luck, Bill

So far I am able to compare the first spacer with all the rest and if they're all equal I can then find out which has the lowest energy and pick that one out. But I am struggling to come up with an algorithm that will detect when some (not all) of the spacers are the same and to then pick the one with the lowest energy out of those 'some'. For example, if 2 out of 5 spacers are the same then there are essentially 4 spacers.
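One way the "temporary hash" hint could be applied to this: instead of comparing the first spacer with all the rest, key a hash by spacer and remember the lowest-energy hairpin seen so far for each one. Partial duplicates then need no special handling. This is only a sketch for a single array; the key name 'energy' and the sample values are assumptions.

```perl
use strict;
use warnings;

# Hypothetical array for one key of %HoA_sequences; 'energy' is an assumed field name.
my @hairpins = (
    { spacer => 'CAGT', energy => -5.2 },
    { spacer => 'GTT',  energy => -3.1 },
    { spacer => 'CAGT', energy => -7.0 },
    { spacer => 'TTG',  energy => -4.0 },
    { spacer => 'GTT',  energy => -2.5 },
);

my %best;   # spacer => hash reference with the lowest energy seen so far
for my $h (@hairpins) {
    my $s = $h->{spacer};
    $best{$s} = $h
        if !exists $best{$s} || $h->{energy} < $best{$s}{energy};
}

for my $s (sort keys %best) {
    print "$s $best{$s}{energy}\n";
}
```

Here five hairpins with two repeated spacers reduce to three entries. To apply this per sequence, the same loop could run once for each key of %HoA_sequences, replacing the array with [ values %best ] afterwards.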