Dan,
Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.
Test and extend,
-Heikki
2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> Dan,
>> I'll take your code and put it into bioperl-live rewritten the way I
> suggested and add few tests.
>> That should get you started,
>> -Heikki
>> 2009/4/27 Dan Bolser <dan.bolser at gmail.com>:
>> Hi Heikki,
>>>> Thanks very much for the advice on how to better implement the clear
>> range method within the Bio::Seq::Quality object. I can understand the
>> logic of what you have written, and it all sounds reasonable. The only
>> problem is that I am very inexperienced with working on object
>> oriented Perl (my 'one man' projects to date have never really
>> required me to think beyond scripts, and its been years since I
>> actually tried to code objects in Perl).
>>>> To be specific, when you say, "Lets add a method that sets the
>> threshold and stores it internally as $self->_threshold", ignoring any
>> other functionality, what would that method look like? in particular,
>> how would $self->_threshold be implemented?
>>>> I think once I see that detail, I can go ahead and try to code what
>> you suggested.
>>>>>> Similarly (Chris), where would I put the tests / how would they be implemented?
>>>>>> Thanks again for the feedback.
>>>> All the best,
>> Dan.
>>>>>>>> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
>>> Dan,
>>>>>> It looks like your method does two different things:
>>>>>> 1. Returns the longest subsequence above the threshold
>>> 2. Analyses the the sequence for the number of ranges the current
>>> threshold creates.
>>>>>> Why not separate these functions?
>>>>>> Lets add a method that sets the threshold and stores it internally as
>>> $self->_threshold. Setting it to a new values should trigger emptying
>>> all the caches (see below.)
>>>>>> Lets have two more public methods:
>>>>>> 1. get_clean_range() - optional argument 'threshold'
>>>>>> It returns the longest clean subseq.
>>>>>> 2. count_clean_ranges() -again optional argument 'threshold'
>>>>>> This returns the number of ranges detected.
>>>>>> Both methods call first the public method threshold if the argument
>>> has been given and then an internal method _find_clean_ranges(). That
>>> method calculates all the ranges and stores them internally (as
>>> $self->_clean_ranges-> [...]). The number of ranges is also stored
>>> (e.g. $self->_number_of ranges).These internal values form the cache
>>> that needs to be emptied whenever any of the critical values of the
>>> object changes: threshold, quality or seq. Create an internal method
>>> $self->_clear_cache, that does that.
>>>>>> Now the quality new object does not get created until you call
>>> get_clean_range() which accesses the cached values (or creates them if
>>> they are not there).
>>>>>> This design allows you to have no extra penalty for adding more
>>> methods that act on cached values. For example, it might be sensible
>>> thing to do at some point to look at all the ranges that are longer
>>> than some length. Then you could write in your program:
>>>>>>>>> $qual->threshold(10);
>>> if ($qual->count_clean_ranges = 1) {
>>> my $newqual = $qual->get_clean_range()
>>> # do your analysis
>>> } elsif ($qual->count_clean_ranges = 0) {
>>> # do some reporting and logging
>>> } else { # more than one ranges
>>> my @quals = $qual->get_all_clean_ranges($min_lenght);
>>> # do some more work and possibly select the best one(s)
>>> }
>>>>>>>>>>>> Yours,
>>>>>> -Heikki
>>>>>> 2009/4/24 Chris Fields <cjfields at illinois.edu>:
>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>>>> possible, tests don't hurt either!
>>>>>>>> chris
>>>>>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>>>>>>> Its a bit rough and ready, but it does what I need...
>>>>>>>>>>>>>>>>>>>>>>>>> =head2 get_clear_range
>>>>>>>>>> Title : get_clear_range
>>>>>>>>>> Title : subqual
>>>>> Usage : $subobj = $obj->get_clear_range();
>>>>> $subobj = $obj->get_clear_range(20);
>>>>> Function : Get the clear range using the given quality score as a
>>>>> cutoff or a default value of 13.
>>>>>>>>>> Returns : a new Bio::Seq::Quality object
>>>>> Args : a minimum quality value, optional, devault = 13
>>>>>>>>>> =cut
>>>>>>>>>> sub get_clear_range
>>>>> {
>>>>> my $self = shift;
>>>>> my $qual = $self->qual;
>>>>> my $minQual = shift || 13;
>>>>>>>>>> my (@ranges, $rangeFlag);
>>>>>>>>>> for(my $i=0; $i<@$qual; $i++){
>>>>> ## Are we currently within a clear range or not?
>>>>> if(defined($rangeFlag)){
>>>>> ## Did we just leave the clear range?
>>>>> if($qual->[$i]<$minQual){
>>>>> ## Log the range
>>>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>> ## and reset the range flag.
>>>>> $rangeFlag = undef;
>>>>> }
>>>>> ## else nothing changes
>>>>> }
>>>>> else{
>>>>> ## Did we just enter a clear range?
>>>>> if($qual->[$i]>=$minQual){
>>>>> ## Better set the range flag!
>>>>> $rangeFlag = $i;
>>>>> }
>>>>> ## else nothing changes
>>>>> }
>>>>> }
>>>>> ## Did we exit the last clear range?
>>>>> if(defined($rangeFlag)){
>>>>> my $i = scalar(@$qual);
>>>>> ## Log the range
>>>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>> }
>>>>>>>>>> unless(@ranges){
>>>>> die "There is no clear range... I don't know what to do here!\n";
>>>>> }
>>>>>>>>>> print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>>>>> my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>>>>> print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>> "quality scores above the given threshold\n";
>>>>>>>>>> for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>> if($_->[2]/$sum < 0.5){
>>>>> warn "not so much a clear range as a clear chunk...\n";
>>>>> }
>>>>> print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>>>>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1,
>>>>> $_->[1]+1),
>>>>> -qual => $self->subqual($_->[0]+1,
>>>>> $_->[1]+1)
>>>>> );
>>>>> }
>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>>> in (apart from all the debugging output that I spit out).
>>>>>>>>>>>>>>> Cheers,
>>>>> Dan.
>>>>>>>>>>>>>>> 2009/4/24 Dan Bolser <dan.bolser at gmail.com>:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> I couldn't find out how to get the 'clear range' from a
>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>>>>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>>> I take the time to finish that off.
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>> Dan.
>>>>>>>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l>>>>>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>>Bioperl-l at lists.open-bio.org>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l>>>>>>>>>>>>>>>> --
>>> -Heikki
>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>> cell: +27 (0)714328090
>>> Sent from Claremont, WC, South Africa
>>>>>>>>> --
> -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>
--
-Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa