Manual Reference Pages - BIO::SEARCH::HSP::BLASTHSP (3)


A Bio::Search::HSP::BlastHSP object provides an interface to data
obtained in a single alignment section of a Blast report (known as a
High-scoring Segment Pair). This is essentially a pairwise
alignment with score information.

The construction of BlastHSP objects is performed by
Bio::Factory::BlastHitFactory in a process that is
orchestrated by the Blast parser (Bio::SearchIO::psiblast).
The resulting BlastHSPs are then accessed via
Bio::Search::Hit::BlastHit. Therefore, you do not need to
use Bio::Search::HSP::BlastHSP directly. If you need to construct
BlastHSPs directly, see the new() function for details.

For Bio::SearchIO BLAST parsing usage examples, see the
examples/searchio directory of the Bioperl distribution.

Sequence endpoints are swapped so that start is always less than
end. This affects TBLASTN/X hits on the minus strand. Strand
information can be recovered using the strand() method. This
normalization step is standard Bioperl practice; it also facilitates
use of range information by methods such as match().
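The swap-and-record-strand idea can be illustrated with a small standalone sketch. The helper name normalize_endpoints is hypothetical and is not part of BlastHSP or BioPerl; it only shows how ascending coordinates plus a separate strand value preserve the information from a minus-strand alignment.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): given start/end coordinates
# as printed in a BLAST alignment, return them in ascending order together
# with a strand value (1 = plus, -1 = minus).
sub normalize_endpoints {
    my ($start, $end) = @_;
    return $start <= $end
        ? ($start, $end, 1)      # already ascending: plus strand
        : ($end, $start, -1);    # descending: swap and mark minus strand
}

# A minus-strand subject line might run from 900 down to 751.
my ($s, $e, $strand) = normalize_endpoints(900, 751);
print "start=$s end=$e strand=$strand\n";    # start=751 end=900 strand=-1
```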

o Supports BLAST versions 1.x and 2.x, gapped and ungapped.

Bio::Search::HSP::BlastHSP.pm has the ability to extract a list of all
residue indices for identical and conservative matches along both
query and sbjct sequences. Since this degree of detail is not always
needed, the extraction is not performed when the BlastHSP object is
constructed. Instead, these data are collected automatically the first
time a method that needs them is called on the BlastHSP.pm object.
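As an illustration of how such residue indices can be derived from the alignment text, here is a minimal standalone sketch. The function residue_indices is hypothetical, not the module's own code, and it assumes BLASTP-style match symbols (a letter marks an identical residue, '+' a conservative one) and one query coordinate per non-gap residue, which does not hold for translated searches.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect query residue numbers for identical and
# conservative (non-identical, '+') positions. Assumes:
#   $qseq   - aligned query string, possibly containing '-' gaps
#   $match  - homology line of the same length as $qseq
#   $qstart - query coordinate of the first aligned residue
sub residue_indices {
    my ($qseq, $match, $qstart) = @_;
    my (@identical, @conserved);
    my $pos = $qstart;
    for my $i (0 .. length($qseq) - 1) {
        next if substr($qseq, $i, 1) eq '-';   # a query gap consumes no residue
        my $m = substr($match, $i, 1);
        if    ($m eq '+')          { push @conserved, $pos }
        elsif ($m =~ /[A-Za-z]/)   { push @identical, $pos }
        $pos++;
    }
    return (\@identical, \@conserved);
}

my ($id, $cons) = residue_indices('MKV-LA', 'MK+ L ', 10);
print "identical: @$id; conserved: @$cons\n";  # identical: 10 11 13; conserved: 12
```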

BlastHSP.pm can provide the query or sbjct sequence as a Bio::Seq
object via the seq() method. The BlastHSP.pm object can also create a
two-sequence Bio::SimpleAlign alignment object using the query
and sbjct sequences via the get_aln() method. Alignment objects are
not created automatically when the BlastHSP.pm object is constructed,
since this level of functionality is not always required and would add
considerable overhead when processing many reports.
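The build-on-first-request behaviour described above follows a common lazy caching pattern. The sketch below uses a hypothetical package LazyHSP with a stand-in "expensive" step; it is not how BlastHSP.pm is actually written, only an illustration of building once and reusing the cached result.

```perl
use strict;
use warnings;

# Hypothetical package illustrating lazy, cached construction: the
# expensive object is built on first request and reused afterwards.
package LazyHSP;

sub new {
    my ($class, %args) = @_;
    return bless { raw => $args{raw}, builds => 0 }, $class;
}

# Stand-in for an expensive step such as building an alignment object.
sub alignment {
    my ($self) = @_;
    unless (exists $self->{_aln}) {
        $self->{builds}++;                           # count real builds
        $self->{_aln} = join "\n", @{ $self->{raw} };
    }
    return $self->{_aln};
}

package main;

my $hsp = LazyHSP->new(raw => ['Query: 1 MKV 3', 'Sbjct: 5 MKI 7']);
$hsp->alignment for 1 .. 3;    # builds once, then reuses the cache
print "built $hsp->{builds} time(s)\n";    # built 1 time(s)
```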

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

Usage questions and support issues are best sent to the mailing list
rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

Usage : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );
: Bio::Search::HSP::BlastHSP objects are constructed
: automatically by Bio::Factory::BlastHitFactory,
: so there is no need for direct instantiation.
Purpose : Constructs a new BlastHSP object and initializes key variables
: for the HSP.
Returns : A Bio::Search::HSP::BlastHSP object
Argument : Named parameters:
: Parameter keys are case-insensitive.
: -RAW_DATA => array ref containing raw BLAST report data for
: a single HSP. This includes all lines
: of the HSP alignment from a traditional BLAST
: or PSI-BLAST (non-XML) report.
: -RANK => integer (1..n).
: -PROGRAM => string (TBLASTN, BLASTP, etc.).
: -QUERY_NAME => string, id of query sequence
: -HIT_NAME => string, id of hit sequence
:
Comments : Having the raw data allows this object to do lazy parsing of
: the raw HSP data (i.e., not parsed until needed).
:
: Note that there is a fair amount of basic parsing that is
: currently performed in this module that would be more appropriate
: to do within a separate factory object.
: This parsing code will likely be relocated and more initialization
: parameters will be added to new().
:
See Also : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>
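Case-insensitive named parameters such as -RAW_DATA can be handled by normalizing the keys before use. BioPerl modules typically do this through Bio::Root::RootI's _rearrange(); the standalone sketch below uses a hypothetical normalize_params() instead, and the parameter values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize named parameters so that -RAW_DATA,
# -raw_data and -Raw_Data all map to the key RAW_DATA.
sub normalize_params {
    my %in = @_;
    my %out;
    while (my ($k, $v) = each %in) {
        (my $key = uc $k) =~ s/^-//;    # upper-case, drop leading dash
        $out{$key} = $v;
    }
    return \%out;
}

# Illustrative values only.
my $p = normalize_params(
    -raw_data   => ['Query: 61 SPHNV 65'],
    -Rank       => 1,
    -PROGRAM    => 'BLASTP',
    -query_name => 'q1',
    -hit_name   => 'h1',
);
print "$p->{PROGRAM} rank $p->{RANK}\n";    # BLASTP rank 1
```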

Usage : $hsp_obj->signif()
Purpose : Get the P-value or Expect value for the HSP.
Returns : Float (0.001 or 1.3e-43)
: Returns the P-value if it is defined; otherwise, the Expect value.
Argument : n/a
Throws : n/a
Comments : Provided for consistency with BlastHit::signif()
: Support for returning the significance data in different
: formats (e.g., exponent only) is not provided for HSP objects.
: This is only available for the BlastHit or Blast object.
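The fallback rule described for signif() amounts to a single defined-check. The sketch below is a hypothetical stand-in (pick_signif is not a BioPerl function) operating on a plain hash rather than a BlastHSP object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the choice made by signif(): prefer the
# P-value and fall back to the Expect value when no P-value exists
# (NCBI Blast2 reports do not define a P-value).
sub pick_signif {
    my (%hsp) = @_;
    return defined $hsp{p} ? $hsp{p} : $hsp{expect};
}

print pick_signif(p => 0.001, expect => 0.002), "\n";   # 0.001
print pick_signif(p => undef, expect => 2e-07), "\n";
```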

Usage : $hsp_obj->evalue()
Purpose : Get the Expect value for the HSP.
Returns : Float (0.001 or 1.3e-43)
Argument : n/a
Throws : n/a
Comments : Support for returning the expectation data in different
: formats (e.g., exponent only) is not provided for HSP objects.
: This is only available for the BlastHit or Blast object.

Usage : $hsp_obj->p()
Purpose : Get the P-value for the HSP.
Returns : Float (0.001 or 1.3e-43) or undef if not defined.
Argument : n/a
Throws : n/a
Comments : P-value is not defined with NCBI Blast2 reports.
: Support for returning the expectation data in different
: formats (e.g., exponent only) is not provided for HSP objects.
: This is only available for the BlastHit or Blast object.

Usage : $hsp_object->frac_identical( [seq_type] );
Purpose : Get the fraction of identical positions within the given HSP.
Example : $frac_iden = $hsp_object->frac_identical('query');
Returns : Float (2-decimal precision, e.g., 0.75).
Argument : seq_type: query or hit or sbjct or total
: (sbjct is synonymous with hit)
: default = total (but see comments below).
Throws : n/a
Comments : Different versions of Blast report different values for the total
: length of the alignment. This is the number reported in the
: denominators in the stats section:
: "Identical = 34/120 Positives = 67/120".
: NCBI-BLAST uses the total length of the alignment (with gaps).
: WU-BLAST uses the length of the query sequence (without gaps).
: Therefore, when called without an argument or an argument of total,
: this method will report different values depending on the
: version of BLAST used.
:
: To get the fraction identical among only the aligned residues,
: ignoring the gaps, call this method with an argument of query
: or sbjct (sbjct is synonymous with hit).
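The effect of the two denominators can be seen with the numbers from the stats line above ("Identical = 34/120"). The sketch below is hypothetical (frac_identical_sketch is not the module's method), and the count of 6 query gaps is an assumed value chosen only for illustration; it covers the total and query cases, and a hit variant would subtract the hit gaps instead.

```perl
use strict;
use warnings;

# Hypothetical sketch of the two denominators:
#   $n_identical : identical positions (34 in "Identical = 34/120")
#   $aln_length  : alignment length including gaps (NCBI-BLAST denominator)
#   $query_gaps  : number of '-' characters in the query line
sub frac_identical_sketch {
    my ($n_identical, $aln_length, $query_gaps, $seq_type) = @_;
    $seq_type ||= 'total';
    my $denom = $seq_type eq 'query'
        ? $aln_length - $query_gaps     # aligned query residues only
        : $aln_length;                  # total, gaps included
    return sprintf '%.2f', $n_identical / $denom;   # 2-decimal precision
}

print frac_identical_sketch(34, 120, 6), "\n";           # 0.28
print frac_identical_sketch(34, 120, 6, 'query'), "\n";  # 0.30
```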

Usage : $hsp_object->frac_conserved( [seq_type] );
Purpose : Get the fraction of conserved positions within the given HSP.
: (Note: conservative positions are called positives in the
: Blast report.)
Example : $frac_cons = $hsp_object->frac_conserved('query');
Returns : Float (2-decimal precision, e.g., 0.75).
Argument : seq_type: query or hit or sbjct or total
: (sbjct is synonymous with hit)
: default = total (but see comments below).
Throws : n/a
Comments : Different versions of Blast report different values for the total
: length of the alignment. This is the number reported in the
: denominators in the stats section:
: "Identical = 34/120 Positives = 67/120".
: NCBI-BLAST uses the total length of the alignment (with gaps).
: WU-BLAST uses the length of the query sequence (without gaps).
: Therefore, when called without an argument or an argument of total,
: this method will report different values depending on the
: version of BLAST used.
:
: To get the fraction conserved among only the aligned residues,
: ignoring the gaps, call this method with an argument of query
: or sbjct.

Title : homology_string
Usage : my $homo_string = $hsp->homology_string;
Function: Retrieves the homology sequence for this HSP as a string.
: The homology sequence is the string of symbols in between the
: query and hit sequences in the alignment indicating the degree
: of conservation (e.g., identical, similar, not similar).
Returns : string
Args : none

Usage : $hsp->rank( [string] );
Purpose : Get the rank of the HSP within a given Blast hit.
Example : $rank = $hsp->rank;
Returns : Integer (1..n) corresponding to the order in which the HSP
: appears in the BLAST report.

Usage : called automatically during object construction.
Purpose : Parses the raw HSP section from a flat BLAST report and
: sets the query sequence, sbjct sequence, and the "match" data
: which consists of the symbols between the query and sbjct lines
: in the alignment.
Argument : Array (all lines for a single, complete HSP, from a raw,
: flat (i.e., non-XML) BLAST report)
Throws : Propagates any exceptions from the methods called ("See Also")

Usage : called automatically by _set_seq_data()
: $hsp_obj->($seq_type, @data);
Purpose : Sets sequence information for both the query and sbjct sequences.
: Directly counts the number of gaps in each sequence (if gapped Blast).
Argument : $seq_type = query or sbjct
: @data = all seq lines with the form:
: Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
Throws : Exception if data strings cannot be parsed, probably due to a change
: in the Blast report format.
Comments : Uses the first argument to determine which data members to set,
: making this method sensitive to data member name changes.
: Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
Warning : Sequence endpoints are normalized so that start < end. This affects HSPs
: for TBLASTN/X hits on the minus strand. Normalization facilitates use
: of range information by methods such as match().
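Parsing sequence lines of the form shown above comes down to a regular expression per line plus concatenation. The sketch below is hypothetical (parse_seq_lines is not the module's code, and the input lines are made up); it returns the aligned string, its first and last coordinates, and the gap count, and it does not perform the minus-strand endpoint swap.

```perl
use strict;
use warnings;

# Hypothetical parser for traditional BLAST lines such as
#   "Query: 61 SPHNVKD-RKE 70"
# Returns (aligned string, first coordinate, last coordinate, gap count).
sub parse_seq_lines {
    my ($label, @lines) = @_;
    my ($seq, $first, $last) = ('');
    for my $line (@lines) {
        my ($start, $chunk, $end) =
            $line =~ /^\Q$label\E:?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/
            or die "Cannot parse line: $line\n";   # format change?
        $first = $start unless defined $first;
        $last  = $end;
        $seq  .= $chunk;
    }
    my $gaps = ($seq =~ tr/-//);                   # count gap characters
    return ($seq, $first, $last, $gaps);
}

my ($seq, $first, $last, $gaps) =
    parse_seq_lines('Query', 'Query: 61 SPHNVKD-RKE 70', 'Query: 71 QNGS 74');
print "$seq $first-$last gaps=$gaps\n";   # SPHNVKD-RKEQNGS 61-74 gaps=1
```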

Usage : called automatically when residue data is requested.
Purpose : Sets the residue numbers representing the identical and
: conserved positions. These data are obtained by analyzing the
: symbols between query and sbjct lines of the alignments.
Argument : n/a
Throws : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
Comments : These data are not always needed, so it is conditionally
: executed only upon demand by methods such as seq_inds().
: Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).

Usage : $hsp_obj->_set_match_seq()
Purpose : Set the match sequence for the current HSP (symbols in between
: the query and sbjct lines.)
Returns : Array reference holding the match sequence lines.
Argument : n/a
Throws : Exception if the _matchList field is not set.
Comments : The match information is not always necessary. This method
: allows it to be conditionally prepared.
: Called by _set_residues() and seq_str().
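In a traditional report the match symbols sit in the same columns as the sequence on the Query line, so one way to recover them is to reuse that column offset. The sketch below is hypothetical (match_segment is not a BlastHSP method) and pads the match line in case trailing spaces were trimmed.

```perl
use strict;
use warnings;

# Hypothetical sketch: extract the match symbols for one alignment block
# by reusing the column where the query sequence starts.
sub match_segment {
    my ($query_line, $match_line) = @_;
    $query_line =~ /^Query:?\s+\d+\s+(\S+)/
        or die "Not a query line: $query_line\n";
    my ($offset, $len) = ($-[1], length $1);   # column and width of the sequence
    # Pad in case trailing spaces were trimmed from the match line.
    $match_line .= ' ' x ($offset + $len - length $match_line)
        if length $match_line < $offset + $len;
    return substr($match_line, $offset, $len);
}

my $seg = match_segment('Query: 61  SPHNV 65', (' ' x 11) . 'SP+NV');
print "[$seg]\n";   # [SP+NV]
```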

Usage : $hsp_obj->n()
Purpose : Get the N value (num HSPs on which P/Expect is based).
: This value is not defined with NCBI Blast2 with gapping.
Returns : Integer or null string if not defined.
Argument : n/a
Throws : n/a
Comments : The N value is listed in parenthesis with P/Expect value:
: e.g., P(3) = 1.2e-30 ---> (N = 3).
: Not defined in NCBI Blast2 with gaps.
: This typically is equal to the number of HSPs but not always.
: To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
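Extracting N from text such as "P(3) = 1.2e-30" is a single pattern match. The sketch below is hypothetical (parse_n is not part of the module) and, like n(), returns a null string when no N value is present.

```perl
use strict;
use warnings;

# Hypothetical sketch: pull N out of "P(N) = value"; return '' if absent.
sub parse_n {
    my ($text) = @_;
    return $text =~ /\bP\((\d+)\)\s*=/ ? $1 : '';
}

print parse_n('Sum P(3) = 1.2e-30'), "\n";   # 3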

Usage : $hsp->matches([seq_type], [start], [stop]);
Purpose : Get the total number of identical and conservative matches
: in the query or sbjct sequence for the given HSP. Optionally can
: report data within a defined interval along the seq.
: (Note: conservative matches are called positives in the
: Blast report.)
Example : ($id,$cons) = $hsp_object->matches('hit');
: ($id,$cons) = $hsp_object->matches('query',300,400);
Returns : 2-element array of integers
Argument : (1) seq_type = query or hit or sbjct (default = query)
: (sbjct is synonymous with hit)
: (2) start = Starting coordinate (optional)
: (3) stop = Ending coordinate (optional)
Throws : Exception if the supplied coordinates are out of range.
Comments : Relies on seq_str(match) to get the string of alignment symbols
: between the query and sbjct lines which are used for determining
: the number of identical and conservative matches.
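Counting matches over an optional query interval can be sketched as a single walk over the query and match strings. The function count_matches is hypothetical, not the module's implementation. It assumes BLASTP-style symbols and one coordinate per non-gap query residue, and, as an assumption, reports the second value as BLAST "positives" (identical plus '+' positions).

```perl
use strict;
use warnings;

# Hypothetical sketch of matches(): count identical and positive symbols
# in the match line, optionally restricted to a query interval.
sub count_matches {
    my ($qseq, $match, $qstart, $start, $stop) = @_;
    my $qend = $qstart + ($qseq =~ tr/-//c) - 1;   # non-gap residues
    $start = $qstart unless defined $start;
    $stop  = $qend   unless defined $stop;
    die "Coordinates $start-$stop outside $qstart-$qend\n"
        if $start < $qstart || $stop > $qend || $start > $stop;
    my ($id, $pos_count, $pos) = (0, 0, $qstart);
    for my $i (0 .. length($qseq) - 1) {
        next if substr($qseq, $i, 1) eq '-';       # gaps have no coordinate
        if ($pos >= $start && $pos <= $stop) {
            my $m = substr($match, $i, 1);
            $id++        if $m =~ /[A-Za-z]/;
            $pos_count++ if $m =~ /[A-Za-z+]/;     # positives include identities
        }
        $pos++;
    }
    return ($id, $pos_count);
}

print join(',', count_matches('MKV-LA', 'MK+ L ', 10)), "\n";   # 3,4
```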

Usage : $hsp->seq( [seq_type] );
Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object.
Example : $seqObj = $hsp->seq('query');
Returns : Object reference for a Bio::Seq.pm object.
Argument : seq_type = query or hit or sbjct (default = query).
: (sbjct is synonymous with hit)
Throws : Propagates any exception that occurs during construction
: of the Bio::Seq.pm object.
Comments : The sequence is returned in an array of strings corresponding
: to the strings in the original format of the Blast alignment.
: (i.e., same spacing).

Usage : $hsp->seq_str( seq_type );
Purpose : Get the full query, sbjct, or match sequence as a string.
: The match sequence is the string of symbols in between the
: query and sbjct sequences.
Example : $str = $hsp->seq_str('query');
Returns : String
Argument : seq_type = query or hit or sbjct or match
: (sbjct is synonymous with hit)
Throws : Exception if the argument does not match an accepted seq_type.
Comments : Calls _set_match_seq() to set the match sequence if it has
: not been set already.

Information about the various data members of this module is provided for those
wishing to modify or understand the code. Two things to bear in mind:

1 Do NOT rely on these in any code outside of this module.

All data members are prefixed with an underscore to signify that they are private.
Always use accessor methods. If the accessor doesnt exist or is inadequate,
create or modify an accessor (and let me know, too!).

2 This documentation may be incomplete and out of date.

It is easy for these data member descriptions to become obsolete as
this module is still evolving. Always double check this info and search
for members not described here.

An instance of Bio::Search::HSP::BlastHSP.pm is a blessed reference to a hash containing
all or some of the following fields: