Most of the sequence file format parsers in Biopython can return '''SeqRecord''' objects (and may also offer a format-specific record object; see, for example, Bio.SwissProt). The [[SeqIO]] system will ''only'' return SeqRecord objects.

−

In addition to the '''SeqRecord''' object's [http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html API documentation], there is more information in the [http://biopython.org/DIST/docs/tutorial/Tutorial.html Tutorial] ([http://biopython.org/DIST/docs/tutorial/Tutorial.pdf PDF]), and the [[SeqIO]] page is also very relevant.

+

In addition to the '''SeqRecord''' object's [http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html API documentation], there is a whole chapter in the [http://biopython.org/DIST/docs/tutorial/Tutorial.html Tutorial] ([http://biopython.org/DIST/docs/tutorial/Tutorial.pdf PDF]), and the [[SeqIO]] page is also very relevant.

−

+

−

== Creating a SeqRecord object ==

+

−

+

−

Most of the time you'll create '''SeqRecord''' objects by parsing a sequence file with [[SeqIO|Bio.SeqIO]]. However, it is useful to know how to create a '''SeqRecord''' directly. For example,

+

−

<python>

+

−

from Bio.Seq import Seq

+

−

from Bio.SeqRecord import SeqRecord

+

−

from Bio.Alphabet import IUPAC

+

−

record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",

+

−

IUPAC.protein),

+

−

id="YP_025292.1", name="HokC",

+

−

description="toxic membrane protein, small")

+

−

print record

+

−

</python>

+

−

+

−

This would give the following output:

+

−

+

−

ID: YP_025292.1

+

−

Name: HokC

+

−

Description: toxic membrane protein, small

+

−

Number of features: 0

+

−

Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF', IUPACProtein())

+

== Extracting information from a SeqRecord ==

−

Lets look in closer detail at the well annotated '''SeqRecord''' objects Biopython creates from a GenBank file, such as [http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk ls_orchid.gbk], which we'll load using the [[SeqIO]] module. This file contains 94 records:

+

Let's look in detail at the well-annotated '''SeqRecord''' objects Biopython creates from a GenBank file, such as [http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk ls_orchid.gbk], which we'll load using the [[SeqIO]] module. This file contains 94 records:

<python>

<python>

Line 57:

Line 35:

ID: Z78439.1

Name: Z78439

−

Desription: P.barbatum 5.8S rRNA gene and ITS1 and ITS2 DNA.

+

Description: P.barbatum 5.8S rRNA gene and ITS1 and ITS2 DNA.

Number of features: 5

/source=Paphiopedilum barbatum

Line 143:

Line 121:

</python>

</python>

−

SeqFeature objects are complicated enough to warrant their own page...

+

SeqFeature objects are complicated enough to warrant their own wiki page; for now, please refer to the Tutorial.

If you are using Biopython 1.48 or later, there will be a '''format''' method. This lets you convert the '''SeqRecord''' into a string using one of the output formats supported by [[SeqIO|Bio.SeqIO]], for example:

Line 174:

Line 152:

Have a look at FASTQ or QUAL files to see how quality scores are represented. Stockholm (PFAM) alignment files also often include per-letter-annotation.

+

+

== Creating a SeqRecord object ==

+

+

Most of the time you'll create '''SeqRecord''' objects by parsing a sequence file with [[SeqIO|Bio.SeqIO]]. However, it is useful to know how to create a '''SeqRecord''' directly. For example,

+

<python>

+

from Bio.Seq import Seq

+

from Bio.SeqRecord import SeqRecord

+

from Bio.Alphabet import IUPAC

+

record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",

+

IUPAC.protein),

+

id="YP_025292.1", name="HokC",

+

description="toxic membrane protein, small")

+

print record

+

</python>

+

+

This would give the following output:

+

+

ID: YP_025292.1

+

Name: HokC

+

Description: toxic membrane protein, small

+

Number of features: 0

+

Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF', IUPACProtein())

+

+

You could then pass this new record to [[SeqIO|Bio.SeqIO.write(...)]] to save it to disk.
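
The example above targets the Biopython and Python 2 releases that were current when this page was written. As a minimal sketch for newer installations, assuming Biopython 1.78 or later (where the Bio.Alphabet module was removed, so '''Seq''' takes no alphabet argument) and Python 3, the same record can be built and written out as FASTA. An in-memory StringIO handle stands in for a real file here; that choice is an assumption made only to keep the example self-contained.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Build the record directly; no alphabet argument (assumes Biopython >= 1.78).
record = SeqRecord(
    Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"),
    id="YP_025292.1",
    name="HokC",
    description="toxic membrane protein, small",
)

# Write to an in-memory handle; a filename or an open file works the same way.
handle = StringIO()
count = SeqIO.write(record, handle, "fasta")  # returns the number of records written
fasta_text = handle.getvalue()
print(fasta_text)
```

Replacing the StringIO handle with a filename such as "hokc.fasta" (a hypothetical name) would save the record to disk instead.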

+

+

[[Category:Wiki Documentation]]

Revision as of 12:06, 26 February 2014

This page describes the SeqRecord object used in Biopython to hold a sequence (as a Seq object) with identifiers (ID and name), a description and, optionally, annotation and sub-features.

Most of the sequence file format parsers in Biopython can return SeqRecord objects (and may also offer a format-specific record object; see, for example, Bio.SwissProt). The SeqIO system will only return SeqRecord objects.

If you didn't already know, the dir() function returns a list of all the methods and properties of an object (as strings). Those starting with underscores in their name are "special" and we'll be ignoring them in this discussion. We'll start with the seq property:
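
As an aside on the dir() filtering described above, here is a small sketch in plain Python. ToyRecord and public_names are hypothetical names invented for this illustration; ToyRecord is only a stand-in and is not Biopython's SeqRecord.

```python
class ToyRecord(object):
    """Hypothetical stand-in for a parsed record (not Biopython's SeqRecord)."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id
        self._cache = None  # underscore-prefixed, so it is filtered out below

    def upper_seq(self):
        return self.seq.upper()


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


record = ToyRecord("acgt", "demo")
print(public_names(record))
```

The same helper can be pointed at a real SeqRecord to list only the properties and methods discussed on this page.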

If you are using Biopython 1.50 or later, there will also be a letter_annotations property. Again, this is a dictionary, but it holds per-letter annotation such as sequence quality scores or secondary structure predictions. This kind of information isn't found in GenBank files, so in this case the dictionary is empty:

>>> print(record.letter_annotations)
{}

Have a look at FASTQ or QUAL files to see how quality scores are represented. Stockholm (PFAM) alignment files also often include per-letter-annotation.
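
To make the idea of per-letter annotation concrete, here is a minimal sketch in plain Python of how a FASTQ quality line maps to one score per letter. It assumes the Sanger-style encoding (ASCII offset 33); the function name phred_from_fastq and the sample read are made up for illustration, and this is not Biopython's own parser. Biopython stores such scores under the "phred_quality" key of letter_annotations when reading FASTQ.

```python
def phred_from_fastq(quality_line, offset=33):
    """Decode a FASTQ quality string into one integer score per letter.

    Assumes Sanger-style encoding, where each character's ASCII code
    minus the offset (33) is the PHRED quality score.
    """
    return [ord(ch) - offset for ch in quality_line]


# Made-up read: one quality character per sequence letter.
sequence = "GATTACA"
quality_line = "II5+!~#"

scores = phred_from_fastq(quality_line)

# A dictionary shaped like letter_annotations: each value has one entry per letter.
letter_annotations = {"phred_quality": scores}
print(letter_annotations)
```

The key point is that every entry in the dictionary must be exactly as long as the sequence itself.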

Creating a SeqRecord object

Most of the time you'll create SeqRecord objects by parsing a sequence file with Bio.SeqIO. However, it is useful to know how to create a SeqRecord directly. For example,