BLAT

BLAT (the BLAST-Like Alignment Tool) is a sequence analysis tool that performs rapid mRNA/DNA and cross-species protein alignments. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments, at the sensitivity settings typically used when comparing vertebrate sequences.

BLAT is not BLAST. DNA BLAT works by keeping an index of 11-mers from the entire genome (but not the genome itself) in memory. Since the index takes up a bit less than a gigabyte of RAM, BLAT can deliver high performance on a reasonably priced Linux box. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment. Protein BLAT works in a similar manner, except with 4-mers rather than 11-mers. The protein index takes a little more than 2 gigabytes.

Availability & Restrictions

BLAT is available without restriction to all OSC users.

The following versions of BLAT are available at OSC:

Version    Glenn    Oakley
34         X

Usage

Set-up

To initialize the Glenn system prior to using BLAT, run the following commands:

module load biosoftw
module load blat

Using BLAT

The main programs in the BLAT suite are:

gfServer – a server that maintains an index of the genome in memory and uses the index to quickly find regions with high levels of sequence similarity to a query sequence.

gfClient – a program that queries gfServer over the network, and then does a detailed alignment of the query sequence with regions found by gfServer.

blat – combines client and server into a single program, first building the index, then using the index, and then exiting.

webBlat – a web-based version of gfClient that presents the alignments in an interactive fashion. (Not included on the OSC servers.)
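
As a rough illustration of how gfServer and gfClient fit together, the sketch below prints the commands for one interactive session rather than running them, so it can be tried safely without BLAT loaded. The host, port, and file names (localhost, 17779, genome.2bit, query.fa, out.psl) are hypothetical placeholders, and the helper name blat_session_commands is made up for this example; it assumes the genome file sits in the current directory.

```shell
#!/bin/sh
# Sketch only: prints the commands for one gfServer/gfClient session.
# All argument values used below are hypothetical placeholders.
# usage: blat_session_commands HOST PORT GENOME QUERY OUTPUT
blat_session_commands() {
    host=$1; port=$2; genome=$3; query=$4; output=$5

    # 1. Start the server in the background; it builds the genome index
    #    and keeps it in memory.
    echo "gfServer start $host $port $genome &"

    # 2. Align a query against the running server. The "." argument is the
    #    directory containing the genome file (assumed: current directory).
    echo "gfClient $host $port . $query $output"

    # 3. Shut the server down when finished.
    echo "gfServer stop $host $port"
}

blat_session_commands localhost 17779 genome.2bit query.fa out.psl
```

The printed lines can be copied into a terminal one at a time; the gfClient step should only be run once the server has finished building its index.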

Building an index of the genome typically takes 10 to 15 minutes. For interactive applications, one typically uses gfServer to build a whole-genome index once; gfClient or webBlat can then align a single query within a few seconds. When aligning many sequences in batch mode, blat can be more efficient, particularly if run on a cluster of computers. Each blat run is typically done against a single chromosome, but with a large number of query sequences.
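
The batch pattern described above, one blat run per chromosome against a shared query file, might be sketched as follows. The script only prints the blat command lines, using the basic form "blat database query output.psl". The helper name blat_batch_commands, the chromosome file names, queries.fa, and the psl output directory are all hypothetical and chosen for illustration.

```shell
#!/bin/sh
# Sketch only: prints one blat command per chromosome file.
# usage: blat_batch_commands QUERY OUTDIR CHROM_FILE...
blat_batch_commands() {
    query=$1; outdir=$2
    shift 2

    for chrom in "$@"; do
        # Derive an output name from the chromosome file: chr1.fa -> chr1
        name=$(basename "$chrom")
        name=${name%.*}

        # Basic blat invocation: blat database query output.psl
        echo "blat $chrom $query $outdir/$name.psl"
    done
}

# Hypothetical inputs: three chromosome files and one multi-sequence query file.
blat_batch_commands queries.fa psl chr1.fa chr2.fa chrX.fa
```

Each printed line is independent of the others, so on a cluster each one could be placed in its own batch job.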