Motivations

Generating expressed sequence tags (ESTs) remains a primary method for gene discovery in most organisms. Identifying alternatively spliced transcript isoforms of a gene is an important step in functional gene annotation and in downstream experimental characterization. This server is designed to identify alternatively spliced transcripts from EST-derived sequences. Note: This server can be used to map ESTs to the genome from which they were derived, but it is not designed to predict alternatively spliced genes from genomic sequences alone.

How does it work?

If the user provides a genomic sequence file (which may contain multiple sequences in FASTA format), the EST-derived sequences (ESTs, including cDNAs) are mapped to the genomic sequences using the SIM4 software. ESTs that map to the same genomic locus, overlap by at least a minimum length with at least a minimum identity (two parameters chosen by the user), and show exon/intron variations in the overlapping region are treated as "alternatively spliced" transcripts from a single gene. If no genomic sequences are provided, a self-BLASTN is performed instead: NCBI-BLASTN is run with the set of ESTs as both the "query" and the "database". ESTs that are highly similar at both ends but have an unaligned internal fragment are treated as "alternatively spliced" transcripts.
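The two criteria described above can be illustrated with a minimal Python sketch. This is not the server's actual code (the standalone tool is written in Perl). The function names, the `Hsp` record, and the default thresholds (`min_overlap`, `min_len`, `min_identity`, `min_gap`) are all hypothetical and chosen only for illustration. The genome-based check assumes exon coordinates have already been extracted from an alignment and measures overlap as the shared genomic span; the self-BLASTN check assumes two alignment segments between the same pair of ESTs on the same strand.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the introns implied by consecutive exons of one mapped EST."""
    ordered = sorted(exons)
    return [(left[1] + 1, right[0] - 1) for left, right in zip(ordered, ordered[1:])]


def shared_span(a: List[Exon], b: List[Exon]) -> Optional[Tuple[int, int]]:
    """Genomic interval covered by both ESTs, or None if they do not overlap."""
    start = max(min(s for s, _ in a), min(s for s, _ in b))
    end = min(max(e for _, e in a), max(e for _, e in b))
    return (start, end) if start <= end else None


def differ_in_splicing(a: List[Exon], b: List[Exon], min_overlap: int = 50) -> bool:
    """True if two ESTs at one locus overlap enough and have different introns
    inside the shared span. Alignment identity is not modeled here."""
    span = shared_span(a, b)
    if span is None or span[1] - span[0] + 1 < min_overlap:
        return False
    lo, hi = span

    def inside(exons: List[Exon]) -> Set[Tuple[int, int]]:
        # Keep only introns that lie completely within the shared span.
        return {(s, e) for s, e in introns(exons) if s >= lo and e <= hi}

    return inside(a) != inside(b)


class Hsp(NamedTuple):
    """One aligned segment between a query EST and a subject EST (hypothetical record)."""
    qstart: int
    qend: int
    sstart: int
    send: int
    pident: float  # percent identity of the segment


def internal_gap_candidate(first: Hsp, second: Hsp, min_len: int = 50,
                           min_identity: float = 95.0, min_gap: int = 10) -> bool:
    """True if two good segments flank a stretch present in only one of the ESTs."""
    for hsp in (first, second):
        if hsp.qend - hsp.qstart + 1 < min_len or hsp.pident < min_identity:
            return False
    q_gap = second.qstart - first.qend - 1  # unaligned bases in the query
    s_gap = second.sstart - first.send - 1  # unaligned bases in the subject
    if q_gap < 0 or s_gap < 0:
        return False  # overlapping or out-of-order segments are not handled
    return abs(q_gap - s_gap) >= min_gap
```

For example, `differ_in_splicing([(100, 200), (300, 400), (500, 600)], [(100, 200), (500, 600)])` flags an exon-skipping pair, while two ESTs with identical introns in their shared span are not flagged.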

Input

1) A file containing EST-derived sequences (ESTs, cDNAs, or contig sequences assembled from ESTs) in FASTA format. We recommend assembling the EST-derived sequences first to remove redundancy, using an EST assembler such as Phrap, CAP3, TIGR Assembler (see Min et al. 2009), or EST2uni. If the EST data are not pre-assembled, i.e., are redundant, the results will contain the redundant transcripts. Note: The number of EST/cDNA sequences in an uploaded file or in pasted input is limited to a maximum of 100,000. If you have more than 100,000 ESTs, please request a standalone version of the software.

2) Optional (required for genome mapping): A file containing the genome sequences of the same species. Although this file is optional, if the genome is available (completely sequenced and of good quality), the user should provide the genome sequences for alignment, because the genome is required for EST mapping and the output files from the genome alignment are used for the further analysis of AS events.

Note: If the genome sequence file contains a number of super-contigs, you may split it into several files (each containing one contig sequence). However, please submit only one data set per day, or wait until you have received your results before making a new submission.
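Splitting a multi-sequence genome file into one file per contig can be done with a few lines of Python. The sketch below is only one possible way to do it. The function names and the output file naming scheme (`contig_1.fasta`, `contig_2.fasta`, ...) are hypothetical, and each sequence is written on a single line.

```python
import os
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:].strip(), []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(in_path: str, out_dir: str) -> List[str]:
    """Write each sequence of in_path to its own FASTA file; return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(in_path) as handle:
        for index, (header, seq) in enumerate(read_fasta(handle), start=1):
            # Hypothetical naming scheme: contig_<n>.fasta in input order.
            out_path = os.path.join(out_dir, f"contig_{index}.fasta")
            with open(out_path, "w") as out:
                out.write(f">{header}\n{seq}\n")
            written.append(out_path)
    return written
```

Calling `split_fasta("genome.fasta", "contigs")` would create the `contigs` directory and return the list of per-contig files, which can then be submitted one at a time.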

3) Parameters: Two parameters can be chosen by the user on the server home page: the minimum aligned fragment length and the minimum identity of the aligned fragments. Both are used to define the "alternatively spliced transcripts" from a gene locus.

Note: The total combined size of the data files (EST file and genome file) is limited to 50 Mb. Use the following EST sequences and genomic sequences for testing.
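Before uploading, the limits stated in this Input section (at most 100,000 sequences and at most 50 Mb of combined data) can be checked locally. The following Python sketch is a hypothetical pre-submission check, not part of the server. The function names are illustrative, and it assumes that "50 Mb" means 50 x 1024 x 1024 bytes, which the page does not specify.

```python
import os
from typing import List, Optional

MAX_SEQUENCES = 100_000
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # assumption: "50 Mb" read as 50 MiB


def count_fasta_records(path: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_submission(est_path: str, genome_path: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the stated limits are met."""
    problems = []
    n_records = count_fasta_records(est_path)
    if n_records == 0:
        problems.append("EST file contains no FASTA records")
    if n_records > MAX_SEQUENCES:
        problems.append(f"{n_records} sequences exceed the limit of {MAX_SEQUENCES}")
    total = os.path.getsize(est_path)
    if genome_path is not None:
        total += os.path.getsize(genome_path)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"combined size of {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    return problems
```

A call such as `check_submission("ests.fasta", "genome.fasta")` returns an empty list when both limits are satisfied.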

Output

If only ESTs are provided, the output files include (1) a BLASTN output file, (2) AS clusters (alternatively spliced transcript clusters), and (3) a multiple sequence alignment (MSA) file for the AS isoforms, generated by MUSCLE. If genomic sequences are provided, the output files include (1) a SIM4 alignment file, (2) a file in a modified GTF (gene transfer format) containing tab-delimited alignment information for all ESTs, (3) AS clusters, and (4) an AS-specific GTF file (AS.gtf), which contains the EST alignment information of the AS transcripts. The accuracy of the methods implemented in the server was evaluated using Aspergillus niger EST data and Arabidopsis mRNA sequences, and the results are reported in the paper (Min 2013).

Stand-alone tool for download
The standalone version of the software is available free of charge for academic use only. It is written in Perl and needs to run on Linux because of the SIM4 software. Please download it from the following site.