Dear jiangyqcn,
You wrote:
>Hi colleagues,
>>I recently cloned some genes of Brassica napus by RT-PCR/RACE
>through mining ESTs in GenBank.
Note - if you did RT-PCR, you did not clone "genes", you cloned
"cDNAs". This is a very important distinction that you should be
careful to make.
>The primers were designed based on ESTs. After sequencing more than
>2 clones for each gene, I found, for quite a few genes, obviously,
>alleles exist. The most common is nucleotide substitution, next
>indel(insertion/deletion). An interesting phenomenon I found is for
>those alleles with a long insertion compared to the other allele,
>the insertion is not 3-folds, i.e. the insertion leads to premature
>stop codon in the insertion or immediately after the insertion. This
>means that the translated amino acid sequence of one allele is
>normal (compared to At homolog), while the other is much shorter
>(premature stop codon).
>>My question is why the insertion is not 3-folds and they are, for
>example 79bp, 100bp, etc? Are the alleles bearing the 79bp, 100bp
>insertion pseudogenes? BTW, I used high-fidelity polymerase when
>cloning those genes and, sequenced the clones from two ends with the
>same sequence, which excludes sequencing errors. Although RTase is
>error-prone, I have no way to predict this.
What you see is common, but most of your data likely reflect
neither alleles nor pseudogenes. The base substitutions you see could
potentially represent alleles or homeologous genes. There should be
two different sequences because B. napus appears to be an
allotetraploid of B. oleracea and B. rapa, so you would get
sequences from each. But it is also quite likely that some
substitutions represent errors introduced by RTase or by PCR.
The insertions are almost certainly retained introns. These can
result from the presence in your RNA samples of nuclear RNA
precursors that have not yet been fully spliced (notably, in many
cases poly-A addition can happen prior to splicing, so oligo dT will
still work to prime these transcripts for RT). In addition, it is
not uncommon for partially spliced transcripts to erroneously leave
the nucleus. If they have internal stop codons they should be
subject to nonsense-mediated decay and should, in theory, be at
relatively low frequency, but we and others see such partially
spliced cDNA products for many different genes. You can check this
theory by looking for "GT" at the start of the insert and "AG" at the
end of the insert, consistent with their being introns. Also, your
insert sizes of >65 bases are consistent with their being introns.
Thus, you can get multiple different partially spliced cDNAs from a
single gene - with no need to invoke the existence of pseudogenes.
For example, for one gene family, in one maize EST database we found
nearly 90% of the cDNA sequences to include retained introns.
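
As a rough illustration of the GT/AG check described above, here is a
minimal Python sketch. The function name, the example insert, and the
use of 65 bases as a cutoff are assumptions made for illustration, not
part of any established tool. It assumes the insert has already been
extracted from an alignment of the two cDNA sequences and is in the
sense orientation. Alignment boundaries can slide by a base or two, so
a near miss is worth checking by hand.

```python
def check_insert(insert_seq, min_intron_len=65):
    """Report whether an insert has features expected of a retained intron.

    insert_seq: the extra sequence present in the longer cDNA (sense strand).
    min_intron_len: assumed cutoff, taken from the ">65 bases" remark above.
    """
    # Normalize case and accept RNA-style sequences as well.
    seq = insert_seq.strip().upper().replace("U", "T")
    return {
        "length": len(seq),
        # Canonical intron boundaries: GT at the 5' end, AG at the 3' end.
        "starts_with_GT": seq.startswith("GT"),
        "ends_with_AG": seq.endswith("AG"),
        "longer_than_cutoff": len(seq) > min_intron_len,
        # An insert whose length is not a multiple of 3 shifts the reading frame.
        "multiple_of_3": len(seq) % 3 == 0,
    }


# Hypothetical 79-base insert, made up purely to exercise the function.
example_insert = "GT" + "A" * 75 + "AG"
result = check_insert(example_insert)
for feature, value in result.items():
    print(feature, value)
```

An insert that starts with GT, ends with AG, exceeds the cutoff, and
is not a multiple of 3 fits the retained-intron explanation. The check
is only suggestive; comparing against genomic sequence, where
available, would be more convincing.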
>>>Could someone provide any explanation?
See above.
>>Thanks.
You are welcome.
Chuck Gasser
U. C. Davis (and arab-gen moderator)