Abstract

Although GenBank now contains more than 1,400,000 expressed sequence tags (ESTs) from soybean, most publicly available ESTs were derived from tissues or environmental conditions other than developing seeds. To annotate the molecular mechanisms of soybean seed development, the gene expression profiles of immature seeds must be analyzed comprehensively across developmental stages. Here we constructed a full-length-enriched cDNA library comprising 45,408 cDNA clones that cover various stages of soybean seed development. By sequencing the 5' ends of these clones, we obtained 36,656 ESTs. These ESTs were grouped into 27,982 unigenes (22,867 contigs and 5,115 singletons), of which 27,931 could be mapped onto the 20 soybean chromosome sequences. Comparative genomic analysis with other plants revealed that these unigenes include many candidate genes specific to dicots, legumes, and soybean. Of these unigenes, 1,789 currently show no homology to known soybean sequences, suggesting that many of them represent mRNAs specifically expressed in seeds. We also identified novel, abundantly expressed genes involved in oil synthesis. These data may serve as a valuable resource for soybean seed improvement.