Interpretive Summary: From the beginning of agriculture, plant breeders have faced the problem of trying to predict and shape hundreds of plant characteristics without a basic understanding of the molecular causes of those traits. Traditionally, breeders have made crosses between two good plants and looked for better ones in the progeny, hoping not to lose other beneficial traits in the process. In this paper the authors report the complete genome sequence of soybean: essentially every letter of the approximately one billion DNA letters that make up the soybean genetic code. The authors also report predictions of the locations and letters of every gene in the genome. The genes are the instructions used by each cell to build the whole plant and to respond to the environment. These results provide plant breeders and researchers with the basic blueprint for the soybean genome. This blueprint will eventually make it possible to understand any soybean trait and to improve many characteristics. A great deal of additional work is required to make use of the genome blueprint, but already in the year since the sequence was released to the public (in late 2008), it has been used to identify genes responsible for digestibility in soybean and common bean, for phytate production in soybean seeds (which currently results in environmentally damaging phosphate runoff from swine and poultry waste), and for plant resistance to the devastating soybean disease Asian Soybean Rust. The genome sequence is also an important resource for understanding the extent of plant diversity. The ability to speed breeding efforts while maintaining diversity benefits both growers and consumers through new varieties that are higher-yielding, more nutritious, and more resistant to stress and disease.

Technical Abstract:
We report the genome sequence for soybean (Glycine max var. Williams 82), one of the most important crop plants worldwide because of its ability to produce both protein and oil. Soybean is a recently domesticated legume that plays a vital role in crop rotation because it fixes atmospheric nitrogen via symbioses with soil-borne microorganisms. The 1,115 Mbp genome was sequenced by a whole-genome shotgun approach and integrated with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predicted 46,430 protein-coding genes in soybean, 70% more genes than in the model plant Arabidopsis and similar in total number to the 45,555 genes in the Populus trichocarpa tree genome, which, like soybean, is a paleopolyploid. Of the predicted genes, 21.6% are in repeat-poor, highly recombinogenic portions of the chromosomes. The soybean lineage has undergone two large-scale duplication events (polyploidies), which have left nearly 75% of the genes present in multiple copies. This redundancy is evident in many gene families: soybean has nearly twice as many genes involved in acyl lipid metabolism as Arabidopsis thaliana (1,127 vs. 614) and more than twice as many transcription factors (5,671 vs. 2,315). The two duplication events occurred ~59 and ~13 million years ago (Mya), producing massive genetic redundancy, followed by gene diversification and loss, as well as numerous chromosome rearrangements. The release of an accurate soybean genomic sequence will allow rapid identification of the underlying genetic basis of many soybean traits and will speed the creation of improved new soybean varieties.
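
The gene-count comparisons in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the Arabidopsis gene total is not stated in the abstract, so the value computed here is merely what the "70% more genes" statement would imply.

```python
# Counts quoted in the abstract (soybean vs. comparison species).
SOYBEAN_GENES = 46_430
POPLAR_GENES = 45_555
ACYL_LIPID = (1_127, 614)            # soybean, Arabidopsis thaliana
TRANSCRIPTION_FACTORS = (5_671, 2_315)  # soybean, Arabidopsis thaliana


def fold_difference(soybean: int, other: int) -> float:
    """Return how many times larger the soybean count is than the other count."""
    return soybean / other


def implied_baseline(count: int, percent_more: float) -> float:
    """Return the baseline implied by 'count is percent_more % above baseline'.

    Assumption: this only inverts the percentage quoted in the abstract;
    it is not a reported Arabidopsis gene count.
    """
    return count / (1 + percent_more / 100)


lipid_ratio = fold_difference(*ACYL_LIPID)
tf_ratio = fold_difference(*TRANSCRIPTION_FACTORS)
poplar_ratio = fold_difference(SOYBEAN_GENES, POPLAR_GENES)
arabidopsis_implied = implied_baseline(SOYBEAN_GENES, 70)

print(f"Acyl lipid metabolism genes: {lipid_ratio:.2f}x")
print(f"Transcription factors: {tf_ratio:.2f}x")
print(f"Soybean vs. poplar gene total: {poplar_ratio:.3f}x")
print(f"Arabidopsis total implied by '70% more': about {arabidopsis_implied:.0f}")
```

Under these figures, the acyl lipid ratio falls a little short of two, while the transcription factor ratio is well above two.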