Abstract: Long-range and highly accurate de novo assembly from short-read data is one
of the most pressing challenges in genomics. Recently, it has been shown that
read pairs generated by proximity ligation of DNA in chromatin of living tissue
can address this problem. These data dramatically increase the scaffold
contiguity of assemblies and provide haplotype phasing information. Here, we
describe a simpler approach ("Chicago") based on in vitro reconstituted
chromatin. We generated two Chicago datasets with human DNA and used a new
software pipeline ("HiRise") to produce a highly accurate de novo assembly
and scaffolding of a human genome with a scaffold N50 of 30 Mb. We also
demonstrated the utility of Chicago for improving existing assemblies by
re-assembling and scaffolding the genome of the American alligator. With a
single library and one lane of Illumina HiSeq sequencing, we increased the
scaffold N50 of the American alligator assembly from 508 kb to 10 Mb. Our method uses
established molecular biology procedures and can be used to analyze any genome,
as it requires only about 5 micrograms of DNA as the starting material.
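The contiguity gains above are reported as scaffold N50. N50 is the largest length L such that sequences of length at least L together contain at least half of all assembled bases. The sketch below computes it under that common convention. It is only an illustration of the metric: it is not part of the HiRise pipeline, and the toy lengths in the usage note are hypothetical, not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumes the common convention: the largest length L such that
    sequences of length >= L cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where half the total is covered.
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 10])` returns 6, because 10 + 6 = 16 bp already covers half of the 30 bp total. Joining four hypothetical 5 bp contigs into two 10 bp scaffolds leaves the total unchanged but raises N50 from 5 to 10. This is the sense in which scaffolding increases contiguity without adding sequence.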