Imagine a DNA sample containing a mixture of different intact plasmids. These samples are sequenced using either MiSeq or HiSeq sequencing. Would it be possible to assemble these plasmids post-sequencing as they would have been when sequenced individually?

Would you know the sequences? I mean, will you have reference sequences for each plasmid which you can use for assembling? How similar are the plasmid sequences? Will you have multiple ~100nt sequences that align perfectly to many different plasmids? Please edit your question and add more details.
– terdon, Feb 22 '19 at 14:13

Don't forget a template (reference-guided) assembly to confirm your results once you know which plasmids you are dealing with, e.g. with bowtie2; any paper on bacterial metagenomics will help you. Read length vs. sequence similarity is important.
– Michael, Feb 22 '19 at 16:07

2 Answers

You would expect to have high coverage, given that the plasmids are short, so de novo assembly would likely be straightforward. Because each plasmid is present at a different copy number, you would expect different coverage on each plasmid, so it might be best to approach this as a metagenome-type or transcriptome-type assembly rather than a classic genome assembly.
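To make the coverage argument concrete, here is a minimal sketch of the expected per-plasmid coverage in a mixed sample. It assumes reads are sampled uniformly from the pooled DNA, and the plasmid names, lengths, copy numbers and read counts are made-up illustration values, not data from the question.

```python
def expected_coverage(n_reads, read_length, molecules):
    """Estimate mean coverage per molecule in a mixed sample.

    molecules maps a name to (length_bp, relative_copy_number).
    Assumes reads are drawn uniformly from the pooled DNA (an idealisation).
    """
    total_bases = n_reads * read_length
    # Each molecule contributes length * copies base pairs to the pool.
    pool = {name: length * copies for name, (length, copies) in molecules.items()}
    pool_size = sum(pool.values())
    coverage = {}
    for name, (length, _copies) in molecules.items():
        bases_from_molecule = total_bases * pool[name] / pool_size
        coverage[name] = bases_from_molecule / length
    return coverage


# Hypothetical mixture: a 5 kb plasmid that is 10x more abundant than a 10 kb one.
mix = {"pA": (5_000, 10), "pB": (10_000, 1)}
cov = expected_coverage(n_reads=100_000, read_length=150, molecules=mix)
print(cov)  # pA comes out at 2500x, pB at 250x
```

Even a small fraction of a MiSeq run gives very deep coverage on plasmid-sized molecules, but the depth differs between plasmids in proportion to their abundance, which is why an assembler that assumes uniform coverage may not be the best fit.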

Alternatively, a Google search just told me there is a tool called plasmidSPAdes designed for just your purpose. SPAdes itself is a reliable genome assembler, so I would imagine plasmidSPAdes to be a good choice. It does assume that there are also some whole bacterial genomes in the sequencing mix, though.
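As a rough sketch of how such a run might be launched: plasmidSPAdes is invoked through the regular `spades.py` launcher with the `--plasmid` flag, and the metagenome mode with `--meta`. The file names and output directory below are hypothetical placeholders, and it is worth checking `spades.py --help` for the version you have installed. The snippet only builds and prints the command; it does not run the assembler.

```python
import shlex


def spades_command(reads_1, reads_2, out_dir, mode="plasmid"):
    """Build a spades.py command line for paired-end reads.

    mode is "plasmid" (plasmidSPAdes) or "meta" (metagenome mode).
    """
    mode_flags = {"plasmid": "--plasmid", "meta": "--meta"}
    if mode not in mode_flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["spades.py", mode_flags[mode],
            "-1", reads_1, "-2", reads_2,
            "-o", out_dir]


# Hypothetical file names, for illustration only.
cmd = spades_command("mix_R1.fastq.gz", "mix_R2.fastq.gz", "plasmid_assembly")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```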

As @terdon commented, it does depend on whether you have many identical copies of each plasmid, or many similar but not identical plasmids in the mix.

Yes, I had the same ideas. I am well acquainted with SPAdes, although I have never tried de novo assembly on a mixture of plasmids.
– Roelof Coertze, Feb 22 '19 at 16:50

It looks as though plasmidSPAdes expects to find some 'long' chromosomes in the mix, so it might be important to throw some whole-genome DNA into your mix of plasmids, to train the assembler. You could do this after the experiment, by appending a fastq file of E. coli genomic DNA to your fastq file of plasmid sequences...
– Jonathan Moore, Feb 22 '19 at 17:06

Coincidentally, the DNA sample is "contaminated" with genomic DNA, since it was extracted from an environmental sample. That should provide the necessary SPAdes training?
– Roelof Coertze, Feb 22 '19 at 18:37

If it's from an environmental sample, it might be better to try a metagenome assembler, maybe metaSPAdes. Alternatively, use the k-mer spectrum to try to pull the dataset apart based on abundance, and run the assembler on cleaned-up subsets of the reads.
– Jonathan Moore, Feb 24 '19 at 9:13
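A toy sketch of the abundance-based split suggested in the comment above: count k-mers across all reads, then bin each read by the median abundance of its k-mers. This is a simplification for illustration only. It ignores reverse complements and sequencing errors, holds everything in memory, and the `k` and `threshold` values are arbitrary; a real dataset would call for a dedicated k-mer counter.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def split_by_abundance(reads, k, threshold):
    """Split reads into (high, low) bins by median k-mer abundance."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))

    high, low = [], []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: skip it
            continue
        abundance = median(counts[km] for km in read_kmers)
        (high if abundance >= threshold else low).append(read)
    return high, low


# Toy input: five copies of one read (abundant molecule) and one rare read.
toy_reads = ["ACGTACGTAC"] * 5 + ["TTGACCAGGA"]
high, low = split_by_abundance(toy_reads, k=4, threshold=3)
print(len(high), len(low))
```

Each bin could then be written back to fastq and assembled separately.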

Let's take a step back and consider the "perfect" output for a de novo assembly algorithm. Ideally, you would like to see one complete sequence for each molecule (chromosome, plasmid, etc.). In reality, this is difficult to achieve due to a couple of factors.

By random chance some regions may have low coverage, meaning that there are an insufficient number of reads spanning the region to reconstruct it de novo.
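To put a number on "by random chance", a common idealisation (an assumption here, not something stated in the question) is that the depth at any given position follows a Poisson distribution with mean equal to the average coverage. Under that model, the chance that a position falls below a minimum usable depth is:

```python
import math


def prob_depth_below(mean_coverage, min_depth):
    """P(depth < min_depth) at one position, assuming Poisson-distributed depth."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage ** d / math.factorial(d)
        for d in range(min_depth)
    )


# Illustrative values: probability of fewer than 5 reads covering a position.
for mean_cov in (5, 10, 50):
    print(mean_cov, prob_depth_below(mean_cov, min_depth=5))
```

The probability drops off very quickly as mean coverage rises, so for small plasmids at high depth this factor matters much less than repeats do. Real libraries deviate from the Poisson model (GC bias, for instance), so treat these numbers as optimistic.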

A bigger problem is that some portions of the genome are repetitive and occur at many different locations, sometimes even on different chromosomes.

So in practice, it's very common that de novo assemblies can only recover partial fragments of each molecule, and sometimes fragments are erroneously fused due to shared repetitive content.

Back to your question: it's certainly possible to assemble distinct plasmids that were sequenced together without any special sample prep. How well this will work in practice will depend on the amount of coverage/redundancy in the sequenced reads and the amount of DNA shared between the plasmids. If they don't share much in common, and you sample to a sufficient depth of coverage, you shouldn't have any problem assembling them.
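If you do happen to have candidate sequences for the plasmids, one quick way to gauge how much DNA they share is to compare their k-mer sets. The sketch below is a simplification under stated assumptions: it treats sequences as linear (no wrap-around for circular plasmids), ignores reverse complements, and the example sequences and `k` are arbitrary.

```python
def kmer_set(seq, k):
    """Set of distinct k-mers in a (linear) sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(seq_a, seq_b, k):
    """Fraction of the smaller k-mer set that also occurs in the other sequence."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy sequences for illustration only.
print(shared_fraction("ACGTACGT", "ACGTTTTT", k=4))
```

Choosing `k` close to the read length gives a feel for how many reads would be ambiguous between two plasmids; a high shared fraction is a warning that contigs may be fragmented or fused.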