I am trying to assemble a 30-35 Mbp diploid genome with ABySS from Illumina HiSeq runs. Computing the assembly for just one k-mer with default settings takes a very long time (days) on a 12-CPU, large-RAM machine.

Hence my questions:

1. What is your experience with abyss-bwa and abyss-bowtie, in terms of both performance and assembly/scaffolding quality?

2. I use NFS-mounted partitions for both the data and temp directories, which I guess slows ABySS down. How do I estimate how much local disk space I will need for a local temp directory?

3. Compression settings. I have found this Biostar post about pbzip2 and speed. Has anyone done comparisons with gzip/pigz?

4. Switching scaffolding off. It seems that ABySS spends the majority of each run mapping reads to the assembly and scaffolding. Since I am exploring k-mer space, I want to get contigs, check N50s, compare the assembly with the genomes of related species, then pick a few good-looking k-mers and rerun the assembly with, e.g., differently filtered/base-error-corrected data sets. Can I switch off the whole scaffolding part?

5. OpenMPI and ABySS: are there, e.g., any minimum RAM requirements for cluster nodes to run ABySS without crashing?

Yes, I know there is an ABySS mailing list, but it takes a long time to get an answer from the overworked developer. Trawling through the archives has not given me clear answers so far.

Thanks a lot for your help.

EDIT (partial answers)

ad 1: according to the ABySS author, the default mapper/scaffolder gives better quality than abyss-bwa and abyss-bowtie.

I think you can figure it out yourself. To speed things up, the key is to identify the bottleneck. Just run ABySS normally and check "top" every half hour to see which steps take most of the time. My guess is that graph construction and simplification take most of the time. As for scaffolding, if you assemble the reads as single-end, I guess scaffolding will be skipped.
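If you end up with one contig FASTA per k-mer, comparing them by N50 needs nothing more than a few lines of Python. This is only a rough sketch: the function names are mine, it assumes plain uncompressed FASTA, and the file paths are whatever your runs produced.

```python
import sys


def read_fasta_lengths(path):
    """Return the length of every sequence in a plain-text FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Paths are taken from the command line, one contig file per k-mer.
    for fasta_path in sys.argv[1:]:
        contig_lengths = read_fasta_lengths(fasta_path)
        print(fasta_path, len(contig_lengths), sum(contig_lengths), n50(contig_lengths))
```

Run it with the contig files as arguments and it prints the path, contig count, total bases and N50 for each. Keep in mind that N50 alone rewards misjoins, so the comparison against related genomes is still worth doing.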

Days seems excessive - I'd expect hours on my server (24 CPU, 100 GB RAM). Assembly can often be "held up" by a very small number of "rogue reads" which mess up the graph, so you may want to look at some quality filtering to reduce the number of input reads.
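As a starting point for that kind of filtering, here is a minimal sketch that drops reads whose mean base quality falls below a cutoff. The assumptions are mine: four-line FASTQ records, Phred+33 encoding, uncompressed files, and an arbitrary cutoff of 20. The function names are made up for illustration, and a dedicated trimming/filtering tool will do a more thorough job.

```python
def mean_quality(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed by default)."""
    if not quality_line:
        return 0.0
    return sum(ord(char) - offset for char in quality_line) / len(quality_line)


def filter_fastq(in_path, out_path, min_mean_q=20.0, offset=33):
    """Copy reads with mean quality >= min_mean_q; return (kept, total)."""
    kept = 0
    total = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            # Assumes strict four-line records: header, sequence, '+', quality.
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            total += 1
            if mean_quality(record[3].rstrip("\r\n"), offset) >= min_mean_q:
                dst.writelines(record)
                kept += 1
    return kept, total
```

Note that this treats every read independently. For paired-end data you would have to drop both mates together, otherwise the two files go out of sync and the assembler can no longer pair them.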