From 28583b4f98c75dbec4f73da03ce2c357d82036c7 Mon Sep 17 00:00:00 2001
From: Mark Howison
Date: Mon, 12 May 2014 14:56:09 -0400
Subject: [PATCH] updated description of phix-test
---
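Note for reviewers: the updated README expects `cluster0.graphml` and `cluster1.graphml` to have 110 nodes and 148 edges each. One quick way to check a local run is sketched below. `graphml_counts` is a hypothetical helper and is not part of phix-test. It counts tags with `grep` rather than parsing the XML, so it assumes no `<node` or `<edge` tags appear inside comments or CDATA.

```shell
# Hypothetical helper (not part of phix-test): count GraphML <node> and
# <edge> elements with grep. This is a heuristic, not an XML parser; it
# assumes no element tags are hidden inside comments or CDATA.
graphml_counts() {
  file=$1
  # grep -o prints one line per match. Arithmetic expansion strips the
  # leading spaces that some wc implementations emit.
  nodes=$(( $(grep -o '<node[ >/]' "$file" | wc -l) ))
  edges=$(( $(grep -o '<edge[ >/]' "$file" | wc -l) ))
  echo "$nodes nodes, $edges edges"
}
```

For example, `for f in cluster0.graphml cluster1.graphml; do echo "$f: $(graphml_counts "$f")"; done` prints one line per cluster, which can then be compared against the README's expected 110 nodes and 148 edges.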
phix-test/README.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
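The README says the `02[a-c]-sample.sh` scripts contain SLURM directives that may need adapting. One option that avoids editing the scripts is to pass options to `sbatch` on the command line, because those take precedence over `#SBATCH` lines inside the script. In the sketch below, `submit_chains` is a hypothetical helper. The resource values are assumptions sized from the README's estimates and are not the settings in the repo's scripts: an 8-core node, 12G of memory to leave headroom over the ~10GB peak, and an 8-hour limit over the ~7-hour runtime.

```shell
# Hypothetical helper (not part of phix-test): submit the three independent
# sampling chains. Command-line options override the #SBATCH directives in
# each script. The values are illustrative assumptions based on the README's
# estimates (~7 hours, ~10GB peak memory, 8-core node).
submit_chains() {
  for script in 02a-sample.sh 02b-sample.sh 02c-sample.sh; do
    sbatch --nodes=1 --cpus-per-task=8 --mem=12G --time=08:00:00 "$script" || return 1
  done
}
```

Run `submit_chains` from the `phix-test` directory and monitor the jobs with `squeue -u "$USER"`. The prior-study scripts can be submitted the same way, e.g. `sbatch 04-sample-nodata.sh`.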
diff --git a/phix-test/README.md b/phix-test/README.md
index 1e8436f..3fa113b 100644
--- a/phix-test/README.md
+++ b/phix-test/README.md
@@ -20,22 +20,22 @@ Illumina (two 3.5GB files) and run it through the `filter_illumina` tool from
[BioLite](https://bitbucket.org/caseywdunn/biolite). Alternatively, you can
skip this step and use the `subset.*.fq.gz` files that we have alread generated
and included in this directory. It will also download the reference assembly
-[NC_001422](http://www.ncbi.nlm.nih.gov/nuccore/9626372) from the Sequence Read
-Archive (also included in the repo as `phix-reference-NC_001422.fa`).
+[NC_001422](http://www.ncbi.nlm.nih.gov/nuccore/9626372) from the NCBI Nucleotide
+Database (also included in the repo as `phix-reference-NC_001422.fa`).

The `01-assemble.sh` script will constuct the assembly graph and run Velvet and
SGA on the subset. The assembly graph output should have two clusters
(`cluster0.graphml` and `cluster1.graphml`) which are reverse complements of
-each other, and should each have 36 nodes and 72 edges. The script will also
+each other, and should each have 110 nodes and 148 edges. The script will also
generate figures (`compare?.pdf`) showing which edges in the assembly graph are
-also present in the Velvet and SGA assemblies and the SRA reference assembly.
+also present in the Velvet and SGA assemblies and the NCBI reference sequence.

The `02[a-c]-sample.sh` scripts are the most computationally intensive and
run sample from three independent chains. They includes directives for
the SLURM cluster management system. You may need to adapt these to your own
cluster environment. By default, we use `cluster0.graphml` from above and run
-100,000 sample iterations, with an estimated runtime of 42 hours and peak memory
-usage of 10GB for each chain, each running on a 20-core Intel Xeon E5-2670 v2
+20,000 sample iterations, with an estimated runtime of ~7 hours and peak memory
+usage of 10GB for each chain, each running on an 8-core Intel Xeon E5540 (2.53 GHz)
node. If you want to run a quick test, you could reduce the iterations to
1,000, although this will probably fail to converge. On our cluster, we would
launch the three scipts with:
@@ -46,11 +46,11 @@ launch the three scipts with:

Once the sampling has finished, use the `03-report.sh` script to generate a
report in the directory `report-X` where X is the current time. You can
-compare this report against the sample one provided in the `report-1391230948`
+compare this report against the sample one provided in the `report-1399918695`
directory. Figures 3 and 5a-b in the paper are from this report.

The remaining scripts 04 through 08 run a study of different priors, and are
-also computationally intensive, with an estimated runtime of 42 hours and peak
+also computationally intensive, with an estimated runtime of ~7 hours and peak
memory usage of 10GB for each run. We would submit these on our cluster with:

sbatch 04-sample-nodata.sh
--
2.1.1
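The sample report directories (`report-1391230948`, `report-1399918695`) suggest that the X in `report-X` is Unix time in seconds. Under that assumption, the hypothetical `report_time` helper below converts a report directory name back to a UTC date. It uses GNU `date -d @SECONDS`. On BSD or macOS, `date -u -r SECONDS` is the equivalent.

```shell
# Hypothetical helper (not part of phix-test): convert a report-<X> directory
# name to a UTC timestamp, assuming X is seconds since the Unix epoch.
# Requires GNU date (-d @SECONDS).
report_time() {
  name=$(basename "$1")
  epoch=${name#report-}
  # Reject names whose suffix is empty or not all digits.
  case $epoch in
    ''|*[!0-9]*) echo "not a report-<epoch> name: $1" >&2; return 1 ;;
  esac
  date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For example, `report_time report-1399918695` prints `2014-05-12 18:18:15 UTC`, which falls on the same day as this commit.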