Fig. S1. Maximum likelihood nucleotide tree of crenarchaeotal amoA sequences obtained from the three Namibian seawater samples. Bootstrap values above 50% (from 1000 bootstrap replicates) are shown at the branch points. Sequences are colour coded by sample origin (terrestrial, marine sediment or seawater). The marine clusters A (here denoted A1), B and C, as defined by Francis and colleagues (2005), and a new cluster, A2, are also indicated. Most of the Namibian seawater amoA sequences fall into marine clusters A1 and B. The Namibian amoA sequences are shown as groups labelled ‘Namibia seawater’; the number in brackets gives the number of unique amoA sequences in each group.

Fig. S3. Namibian seawater samples: distribution of the amoA gene (the 351 bp fragment used for polynucleotide probe design) among OTUs at 99% and 95% identity cut-offs. At the 99% cut-off, no OTU contains more than 10% of the sequences, and most OTUs contain only a few sequences or a single sequence. Pooling the sequences from the three clone libraries almost doubles the number of OTUs compared with the individual clone libraries, and no OTU in the pooled set contains more than 8% of the sequences. At the 95% cut-off, 70–80% of the sequences fall into three OTUs. Pooling the sequences from the three clone libraries increases the number of OTUs by less than 50% compared with the individual clone libraries, and 68% of the sequences fall into three OTUs.
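The OTU counts above depend on how sequences are grouped at a given identity cut-off. The clustering program actually used is not named here, so the following is only a minimal sketch of greedy, seed-based clustering of pre-aligned fragments of equal length (an assumption); the functions `percent_identity`, `greedy_otus` and `largest_otu_fraction` are hypothetical.

```python
from __future__ import annotations


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def greedy_otus(seqs: dict[str, str], cutoff: float) -> list[list[str]]:
    """Greedy clustering: each sequence joins the first OTU whose seed it matches at >= cutoff %."""
    otus: list[list[str]] = []  # the first member of each OTU is its seed
    for name, seq in seqs.items():
        for otu in otus:
            if percent_identity(seq, seqs[otu[0]]) >= cutoff:
                otu.append(name)
                break
        else:  # no existing seed is close enough: start a new OTU
            otus.append([name])
    return otus


def largest_otu_fraction(otus: list[list[str]]) -> float:
    """Percentage of all sequences that fall into the largest OTU."""
    total = sum(len(otu) for otu in otus)
    return 100.0 * max(len(otu) for otu in otus) / total
```

Greedy seed-based clustering depends on input order, so a different tool or ordering could give somewhat different OTU counts at the same cut-off.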

Fig. S4. Distribution of the amoA gene (the 351 bp fragment used for polynucleotide probe design) across the three Namibian seawater samples, with OTUs defined at 99% (A) and 95% (B) identity cut-offs. At the 99% cut-off, 49–58% of the sequences in each clone library belonged to OTUs shared by all three libraries, 21–28% were unique to one library, and the remaining 21–24% belonged to OTUs shared by two libraries. At the 95% cut-off, 75–85% of the sequences belonged to OTUs shared by all three clone libraries.
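The shared and unique fractions above can be derived from OTU membership once each sequence is labelled with its clone library. A minimal sketch, assuming that "unique" means a sequence whose OTU occurs in only one library; `sharing_profile` and its input dictionaries are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def sharing_profile(otu_members: dict[str, list[str]],
                    sample_of: dict[str, str]) -> dict[str, dict[int, float]]:
    """For each sample, the percentage of its sequences in OTUs found in 1, 2 or 3 samples."""
    # How many distinct samples (clone libraries) contribute to each OTU
    n_samples = {otu: len({sample_of[s] for s in members})
                 for otu, members in otu_members.items()}
    counts: dict[str, Counter] = {}
    for otu, members in otu_members.items():
        for s in members:
            counts.setdefault(sample_of[s], Counter())[n_samples[otu]] += 1
    profile = {}
    for sample, c in counts.items():
        total = sum(c.values())
        profile[sample] = {k: 100.0 * v / total for k, v in sorted(c.items())}
    return profile
```

The key 3 holds the fraction shared by all three libraries, 2 the fraction shared by two, and 1 the fraction unique to that library.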

Fig. S5. Nucleotide multiple alignment of amoA sequences from the Namibian seawater samples, performed with the integrated ARB aligner. The target region for probe design is marked in orange (positions 95–445, Cenarchaeum symbiosum numbering). The sequences are trimmed to the marked region and exported in GenBank format, together with phylogenetic information inferred from the maximum likelihood tree (see Fig. S1).
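Trimming to a region given in reference numbering requires mapping ungapped reference positions onto alignment columns. The sketch below is illustrative only and is not how ARB does it internally; it assumes '-' and '.' are the gap symbols, and `ref_to_columns` and `trim_alignment` are hypothetical names.

```python
from __future__ import annotations

GAP_CHARS = "-."  # assumed gap symbols in the exported alignment


def ref_to_columns(ref_aligned: str, start: int, end: int) -> tuple[int, int]:
    """Map 1-based, inclusive reference positions to 0-based alignment columns."""
    pos = 0
    first = last = None
    for col, base in enumerate(ref_aligned):
        if base in GAP_CHARS:
            continue  # gaps do not advance the reference numbering
        pos += 1
        if pos == start:
            first = col
        if pos == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("reference sequence is shorter than the requested region")
    return first, last


def trim_alignment(alignment: dict[str, str], ref_name: str,
                   start: int = 95, end: int = 445) -> dict[str, str]:
    """Cut every aligned sequence to the columns spanning reference positions start..end."""
    first, last = ref_to_columns(alignment[ref_name], start, end)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}
```

For an ungapped reference, positions 95–445 inclusive span 445 − 95 + 1 = 351 columns, which matches the 351 bp fragment used for probe design.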

Fig. S6. The GenBank files (A) containing the trimmed sequences are converted into FASTA files (D) with the GTE module (B) of the PolyPro software. The phylogenetic information in the SOURCE field of the GenBank files is used to build the Taxonomy database (C).

Fig. S7. The PPD module of the PolyPro software takes the FASTA files generated by GTE as input. All amoA sequences are supplied both as probes and as targets. The hybridization type is set to DNA:DNA, and a mismatch table is calculated between the probes (horizontal header) and the targets (vertical header). Cells whose percentage mismatch (%MM) is lower than mismatch threshold 1 (Th1) are marked in green.
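A mismatch table of this kind can be sketched as percent mismatch over aligned positions for every probe–target pair. This is an illustration, not PPD's implementation: it assumes pre-aligned sequences, skips gapped columns, and treats the Th1 comparison as inclusive to match the ≤ 5% setting of Fig. S8. The function names are hypothetical.

```python
from __future__ import annotations

GAP_CHARS = "-."  # assumed gap symbols; gapped columns are skipped


def percent_mismatch(probe: str, target: str) -> float:
    """%MM over aligned columns where neither sequence has a gap."""
    pairs = [(p, t) for p, t in zip(probe, target)
             if p not in GAP_CHARS and t not in GAP_CHARS]
    if not pairs:
        raise ValueError("no overlapping, ungapped positions")
    return 100.0 * sum(p != t for p, t in pairs) / len(pairs)


def mismatch_table(probes: dict[str, str],
                   targets: dict[str, str]) -> dict[str, dict[str, float]]:
    """Rows are targets, columns are probes, mirroring the table layout in the figure."""
    return {t: {p: percent_mismatch(ps, ts) for p, ps in probes.items()}
            for t, ts in targets.items()}


def green_cells(table: dict[str, dict[str, float]], th1: float) -> set[tuple[str, str]]:
    """(target, probe) cells whose %MM is at or below Th1."""
    return {(t, p) for t, row in table.items() for p, mm in row.items() if mm <= th1}
```

Because every sequence is supplied as both probe and target, the diagonal of the table is always 0% mismatch.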

Fig. S8. The threshold for a probe to hit a target (Th1) is set to ≤ 5% mismatch. No phylogenetic clade is selected, so probes are designed for all crenarchaeal amoA targets retrieved from the Namibian seawater samples. Probes with identical target groups and a Tm difference below 0.05°C are considered replicates, and only one of them is kept; the resulting decrease in the number of probes is reported in the LOG. The mismatch table is then transformed into a hit matrix, which is used in the next step to calculate the probe mixes.
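The replicate filter and hit matrix described above can be sketched as follows, assuming each probe's Tm is already known (how PolyPro computes Tm is not reproduced here). Keeping the lowest-Tm member of each replicate group is an arbitrary choice for illustration, and all function names are hypothetical.

```python
from __future__ import annotations


def hit_sets_from_table(table: dict[str, dict[str, float]],
                        th1: float = 5.0) -> dict[str, frozenset[str]]:
    """Invert a target -> {probe: %MM} table into probe -> targets with %MM <= th1."""
    hits: dict[str, set[str]] = {}
    for target, row in table.items():
        for probe, mm in row.items():
            hits.setdefault(probe, set())
            if mm <= th1:
                hits[probe].add(target)
    return {p: frozenset(ts) for p, ts in hits.items()}


def remove_replicates(hit_sets: dict[str, frozenset[str]],
                      tm: dict[str, float], tol: float = 0.05) -> list[str]:
    """Keep one probe per group with identical targets and Tm difference below tol (°C)."""
    kept: list[str] = []
    for probe in sorted(hit_sets, key=lambda p: tm[p]):
        replicate = any(hit_sets[k] == hit_sets[probe] and abs(tm[k] - tm[probe]) < tol
                        for k in kept)
        if not replicate:
            kept.append(probe)
    return kept


def hit_matrix(hit_sets: dict[str, frozenset[str]], probes: list[str],
               targets: list[str]) -> list[list[int]]:
    """1 where the probe hits the target, else 0; rows = targets, columns = probes."""
    return [[1 if t in hit_sets[p] else 0 for p in probes] for t in targets]
```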

Fig. S9. Calculation of the probe mixes that hit all targets. The combination algorithm starts with combinations of two probes, followed by optimized combinations. Because the Tm tolerance parameter was set to 0, only the probe mixes with the lowest ΔTm were retained by the combination algorithm. As a result, all 600 probe mixes have the same ΔTm (1.63°C), as reported in the LOG.
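Finding probe mixes that hit every target is a set-cover search, and the Tm tolerance then filters mixes by their ΔTm (here taken as the highest minus the lowest probe Tm in the mix, an assumption). PolyPro's "optimized combinations" are not reproduced; the sketch below swaps in a plain brute-force enumeration of increasing mix size, which is practical only for small probe sets. `covering_mixes` is a hypothetical name.

```python
from __future__ import annotations

from itertools import combinations


def covering_mixes(hit_sets: dict[str, frozenset[str]], tm: dict[str, float],
                   targets: set[str], min_size: int = 2, max_size: int | None = None,
                   tm_tolerance: float = 0.0) -> list[tuple[tuple[str, ...], float]]:
    """Smallest probe mixes (from min_size upward) that hit every target, filtered by ΔTm."""
    probes = sorted(hit_sets)
    if max_size is None:
        max_size = len(probes)
    for k in range(min_size, max_size + 1):
        mixes = [m for m in combinations(probes, k)
                 if set().union(*(hit_sets[p] for p in m)) >= targets]
        if mixes:
            # ΔTm of a mix = spread between its highest and lowest probe Tm
            delta = {m: max(tm[p] for p in m) - min(tm[p] for p in m) for m in mixes}
            best = min(delta.values())
            # With tm_tolerance = 0 only mixes with the lowest ΔTm survive
            return [(m, delta[m]) for m in mixes if delta[m] <= best + tm_tolerance]
    return []
```

With `tm_tolerance=0.0` every returned mix has the same ΔTm, analogous to the 600 mixes sharing ΔTm = 1.63°C in the LOG.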

Fig. S10. Step 5 was used to select probe mixes whose dsDNA probes are less likely to cross-hybridize. Of all probe mixes obtained in step 3, only those with the lowest similarity between their probes were selected.
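One way to rank mixes by probe similarity is to score each mix by its most similar probe pair and keep the mixes with the lowest score. The scoring rule (maximum pairwise percent identity over equal-length aligned probes) is an assumption, not necessarily what PolyPro uses, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two aligned probes of equal length."""
    if len(a) != len(b):
        raise ValueError("probes must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def max_pairwise_identity(mix: tuple[str, ...], seqs: dict[str, str]) -> float:
    """Highest identity between any two probes in the mix (the mix needs >= 2 probes)."""
    return max(percent_identity(seqs[a], seqs[b]) for a, b in combinations(mix, 2))


def least_similar_mixes(mixes: list[tuple[str, ...]],
                        seqs: dict[str, str]) -> tuple[list[tuple[str, ...]], float]:
    """Mixes whose most similar probe pair is least similar, plus that score."""
    scores = {mix: max_pairwise_identity(mix, seqs) for mix in mixes}
    best = min(scores.values())
    return [m for m in mixes if scores[m] == best], best
```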

Fig. S11. Hit map for the 12 polynucleotide probes in the amoA-Nam probe mix. For each probe, the percentage mismatch with each sequence from the clone libraries is plotted as a dot. The dot is green when the sequence belongs to the same phylogenetic clade as the probe and red when it belongs to a different clade. The threshold for a probe to target a sequence was set to ≤ 5% mismatch (blue line in the graph).
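The colouring rule of the hit map can be expressed as a per-point classification. A minimal sketch, assuming each probe is named after the sequence it was derived from so that it has a clade label, and that the table is oriented probe -> {target: %MM}; `HitPoint`, `hit_map_points` and `off_clade_hits` are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class HitPoint(NamedTuple):
    probe: str
    target: str
    mismatch: float  # %MM between probe and target
    colour: str      # "green" = same clade as the probe, "red" = different clade
    is_hit: bool     # %MM at or below the threshold (blue line)


def hit_map_points(mm_table: dict[str, dict[str, float]], clade_of: dict[str, str],
                   threshold: float = 5.0) -> list[HitPoint]:
    """One point per probe-target pair, coloured by clade membership."""
    points = []
    for probe, row in mm_table.items():
        for target, mm in row.items():
            colour = "green" if clade_of[target] == clade_of[probe] else "red"
            points.append(HitPoint(probe, target, mm, colour, mm <= threshold))
    return points


def off_clade_hits(points: list[HitPoint]) -> list[HitPoint]:
    """Red dots below the threshold: targets hit outside the probe's own clade."""
    return [pt for pt in points if pt.is_hit and pt.colour == "red"]
```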

Fig. S12. Effect of Syto9 dye concentration on the height and shape of the Tm peak. The melting curves were measured for the hynL dsDNA probe. Increasing the Syto9 concentration increases the peak height without significantly changing the Tm.
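A Tm peak is commonly read as the temperature at which the negative first derivative of fluorescence, −dF/dT, is maximal; the smoothing and differentiation used by the instrument software are not described here. The sketch below uses plain central differences and an idealized sigmoidal curve (synthetic data, not measurements) to show why scaling the signal, as a higher dye concentration might, changes peak height but not peak position. Both function names are hypothetical.

```python
from __future__ import annotations

import math


def melting_peak(temps: list[float], fluorescence: list[float]) -> tuple[float, float]:
    """Return (Tm, peak height) at the maximum of -dF/dT, using central differences."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired temperature/fluorescence readings")
    best = None
    for i in range(1, len(temps) - 1):
        d = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if best is None or d > best[1]:
            best = (temps[i], d)
    return best


def synthetic_curve(temps: list[float], tm: float, width: float,
                    amplitude: float) -> list[float]:
    """Idealized melting curve: fluorescence drops sigmoidally as the duplex melts."""
    return [amplitude / (1.0 + math.exp((t - tm) / width)) for t in temps]
```

Doubling `amplitude` doubles every difference in the derivative, so the peak height doubles while the temperature of the maximum stays the same.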

Fig. S13. Variation of the melting temperature with Syto9 dye concentration. The Tm of the hynL dsDNA probe, target and hybrid was measured in hybridization-like buffer (35% formamide). The Tm varied little with dye concentration: increasing the concentration to 50 μM decreased the Tm by ∼0.5°C.

Fig. S15. GeneFISH for simultaneous detection of the amoA gene (crenarchaeal ammonia monooxygenase subunit A) with the amoA-1E3 probe (red, second column) and 16S rRNA CARD-FISH with EUB338, a general bacterial probe (green, first column), on Escherichia coli clones. The last column shows the overlay of the 16S rRNA and gene signals.

Fig. S16. GeneFISH with the negative control probe for gene detection (NonPolyPr350) (red, third column) and 16S rRNA CARD-FISH with the Cren554 probe for marine Crenarchaeota (green, second column) on Namibian seawater samples from station 249. All cells were stained with DAPI (blue, first column). The last column shows the overlay of the 16S rRNA and gene signals.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.