When the input is paired-end, the reads are written in three fractions: R1, R2 and se.
The se fraction contains the paired-end reads that lost their mate during filtering.
The se reads are seamlessly integrated in the next steps.

Because binning can predict the same genome several times, it is recommended to de-replicate the genomes.
For now we use dRep to filter and de-replicate the genomes.
The metagenome-assembled genomes are then renamed, but we keep mapping files.

genomes/Dereplication

genomes/clustering/contig2genome.tsv

genomes/clustering/allbins2genome.tsv
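
The mapping files make it possible to trace a renamed genome back to its contigs. The following is a minimal sketch, assuming that contig2genome.tsv holds two tab-separated columns (contig name, then genome name) and no header row; that layout is an assumption and should be checked against the actual file. The function name and the example names are hypothetical.

```python
import csv
from collections import defaultdict


def contigs_per_genome(lines):
    """Group contig names by genome from tab-separated (contig, genome) rows.

    Assumes two columns and no header; adjust if the real file differs.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        contig, genome = row[0], row[1]
        groups[genome].append(contig)
    return dict(groups)


# Hypothetical example rows; real contig and genome names will differ.
example = ["contig_1\tMAG001", "contig_2\tMAG001", "contig_7\tMAG002"]
mapping = contigs_per_genome(example)
```

With a real run, the same function could be given an open file handle, for example `open("genomes/clustering/contig2genome.tsv", newline="")`. If the file turns out to have a header row, that row would need to be skipped first.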

The FASTA sequences of the dereplicated and renamed genomes can be found in genomes/genomes,
and their quality estimates are in genomes/checkm/completeness.tsv.
The quantification of the genomes can be found in:

Different annotations can be turned on and off in the config file under the heading annotations:
A taxonomy for the dereplicated genomes is proposed based on GTDB.
The results can be found in genomes/taxonomy.
The genomes are placed in a phylogenetic tree separately for bacteria and archaea (if there are any) using the GTDB markers.
In addition, a tree for bacteria and archaea can be generated based on the CheckM markers.
All trees are rooted at the midpoint. The files can be found in genomes/tree.
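
To illustrate what midpoint rooting means, here is a small sketch in plain Python. It is not the code the pipeline uses; the tree representation (a dictionary of neighbours with branch lengths) and the function names are hypothetical. The idea is to find the two tips that are farthest apart and to place the root halfway along the path between them.

```python
def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbor, length in tree[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, dist + length, path + [neighbor]))
    return best


def midpoint(tree):
    """Return (u, v, offset): the root lies on edge u-v, offset away from u."""
    any_node = next(iter(tree))
    end_a, _, _ = farthest(tree, any_node)         # one end of the longest path
    _, diameter, path = farthest(tree, end_a)      # the other end, plus the path
    half = diameter / 2
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("tree has no edges")


# Hypothetical unrooted tree: tips A, B, C, D and internal nodes X, Y.
tree = {
    "A": {"X": 1.0},
    "B": {"X": 3.0},
    "C": {"Y": 4.0},
    "D": {"Y": 1.0},
    "X": {"A": 1.0, "B": 3.0, "Y": 2.0},
    "Y": {"C": 4.0, "D": 1.0, "X": 2.0},
}
root_edge = midpoint(tree)
```

In this example the longest tip-to-tip path runs from C to B with a total length of 9, so the root is placed on the branch between Y and X, at a distance of 0.5 from Y.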

The gene catalog takes either genes predicted from the genomes or all genes predicted on the contigs and clusters them
according to the configuration.
This rule produces the following output file for the whole dataset.