Abstract

Executing bioinformatics workflows (WFs) has become almost a necessity for many biologists as part of their research. However, most WF-management software packages require at least some programming expertise to operate. Here we describe NeatSeq-Flow, a platform that enables users with no programming knowledge to design and execute complex high-throughput sequencing WFs. This is achieved through a compendium of pre-built modules and a generic module, neither of which requires programming expertise. Nonetheless, NeatSeq-Flow retains the flexibility to generate sophisticated WF modules using templates and only basic Python programming abilities. NeatSeq-Flow is designed to enable easy sharing of WFs and modules by conceptually separating modules, WF design, sample information and execution. Moreover, NeatSeq-Flow works hand in hand with CONDA environments, so that all of a WF's analysis programs can be installed in one go. NeatSeq-Flow enables efficient WF execution on computer clusters by parallelizing over both samples and WF steps. NeatSeq-Flow operates by generating shell scripts, which makes the WF process fully transparent. NeatSeq-Flow offers real-time WF execution monitoring, detailed documentation and self-sustaining WF backups for reproducibility. Together, these features make NeatSeq-Flow an easy-to-use WF platform that does not compromise on flexibility, reproducibility, transparency or efficiency.

Copyright

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.