Abstract

Identification of somatic mutations from next-generation DNA sequencing data has become one of the fundamental research strategies in oncology, with the goal of uncovering the mechanisms underlying carcinogenesis and resistance to commonly used therapies. Despite significant advances in sequencing methods and data processing algorithms, the reproducibility of experiments remains relatively low and depends significantly on the methods used to identify changes in the structure of the DNA. This is mainly due to three factors: (1) high tumor heterogeneity, because of which some mutations are present in only a small number of cells, (2) bias associated with the process of exome isolation, and (3) the specificity of data pre-processing strategies.

The aim of this work was to determine the impact of these factors on the identification of somatic mutations, and thereby to identify the reasons for low reproducibility in such studies.

Notes

Acknowledgements

This work was partially supported by the National Centre for Research and Development grant No. Strategmed2/267398/4/NCBR/2015 (KPM), the National Science Centre grant No. 2016/23/D/ST7/03665 (RJ), and the internal grant of the Institute of Automatic Control No. BK-204/RAu1/2017 (AS).

Calculations were carried out by means of the infrastructure of the Ziemowit computer cluster (www.ziemowit.hpc.polsl.pl) in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 and expanded in the POIG.02.03.01-00-040/13 projects.