Recommendations for open data science

1. Provide or cite source code for tools in public repositories

Software packages are critical in data-driven studies. Other researchers can fully use the results only if the computational methods are completely transparent. Provide source code for all original software in public repositories such as GitHub [3], under a license that permits the freest possible use. Cite the precise versions and exact command lines used for all previously published tools. Avoid proprietary software that other researchers cannot access. Follow best practices in software engineering [4] so that others can easily use and maintain the tools. A common hurdle to reusing software is the cumbersome, heterogeneous set of steps required to install a given program. To alleviate this issue, deposit tools in standard package managers, such as…
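
As one illustration of the recommendation to cite precise versions and command lines, a script could write a small provenance record alongside its results. The sketch below is a minimal example using only the Python standard library; the function name `collect_provenance` and the record layout are assumptions made for illustration, not part of any cited tool or standard.

```python
import json
import platform
import sys
from importlib import metadata


def collect_provenance(packages, argv=None):
    """Return a dict with the Python version, command line, and package versions.

    `packages` is a list of installed distribution names to look up.
    `argv` defaults to the command line of the running script.
    """
    argv = list(sys.argv if argv is None else argv)
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "command": argv,
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only a placeholder; list the packages your analysis depends on.
    print(json.dumps(collect_provenance(["pip"]), indent=2))
```

Saving this record next to each output file would let a reader see which versions and arguments produced a given result. It covers only Python packages, so versions of external command-line tools would still need to be recorded separately.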