Although many tools have been designed for structural variation analysis, there are no guidelines to help bench scientists choose the best tool for their particular data set. In this test case, a researcher wants to identify large structural variants in multiple accessions of Arabidopsis. What is needed is a tool that can identify the type and quality of the sequencing reads, assemble them against a reference genome, identify SNPs, short indels and large indels, and determine whether any annotated genes are associated with the large indels. Lastly, the tool should have a user-friendly interface in which bench scientists can easily examine the structural variants in their accession(s) of interest that need to be validated. The long-term goal is to scale up this process to analyse multiple genomes at a time (for example, multiple time points, treatments, or related individuals).
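
The last analysis step described above, checking whether annotated genes are associated with large indels, can be sketched as a simple interval-overlap test. This is only an illustration under stated assumptions: the `Interval` type, the `genes_hit_by_indels` function, the gene and indel names, and all coordinates are hypothetical and are not taken from any tool or data set mentioned here. "Associated" is taken to mean "overlaps by at least one base on the same chromosome".

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A named genomic interval (hypothetical type, for illustration only)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_hit_by_indels(indels: List[Interval],
                        genes: List[Interval]) -> Dict[str, List[str]]:
    """Map each indel name to the names of the genes it overlaps.

    Assumption: two intervals are associated when they lie on the same
    chromosome and share at least one base.
    """
    hits: Dict[str, List[str]] = {}
    for indel in indels:
        hits[indel.name] = [
            gene.name
            for gene in genes
            if gene.chrom == indel.chrom
            and gene.start <= indel.end
            and indel.start <= gene.end
        ]
    return hits


# Made-up annotation and variant calls, not real Arabidopsis data.
genes = [
    Interval("geneA", "Chr1", 1000, 3000),
    Interval("geneB", "Chr1", 8000, 9500),
    Interval("geneC", "Chr2", 1000, 3000),
]
indels = [
    Interval("del1", "Chr1", 2500, 4200),  # overlaps the end of geneA
    Interval("del2", "Chr1", 5000, 6000),  # intergenic
    Interval("ins1", "Chr2", 2900, 2900),  # single-position insertion in geneC
]

hits = genes_hit_by_indels(indels, genes)
```

The nested scan is quadratic in the number of features, which is adequate for a handful of candidate variants; a genome-wide run would more likely rely on sorted intervals or an existing interval-intersection tool.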

Background from the first hackathon

In the first hackathon, a benchmarking study was performed on synthetic and biological data to test several existing structural variant analysis tools. However, these tools are largely limited to diploid genome analysis and cannot identify large structural variants (>1 kb) or more complex variants such as inversions, translocations and copy number variants.

Objective of the second hackathon

The objective is to examine larger structural variants and to incorporate additional types of structural variants, including inversions, translocations and copy number variants, into the analysis pipeline we developed previously. Tools currently available to identify these types of variants include inGAP, SVdetect, CNVnator, CNVseq and DWACseq. We would use simulated data and perform biological testing on a small subset of the tetraploid wheat genome. Following successful testing on the synthetic data and validation in a biological setting, we would wrap the pipeline in a graphical user interface that is friendly to bench scientists.