Structural information facilitates understanding of protein function and activity. Because experimental methods for protein structure elucidation are limited in their applicability to certain types and families of proteins, computational methods for protein structure prediction are necessary. Template-based methods utilize structural information from template proteins that have available structures and high sequence similarity to the protein of interest. In the absence of such template proteins, de novo methods can be used to generate structural models. State-of-the-art de novo methods are limited to smaller proteins because of the size of the conformational search space that must be sampled.

In this study, we introduce BCL::Fold, a novel de novo protein structure prediction method, and its accompanying energy potentials. BCL::Fold treats the protein chain as discontinuous and works by assembling secondary structure elements (SSEs), namely α-helices and β-strands. This approach leverages the fact that SSEs define the topology of a protein more readily than flexible loop regions do. Determining the topology can therefore be decoupled from building the flexible loop regions, which divides the structure prediction problem into two more manageable portions. BCL::Fold employs a Monte Carlo Metropolis minimization in which SSE-based moves allow rapid sampling of the conformational search space, while knowledge-based potentials are used to evaluate how native-like the generated structural models are. BCL::Fold was benchmarked on a set of proteins with diverse sequence lengths, secondary structure contents, and topologies. It obtained native-like structural models at levels comparable to Rosetta, one of the top-performing de novo methods. The energy potentials were also evaluated and shown to successfully discriminate native-like structural models.
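The Monte Carlo Metropolis scheme described above can be illustrated with a minimal sketch. It is not the BCL::Fold implementation: the function names, the fixed temperature, the toy scoring function, and the one-dimensional "SSE positions" are all hypothetical stand-ins chosen only to show the accept/reject logic. A real application would replace `toy_score` with knowledge-based potentials and `toy_move` with SSE-based moves.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept a move that does not raise the
    energy; accept an uphill move with probability exp(-delta_e / T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def minimize(initial, score, propose, steps=5000, temperature=1.0, seed=0):
    """Generic Monte Carlo Metropolis minimization that tracks the
    lowest-energy model seen so far."""
    rng = random.Random(seed)
    current, current_e = initial, score(initial)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_e = score(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Hypothetical toy problem: each "SSE" is reduced to a single coordinate,
# and the energy is the squared distance to an arbitrary target placement.
TARGET = [1.0, -2.0, 3.0]


def toy_score(model):
    return sum((x - t) ** 2 for x, t in zip(model, TARGET))


def toy_move(model, rng):
    """Perturb one randomly chosen element, standing in for an SSE move."""
    new_model = list(model)
    i = rng.randrange(len(new_model))
    new_model[i] += rng.uniform(-0.5, 0.5)
    return new_model


best, best_e = minimize([0.0, 0.0, 0.0], toy_score, toy_move)
```

Accepting some uphill moves is what lets the search escape local minima; lowering the temperature makes the search greedier.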

Accurate prediction of residue pairs that are far apart in the sequence but in close proximity in the structure provides insight into the topology of a protein and therefore limits the conformational search space that must be sampled in protein structure prediction. BCL::Contact is a novel method that uses artificial neural networks to provide rapid prediction of residue pair contacts. BCL::Contact predictions improved the accuracy of protein structure prediction by Rosetta.
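To make the notion of a residue pair contact concrete, the sketch below extracts contacts from known coordinates. It does not reproduce the neural network prediction in BCL::Contact; it only illustrates the kind of pair such a method aims to predict. The 8 Å distance cutoff, the minimum sequence separation of 6 residues, and the toy coordinates are assumptions made for illustration and are not taken from this work.

```python
import math


def contacts(coords, cutoff=8.0, min_separation=6):
    """Return index pairs (i, j) that are at least `min_separation`
    residues apart in sequence and within `cutoff` angstroms in space.
    Both thresholds are illustrative assumptions."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical hairpin-like trace: four residues out, four residues back,
# one point per residue with 3.8 angstrom spacing along each arm.
toy_trace = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

toy_contacts = contacts(toy_trace)
```

In this toy trace, the residues at the two ends of the chain are distant in sequence yet spatially adjacent, which is the kind of information that constrains the topology.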