Motivation In molecular biology, protein structure prediction, which takes an amino acid sequence as input and predicts its 3D structure, plays a key role in downstream research areas such as cancer biology, regulatory networks, rational drug design and de novo protein design. Among protein structure prediction methods, template-based modeling is a well-developed technique.
Challenge The challenge of template-based modeling lies in recognizing correct templates and generating accurate sequence-template alignments. Homologous information has proved very powerful in detecting remote homologs, but it fares poorly when the proteins under consideration have low homology. Moreover, contact-specific information and the construction of the alignment reference state can further improve the accuracy of the final threading result.
Method Here we present a novel protein threading method, CNFpred, which achieves much more accurate sequence-template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF). The CNF aligns a protein sequence to its remote template using a non-linear scoring function. Moreover, we apply a novel context-specific alignment potential that integrates both local and global context-specific information to measure the log-odds ratio of an alignment being generated from two related proteins versus two unrelated proteins.
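The log-odds idea behind the alignment potential can be illustrated with a minimal sketch. This is not the CNFpred implementation: the function name `alignment_log_odds` and its inputs are hypothetical, and the sketch assumes, purely for illustration, that aligned positions contribute independently, so the alignment score is a sum of per-position log ratios.

```python
import math

def alignment_log_odds(p_related, p_unrelated):
    """Illustrative log-odds score for one alignment (hypothetical helper).

    p_related[i]   : probability of aligned position i under a model of
                     two related proteins (assumed to be given)
    p_unrelated[i] : probability of the same position under a reference
                     model of two unrelated proteins (assumed to be given)
    Positions are treated as independent, which is a simplification.
    """
    if len(p_related) != len(p_unrelated):
        raise ValueError("both inputs must cover the same aligned positions")
    # Sum of per-position log ratios equals the log of the product of ratios.
    return sum(math.log(r / u) for r, u in zip(p_related, p_unrelated))

# A positive score favors the "related" hypothesis, a negative score the
# "unrelated" (reference state) hypothesis.
score = alignment_log_odds([0.4, 0.2], [0.1, 0.1])
```

Under this reading, the choice of reference state (the denominator) directly shifts every alignment score, which is why its construction matters for threading accuracy.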
Result Experimental results confirm that our method generates significantly better alignments and threading results than the best profile-based methods on several very large benchmarks. In the community-wide protein structure prediction contest, CASP, our method ranked among the top methods worldwide. Our method works particularly well for distantly related proteins and for proteins with sparse sequence profiles. Finally, we provide a fast, publicly available web server that gives biologists easy access to protein structure prediction results.