NSF CCF Project (2010-2014)

NSF CCF:AF:Small - A Unified Computational Framework to Enhance the Ab-Initio Sampling of Native-Like Protein Conformations

The research involves the design and analysis of a framework to compute the spatial arrangements, also known as conformations, in which a protein chain of amino acids is biologically active (that is, in its native state). This is an important step towards understanding protein function. While proteins are central to many biochemical processes, little is known about millions of protein sequences obtained from organismal genomes.

Intellectual Merit: The intellectual merit of this work lies in the development of a novel computational framework that combines probabilistic exploration with the theory of statistical mechanics to efficiently enhance the sampling of the conformational space near the native state. Low-dimensional projections guide the exploration towards low-energy and geometrically diverse conformations. Additional intellectual merit lies in the incorporation of knowledge and observations emerging from biophysical theory and experiment, such as the use of coarse graining, the relation between energy barrier height and temperature, and the hierarchical organization of tertiary structure. Algorithmic components of the framework will be systematically evaluated for efficiency, accuracy, and their ability to enhance the sampling of the conformational space near the native state.
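
As a rough illustration of how probabilistic exploration and statistical mechanics can fit together, the sketch below applies the standard Metropolis acceptance criterion to a toy one-dimensional energy surface. It is not the project's framework: the double-well function `toy_energy`, the helper names, the step size, and the reduced units (`k_b = 1`) are all assumptions made for this example, and a real protein conformation would have many more degrees of freedom.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng, k_b=1.0):
    """Metropolis criterion: always accept downhill moves; accept an
    uphill move of size delta_e with probability exp(-delta_e / (k_b * T))."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (k_b * temperature))


def toy_energy(x):
    # Hypothetical double-well surface (an assumption for illustration):
    # minima at x = -1 and x = +1, separated by a barrier of height 1 at x = 0.
    return (x * x - 1.0) ** 2


def sample(n_steps, temperature, step_size=0.25, seed=0):
    """Random-walk Metropolis sampling on the toy surface, starting in the
    left well. Returns the visited positions, including the start."""
    rng = random.Random(seed)
    x = -1.0
    trajectory = [x]
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step_size, step_size)
        delta_e = toy_energy(candidate) - toy_energy(x)
        if metropolis_accept(delta_e, temperature, rng):
            x = candidate
        trajectory.append(x)
    return trajectory
```

Under this criterion, the probability of accepting an uphill move grows with temperature, so a walker at higher temperature is more likely to cross the barrier between the two wells. That is one simple reading of the relation between energy barrier height and temperature mentioned above.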

Broader Impact: The broader impact of this research will be the creation of a filter that efficiently computes diverse coarse-grained conformations relevant for the protein native state that can then be further refined through detailed biophysical studies. The work lies at the interface between computer science and protein biophysics and can benefit both communities. On the computational side, the work will lead to new algorithms for modeling articulated chains characterized by continuous high-dimensional search spaces and complex energy surfaces. On the biophysical side, the framework will elucidate which aspects of our understanding of proteins allow efficient and accurate modeling. The work will impact both undergraduate and graduate students. New courses are proposed by the investigator as part of efforts to introduce computational biology into the computer science curriculum at George Mason University. The work will be employed as a pedagogic device in courses and educational outreach venues to spawn and maintain interest in computer science, with a particular focus on women and minorities.

This project included three graduate students and several undergraduate and high-school students. Contributions included:

  • Executable for Linux.
  • Active education of involved communities through workshops, tutorials, and software demos at widely-attended conferences and society meetings.
  • 11 peer-reviewed publications, 2 M.S. theses, and 1 Ph.D. thesis.