M.S. and Ph.D. Theses

Ph.D. Theses

  • PhD2: Brian Olson.

    Evolving Local Minima in the Protein Energy Surface.

    Ph.D. Thesis, George Mason University, July 2013.

    Committee: Amarda Shehu (chair), Kenneth A. De Jong, Estela Blaisten-Barojas, Jana Kosecka, Jyh-Ming Lien.

    @phdthesis{olson2013:PhDThesis, author = {Olson, B.}, school = {George Mason University}, title = {Evolving Local Minima in the Protein Energy Surface}, year = 2013 }
    Proteins are the molecular tools of the living cell and the path to unraveling their function is through modeling and understanding their structure. Many diseases occur when a protein loses its intended function due to its inability to form the appropriate structure with which it binds to other molecules in the cell. A holistic approach to protein modeling would be to characterize all possible structural states accessible by a protein under native, physiologic conditions. However, this task is infeasible. The question then becomes, how can we model the subset of these structural states most relevant to the function or dysfunction of a protein? This thesis proposes a novel computational framework to obtain an expansive view of the protein conformational space relevant for function while controlling computational cost. The framework complements experimental and high-resolution computational methods which limit their focus to a single region of the conformational space. The framework exploits the knowledge that functionally relevant conformations are low in energy, and it incorporates the latest understanding of protein structure and energy from biophysics. Specifically, this thesis proposes a novel stochastic search framework for exploring a diverse ensemble of conformations which capture low-energy basins in the protein energy surface. The proposed search framework employs a hybrid or memetic approach for explicit sampling of local minima in the protein energy surface. This hybrid search framework combines a global evolutionary search approach with a local search component to take advantage of the latest advances from the computational biology community. Specifically, the following questions are addressed to effectively model the protein conformational space: (1) How to balance limited computational resources between exploration of the conformational space in global search with exploitation of local minima in local search? 
The hybrid search framework combines a global evolutionary search to explore the breadth of the conformational space with a local search for efficiently exploiting local minima in the underlying energy surface. (2) How to sample new conformations at the global level? Two complementary approaches are investigated. One approach proposes an enhanced fragment selection method for sampling a new conformation based on an existing structure. The other approach employs a genetic algorithm to combine features from multiple existing structures to sample a new conformation. (3) How to employ energy to better discriminate between interesting conformations and noise in the conformational search space? A multi-objective decomposition of the energy function is employed to guide the search towards more biologically relevant, low-energy conformations by focusing on the energy terms with the most discriminatory power. Work in this thesis shows that, by combining advanced algorithmic components with the latest understanding of protein biophysics, the proposed search framework is able to more effectively model functionally-relevant conformational states. A direct comparison between the proposed framework and a state-of-the-art coarse-grained sampling algorithm shows that the enhanced sampling strategies lead to a more comprehensive picture of the underlying protein energy surface. By taking this more comprehensive view, the framework is able to capture the protein native state as well as or better than methods relying primarily on protein-specific sampling strategies.
  • PhD1: Amarda Shehu.

    Molecules in Motion: Computing Structural Flexibility.

    Ph.D. Thesis, Rice University, June 2008.

    Committee: Lydia E. Kavraki (chair), Cecilia Clementi, Luay Nakhleh.

    @phdthesis{shehu2008:PhDThesis, author = {Shehu, A.}, school = {Rice University}, title = {Molecules in Motion: Computing Structural Flexibility}, year = 2008 }
    Growing databases of protein sequences in the post-genomic era call for computational methods to extract structure and function from a protein sequence. In flexible molecules like proteins, function cannot be reliably extracted from a few structures. The amino-acid chain assumes various spatial arrangements (conformations) to modulate biological function. Characterizing the flexibility of a protein under physiological (native) conditions remains an open problem in computational biology. This thesis addresses the problem of characterizing the native flexibility of a protein by computing conformations populated under native conditions. Such computation involves locating free-energy minima in a high-dimensional conformational space. The methods proposed in this thesis search for native conformations using systematically less information from experiment: first employing an experimental structure, then using only a closure constraint in cyclic cysteine-rich peptides, and finally employing only the amino-acid sequence of small- to medium-size proteins. A novel method is proposed to compute structural fluctuations of a protein around an experimental structure. The method combines a robotics-inspired exploration of the conformational space with a statistical mechanics formulation. Thermodynamic quantities measured over generated conformations reproduce experimental data of broad time scales on small (~100 amino acids) proteins with non-concerted motions. Capturing concerted motions motivates the development of the next methods. A second method is proposed that employs a closure constraint to generate native conformations of cyclic cysteine-rich peptides. The method first explores the entire conformational space, then explores within the energy minima found so far until no lower-energy minima emerge. The method captures relevant features of the native state also observed in experiment for peptides 20-30 amino acids long. 
A final method is proposed that implements a similar exploration but for longer proteins and employing only the amino-acid sequence. In its first stage, the method explores the entire conformational space at a coarse-grained level of detail. A second stage focuses the exploration on low-energy regions in more detail. All-atom conformational ensembles are obtained for proteins that populate various functional states through large-scale concerted motions. These ensembles capture well the populated functional states of proteins up to 214 amino acids long.

M.S. Theses

  • MS5: Daniel P. Veltri.

    Physicochemical Feature Selection for Cathelicidin Antimicrobial Peptides.

    M.S. Thesis, George Mason University, April 2013.

    Committee: Amarda Shehu (chair), Iosif Vaisman, Barney Bishop.

    @mastersthesis{veltri2013:MSThesis, author = {Veltri, D. P.}, school = {George Mason University}, title = {Physicochemical Feature Selection for Cathelicidin Antimicrobial Peptides}, year = 2013 }
    Due to recent attention on antimicrobial peptides (AMPs) as targets for antibacterial drug research, many machine learning methods are now turning their attention to AMP recognition. Approaches that rely on whole-peptide properties for recognition are challenged by the great sequence diversity among AMPs for effective feature construction. This thesis proposes a novel and complementary method for feature construction which relies on an extensive list of position-based amino acid physicochemical properties. These features are shown effective in the context of classification by support vector machine (SVM), both in comparison to related work in recognition of AMPs and in a novel study on the cathelicidin family. A detailed analysis and careful construction of a decoy dataset allows for the highlighting of antimicrobial activity-related features in cathelicidins. Special attention is also given to residue positions involved with enzymatic cleavage. The method presented in this thesis is a first step towards understanding what confers to cathelicidins their activity at the physicochemical level and may prove useful for future AMP design efforts.
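As a rough illustration of what "position-based" physicochemical features could look like, the sketch below maps each residue position of a peptide to one property value. The abstract does not specify the property list or the encoding, so everything here is an assumption: the function name `position_features`, the zero padding, and the use of the Kyte-Doolittle hydropathy index as the single example property.

```python
# Kyte-Doolittle hydropathy index, used here only as one example property.
# The thesis relies on an extensive property list that the abstract does not enumerate.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def position_features(peptide, n_positions, scale=HYDROPATHY, pad=0.0):
    """Return one feature per residue position (hypothetical encoding).

    Peptides longer than n_positions are truncated; shorter ones are
    padded with `pad` so every peptide yields a vector of equal length.
    """
    residues = peptide.upper()[:n_positions]
    features = [scale[aa] for aa in residues]
    features += [pad] * (n_positions - len(features))
    return features


# A short made-up peptide fragment, encoded over six positions.
vector = position_features("LLGDF", 6)
```

Fixed-length vectors of this kind, one per peptide and one block per property, are the sort of input a standard SVM implementation accepts.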
  • MS4: Irina Hashmi.

    A Probabilistic Search Algorithm for Protein-Protein Docking.

    M.S. Thesis, George Mason University, November 2012.

    Committee: Amarda Shehu (chair), Kenneth A. De Jong, and Jyh-Ming Lien.

    @mastersthesis{hashmi2012:MSThesis, address = {Fairfax, Virginia}, author = {Hashmi, I.}, school = {George Mason University}, title = {A Probabilistic Search Algorithm for Protein-Protein Docking}, year = {2012} }
    Computational methods able to assist or complement wet-laboratory experiments in structural characterization of molecular assemblies promise to provide detailed insight into molecular interactions, drug-design, and biological function in the living and diseased cell. Methods that predict three-dimensional structures of protein-protein assemblies are abundant in computational structural biology. However, challenges remain in accurately detecting the interacting interface between participating units in an assembly. For search algorithms, the task of predicting the biologically-active structure of an assembly poses particular challenges due to the high dimensionality of the search space where potentially relevant assembly configurations lie. The work presented in this thesis is a step towards developing a new set of computational techniques and algorithms for structural characterization of protein-protein assemblies. Specifically, the work here focuses on modeling the three-dimensional quaternary structure of a protein dimer, a complex formed by interactions between two participating protein chains. This problem is commonly known as protein-protein docking. This work addresses the problem of rigid protein-protein docking, where the given unbound structures of the protein units about to dimerize are assumed to be the same as their bound structures after dimerization. In addition to techniques proposed to alleviate certain computational aspects related to finding the right docking interface in protein dimers, this thesis proposes a new probabilistic search algorithm that employs both geometry and energy to sample low-energy configurations of a protein dimer. Analysis of evolutionary conservation and a geometric treatment of the molecular surface are combined in order to identify potentially-relevant contact interfaces between the two units in the dimer. 
Docking is focused only on evolutionarily conserved, geometrically complementary regions between the units' molecular surfaces, resulting in a narrower search space of rigid-body motions matching only such regions. This treatment is the first contribution of this work. The second contribution is a probabilistic search algorithm that efficiently explores the space of rigid-body motions corresponding to local minima in an energy function capturing interactions in a dimeric configuration. The proposed algorithm is an adaptation of the Basin Hopping (BH) framework. The work presented in this thesis details implementation and careful analysis of the components that result in an effective BH algorithm for rigid protein-protein docking. Application on a diverse list of proteins shows that the algorithm is able to recover the native dimeric configuration as well as produce other relevant minima near the native configuration of a given dimer. A detailed analysis is presented that shows the algorithm reproduces known properties of the BH framework in other contexts and applications, most notably the relationship between the adjacency of consecutively obtained local minima and proximity to the known native dimeric configuration. Taken together, the results presented show that the algorithm can be employed as a first stage in a computational docking protocol to sample low-energy near-native dimeric configurations that can then be further refined and discriminated with more computationally-intensive optimization protocols.
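The generic Basin Hopping loop named in the abstract (perturb the current minimum, minimize locally, accept or reject with a Metropolis criterion) can be sketched on a toy problem. This is not the thesis's docking algorithm: the one-dimensional `toy_energy`, the simple step-halving `local_minimize`, and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged 1-D energy: a quadratic funnel with sinusoidal bumps.
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(f, x, step=0.1, tol=1e-6):
    # Derivative-free descent: move to a lower neighbor, halve the step when stuck.
    fx = f(x)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx, moved = cand, fc, True
                break
        if not moved:
            step *= 0.5
    return x, fx


def basin_hopping(f, x0, n_iters=200, jump=1.5, temperature=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = local_minimize(f, x0)          # start from a local minimum
    best_x, best_f = x, fx
    for _ in range(n_iters):
        trial = x + rng.uniform(-jump, jump)   # perturbation out of the current basin
        tx, tf = local_minimize(f, trial)      # projection onto a nearby local minimum
        # Metropolis criterion applied to the energies of the two minima.
        if tf <= fx or rng.random() < math.exp(-(tf - fx) / temperature):
            x, fx = tx, tf
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


best_x, best_f = basin_hopping(toy_energy, x0=4.0)
```

In a docking setting the scalar `x` would be replaced by a rigid-body transformation and `toy_energy` by an interaction energy; the control flow stays the same.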
  • MS3: Brian Olson.

    Local Minima Hopping along the Protein Energy Surface.

    M.S. Thesis, George Mason University, November 2011.

    Committee: Amarda Shehu (chair), Jana Kosecka, and Jyh-Ming Lien.

    @mastersthesis{olson2011:MSThesis, address = {Fairfax, Virginia}, author = {Olson, B.}, school = {George Mason University}, title = {Local Minima Hopping along the Protein Energy Surface}, year = {2011} }
    Modeling of protein molecules in silico for the purpose of elucidating the three-dimensional structure where the protein is biologically active employs the knowledge that the protein conformational space has an underlying funnel-like energy surface. The biologically-active structure, also referred to as the native structure, resides at the basin or global minimum of the energy surface. A common approach among computational methods that seek the protein native structure is to search for local minima in the energy surface, with the hope that one of the local minima corresponds to the global minimum. Typical stochastic search methods, however, fail to explicitly sample local minima. This thesis proposes a novel algorithm to directly sample local minima at a coarse-grained level of detail. The Protein Local Optima Walk (PLOW) algorithm combines a memetic approach from evolutionary computation with cutting-edge structure prediction protocols in computational biophysics. PLOW explores the space of local minima by explicitly projecting each move at the global level to a nearby local minimum. This allows PLOW to jump over local energy barriers and more effectively sample near-native conformations. An additional contribution of this thesis is that the memetic approach in PLOW is applied to FeLTr, a tree-based search framework which ensures geometric diversity of computed conformations through projections of the conformational space. Analysis across a broad range of proteins shows that PLOW and memetic FeLTr outperform the original FeLTr framework and compare favorably against state-of-the-art ab-initio structure prediction algorithms.
  • MS2: Kevin Molloy.

    Variable-Length Fragment Assembly within a Probabilistic Protein Structure Prediction Framework.

    M.S. Thesis, George Mason University, June 2011.

    Committee: Amarda Shehu (chair), Zoran Duric, and Jyh-Ming Lien.

    @mastersthesis{molloy2011:MSThesis, address = {Fairfax, Virginia}, author = {Molloy, K.}, school = {George Mason University}, title = {Variable-Length Fragment Assembly within a Probabilistic Protein Structure Prediction Framework}, year = {2011} }
    It is widely accepted that a protein’s biological function is highly correlated to the three-dimensional shape the protein assumes under physiological/native conditions. Predicting this three-dimensional structure, known as the native structure, from the protein’s amino-acid sequence is known as the protein structure prediction problem. This problem is regarded by many to be one of the grand challenges of computational biology. Fragment-based assembly is a widely used technique in ab-initio structure prediction methods that seek to predict structure from sequence. Essentially, a protein structure is pieced together with configurations of fragments extracted from databases of deposited protein native structures. Fragment length is an important consideration. The shorter the fragment, the more complex the protein conformational space where the native structure resides and the more rugged the energy surface associated with that space. The longer the fragment, the simpler the conformational space and the smoother the energy surface; hence, the higher the risk of missing important regions of space that may lead to the native structure. In this thesis, we explore the idea of varying the employed fragment lengths to alter the protein conformational space explored during a probabilistic search for the native structure. Varying fragment lengths allows for manipulating the dimensions of the search space during the process of sampling protein conformations. Essentially, longer fragments are used in early stages of the search to simplify the search space and smooth the energy surface. Shorter fragments are then utilized in later stages to provide visibility to the more complex and realistic conformational space. This approach is validated on four protein systems of diverse sizes and native topologies. 
The results show that employing variable-length fragments enhances the sampling of the conformational space for each protein, producing higher-quality native-like structures as compared to using a single fragment length. These promising results lay the foundations for exploring additional research directions in equipping a probabilistic search framework with the ability to make on-the-fly decisions and adaptively change the dimensionality of the conformational space it explores.
  • MS1: Amarda Shehu.

    Sampling Biomolecular Conformations with Spatial and Energetic Constraints.

    M.S. Thesis, Rice University, December 2004.

    Committee: Lydia E. Kavraki (chair), Cecilia Clementi, Ron Goldman, and Luay Nakhleh.

    @mastersthesis{shehu2005:MSThesis, address = {Houston, Texas}, author = {Shehu, A.}, school = {Rice University}, title = {Sampling Biomolecular Conformations with Spatial and Energetic Constraints}, year = {2005} }
    This work extends cyclic coordinate descent to efficiently satisfy multiple spatial constraints, respect the secondary structure of proteins, and work with reduced backbone protein models. Reduced models allow us to treat large systems that are intractable under all-atom models. In addition, this thesis combines the satisfaction of multiple spatial constraints with conformational sampling and energy minimization techniques to generate spatially constrained biomolecular structures that are energetically stable under physiological conditions. The experiments in this thesis demonstrate the relevance and robustness of our method in three application areas: loop closure, backbone reconstruction, and physical trajectory recovery. Addressing the problem of loop closure, we obtain ensembles of spatially constrained conformations whose energy landscape is in agreement with laboratory experimental results on the energetic stability of the proteins at hand. Our experiments on backbone reconstruction agree with results from statistical approaches to this problem, but in addition guarantee the energetic feasibility of the completed models.
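For readers unfamiliar with cyclic coordinate descent (CCD), the basic, unextended procedure can be sketched on a planar chain: each joint in turn is rotated so that the chain end points toward a single target. This is only the textbook starting point, not the thesis's method; the two-dimensional chain, the function names `forward` and `ccd`, and the single positional target are assumptions made for illustration, and the extensions described in the abstract (multiple constraints, secondary structure, reduced backbone models) are not modeled.

```python
import math


def forward(angles, lengths):
    # Joint positions of a planar chain rooted at the origin.
    # Each angle is relative to the direction of the previous link.
    points = [(0.0, 0.0)]
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x, y = points[-1]
        points.append((x + length * math.cos(heading),
                       y + length * math.sin(heading)))
    return points


def ccd(angles, lengths, target, tol=1e-4, max_sweeps=200):
    angles = list(angles)
    for _ in range(max_sweeps):
        # Sweep from the last joint back to the base.
        for i in reversed(range(len(angles))):
            points = forward(angles, lengths)
            jx, jy = points[i]
            ex, ey = points[-1]
            # Rotate joint i so the joint-to-end direction matches joint-to-target.
            to_end = math.atan2(ey - jy, ex - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += to_target - to_end
        ex, ey = forward(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles


lengths = [1.0, 1.0, 1.0]
target = (1.5, 1.0)
solved = ccd([0.3, 0.3, 0.3], lengths, target)
```

Each single-joint update is solved in closed form, which is what makes CCD cheap per step and attractive for closure problems such as loop closure.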