M.S. and Ph.D. Theses

Ph.D. Theses

  • PhD5: Daniel Veltri.

    A Computational and Statistical Framework for Screening Antimicrobial Peptides.

    Ph.D. Thesis, George Mason University, December, 2015

    Committee: Amarda Shehu (chair), Jeffrey Solka, Iosif Vaisman, and Benjamin Matthews.

    @phdthesis{veltri2015:PhDThesis, author = {Veltri, D.}, school = {George Mason University}, title = {A Computational and Statistical Framework for Screening Antimicrobial Peptides}, year = 2015 }
  • PhD4: Irina Hashmi.

    Probabilistic Approaches to Protein-Protein Docking.

    Ph.D. Thesis, George Mason University, July, 2015

    Committee: Amarda Shehu (chair), Kenneth De Jong, Daniel Barbara, Huzefa Rangwala, and Nadine Kabbani.

    @phdthesis{hashmi2015:PhDThesis, author = {Hashmi, I.}, school = {George Mason University}, title = {Probabilistic Approaches to Protein-Protein Docking}, year = 2015 }
    Characterizing the three-dimensional structures of protein-protein assemblies, a problem known as protein-protein docking, is central to understanding the physical and structural bases of molecular interactions in cellular processes. Doing so can also provide useful insights in structure-function studies and the design of effective drugs. Despite significant contributions from wet-laboratory techniques, the number of high-resolution structures of protein assemblies characterized in the wet laboratory covers only a small fraction of possible interactions. Research in dry laboratories is vibrant but challenged by the complexity of molecular interactions. Predominantly, methods based on stochastic optimization are employed to handle the size and complexity of the space of possible placements of units in an assembly. Despite significant work showing that knowledge of interaction interfaces can be valuable to guide docking methods, very few methods incorporate such information. Those that do are restricted to the setting of directly incorporating wet-lab macroscopic measurements, such as chemical shifts, which are hard to obtain on a variety of systems. Moreover, currently no stochastic optimization methods integrate machine learning models in their search for functionally-relevant structures. The contribution of this thesis is the proposal of hybrid probabilistic approaches that integrate domain-specific insight into powerful stochastic optimization algorithms for the pairwise protein docking problem. Various sources of domain-specific insight are integrated and tested for how they guide a docking algorithm towards the true, native structure. Specifically, these sources consist of information stored in the sequences of evolutionarily related proteins regarding the location of possible interaction interfaces, qualitative information provided from wet-laboratory experts, and information provided from machine learning models trained on known interaction interfaces.
On the latter, several such models are considered and integrated in a powerful algorithm that approaches stochastic optimization under the umbrella of evolutionary computation. Our work shows that hybrid approaches such as those proposed in this thesis provide a good balance between computational efficiency and accuracy in protein-protein docking. The work in this thesis incorporates powerful techniques and concepts from evolutionary computation, machine learning, and molecular biology. In addition to pointing towards several directions of promise in further improving pairwise docking, this thesis opens up several novel research directions, such as function-specific machine learning models, and the employment of evolutionary algorithms for the general, multimeric protein-protein docking problem where the number of units is arbitrary.
  • PhD3: Kevin Molloy.

    Probabilistic Algorithms for Modeling Protein Structure and Dynamics.

    Ph.D. Thesis, George Mason University, January, 2015

    Committee: Amarda Shehu (chair), Daniel Barbara, Estela Blaisten-Barojas, Jyh-Ming Lien.

    @phdthesis{molloy2015:PhDThesis, author = {Molloy, K.}, school = {George Mason University}, title = {Probabilistic Algorithms for Modeling Protein Structure and Dynamics}, year = 2015 }
    This thesis proposes novel probabilistic algorithms to address critical open problems in computational structural biology regarding the relationship between structure, dynamics, and function in protein molecules. The focus on protein modeling research is warranted due to the ubiquity and central role of proteins in life-critical processes in the living cell. A study of protein molecules is important for understanding our biology and health. Many disorders in the sick cell are proteinopathies, where a protein disrupts a chemical process, causing the cell to deviate from its intended biological activity. However, unlike other life-critical macromolecules, such as DNA and RNA, where significant information about activity can be extracted from knowledge of the ordering of the constitutive building blocks, proteins exhibit a more complex relationship between the order of building blocks, the structures arising from spatial arrangements of the building blocks in three-dimensional space, and the determination from such arrangements of biological activity. Since studies of proteins pose exceptional challenges in wet laboratories, the work presented in this thesis proposes powerful algorithms to complement wet-laboratory research on understanding the relationship between structure, dynamics, and function in protein molecules. Specifically, this thesis addresses three main problems that permeate protein modeling research. The first problem, known as “from-structure-to-function,” asks how to infer the function of a protein from knowledge of its active structure. The second problem, known as “from-sequence-to-structure,” relates to the open question of how to predict the biologically-active structure of a protein when provided information on the identities and order of constitutive building blocks. 
The third problem advances the current computational treatment of proteins to alleviate assumptions of their rigidity and instead model them as dynamic macromolecules switching between structures to tune their biological activity. The objective here is to model protein dynamics efficiently by computing the molecular motions employed in structural transitions among diverse functionally-relevant states of a protein. The algorithmic techniques employed in this thesis span machine learning, computational geometry, and stochastic optimization. In particular, we combine computational geometry and machine learning in a novel framework to infer the function of a protein from knowledge of its structure. In our treatment of the de novo structure prediction problem, we employ and investigate in detail an adaptive stochastic optimization framework capable of balancing between search breadth and depth in the exploration of a high-dimensional and nonlinear search space. We pursue such frameworks further and propose novel robotics-inspired probabilistic algorithms to model protein dynamics. In particular, in our treatment of structure and dynamics, we exploit analogies between protein modeling and the motion planning problem in robotics, which allow us to employ relevant concepts from motion planning algorithms and propose powerful algorithms capable of handling highly-constrained articulated systems with hundreds or thousands of continuous and discrete variables. This thesis advances protein modeling research by extending the size and complexity of systems that can be modeled, as well as the detail and accuracy with which relevant biological questions can be answered. For instance, algorithms proposed here to model structural transitions are now able to explain the impact of sequence mutations on protein function. 
Just as important, the algorithmic techniques proposed in this thesis are of general utility to other domains in computer science focusing on extending optimization algorithms for vast and nonlinear search spaces of complex systems.
  • PhD2: Brian Olson.

    Evolving Local Minima in the Protein Energy Surface.

    Ph.D. Thesis, George Mason University, July, 2013

    Committee: Amarda Shehu (chair), Kenneth A. De Jong, Estela Blaisten-Barojas, Jana Kosecka, Jyh-Ming Lien.

    @phdthesis{olson2013:PhDThesis, author = {Olson, B.}, school = {George Mason University}, title = {Evolving Local Minima in the Protein Energy Surface}, year = 2013 }
    Proteins are the molecular tools of the living cell and the path to unraveling their function is through modeling and understanding their structure. Many diseases occur when a protein loses its intended function due to its inability to form the appropriate structure with which it binds to other molecules in the cell. A holistic approach to protein modeling would be to characterize all possible structural states accessible by a protein under native, physiologic conditions. However, this task is infeasible. The question then becomes, how can we model the subset of these structural states most relevant to the function or dysfunction of a protein? This thesis proposes a novel computational framework to obtain an expansive view of the protein conformational space relevant for function while controlling computational cost. The framework complements experimental and high-resolution computational methods which limit their focus to a single region of the conformational space. The framework employs the knowledge that functionally-relevant conformations are those low in energy and the framework incorporates the latest understanding of protein structure and energy from biophysics. Specifically, this thesis proposes a novel stochastic search framework for exploring a diverse ensemble of conformations which capture low-energy basins in the protein energy surface. The proposed search framework employs a hybrid or memetic approach for explicit sampling of local minima in the protein energy surface. This hybrid search framework combines a global evolutionary search approach with a local search component to take advantage of the latest advances from the computational biology community. Specifically, the following questions are addressed to effectively model the protein conformational space: (1) How to balance limited computational resources between exploration of the conformational space in global search with exploitation of local minima in local search?
The hybrid search framework combines a global evolutionary search to explore the breadth of the conformational space with a local search for efficiently exploiting local minima in the underlying energy surface. (2) How to sample new conformations at the global level? Two complementary approaches are investigated. One approach proposes an enhanced fragment selection method for sampling a new conformation based on an existing structure. The other approach employs a genetic algorithm to combine features from multiple existing structures to sample a new conformation. (3) How to employ energy to better discriminate between interesting conformations and noise in the conformational search space? A multi-objective decomposition of the energy function is employed to guide the search towards more biologically relevant, low-energy conformations by focusing on the energy terms with the most discriminatory power. Work in this thesis shows that, by combining advanced algorithmic components with the latest understanding of protein biophysics, the proposed search framework is able to more effectively model functionally-relevant conformational states. A direct comparison between the proposed framework and a state-of-the-art coarse-grained sampling algorithm shows that the enhanced sampling strategies lead to a more comprehensive picture of the underlying protein energy surface. By taking this more comprehensive view, the framework is able to capture the protein native state as well as or better than methods relying primarily on protein-specific sampling strategies.
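The hybrid (memetic) idea described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Rastrigin function stands in for a protein energy function, real-valued vectors stand in for conformations, and the names `energy`, `local_search`, and `memetic_search` are hypothetical. It is not the thesis framework, only a toy showing a global evolutionary step (crossover plus mutation) followed by a local search on every offspring.

```python
import math
import random


def energy(x):
    """Rastrigin function: a rugged toy surface, NOT a protein energy function."""
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def local_search(x, rng, n_tries=50, step=0.05):
    """Exploitation: keep small random moves only when they lower the energy."""
    best, best_e = list(x), energy(x)
    for _ in range(n_tries):
        trial = [xi + rng.gauss(0.0, step) for xi in best]
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best, best_e


def memetic_search(dim=2, pop_size=20, generations=30, seed=0):
    """Evolutionary global search in which every individual is a locally improved point."""
    rng = random.Random(seed)
    population = [
        local_search([rng.uniform(-5.0, 5.0) for _ in range(dim)], rng)
        for _ in range(pop_size)
    ]
    history = [min(e for _, e in population)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent_a, parent_b = rng.sample(population, 2)
            # Exploration: uniform crossover of two parents plus a large mutation.
            child = [rng.choice(pair) + rng.gauss(0.0, 0.5)
                     for pair in zip(parent_a[0], parent_b[0])]
            # Every offspring is mapped to a nearby lower-energy point.
            offspring.append(local_search(child, rng))
        # Elitist truncation selection over parents and offspring.
        population = sorted(population + offspring, key=lambda item: item[1])[:pop_size]
        history.append(population[0][1])
    return population[0], history
```

Calling `memetic_search()` returns the lowest-energy individual found and the best energy per generation. Because selection is elitist, that history never increases; the balance between exploration and exploitation is controlled in this toy by the mutation width and by `n_tries`.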
  • PhD1: Amarda Shehu.

    Molecules in Motion: Computing Structural Flexibility.

    Ph.D. Thesis, Rice University, June, 2008.

    Committee: Lydia E. Kavraki (chair), Cecilia Clementi, Luay Nakhleh.

    @phdthesis{shehu2008:PhDThesis, author = {Shehu, A.}, school = {Rice University}, title = {Molecules in Motion: Computing Structural Flexibility}, year = 2008 }
    Growing databases of protein sequences in the post-genomic era call for computational methods to extract structure and function from a protein sequence. In flexible molecules like proteins, function cannot be reliably extracted from a few structures. The amino-acid chain assumes various spatial arrangements (conformations) to modulate biological function. Characterizing the flexibility of a protein under physiological (native) conditions remains an open problem in computational biology. This thesis addresses the problem of characterizing the native flexibility of a protein by computing conformations populated under native conditions. Such computation involves locating free-energy minima in a high-dimensional conformational space. The methods proposed in this thesis search for native conformations using systematically less information from experiment: first employing an experimental structure, then using only a closure constraint in cyclic cysteine-rich peptides, and finally employing only the amino-acid sequence of small- to medium-size proteins. A novel method is proposed to compute structural fluctuations of a protein around an experimental structure. The method combines a robotics-inspired exploration of the conformational space with a statistical mechanics formulation. Thermodynamic quantities measured over generated conformations reproduce experimental data of broad time scales on small (~100 amino acids) proteins with non-concerted motions. Capturing concerted motions motivates the development of the next methods. A second method is proposed that employs a closure constraint to generate native conformations of cyclic cysteine-rich peptides. The method first explores the entire conformational space, then explores in present energy minima until no lower-energy minima emerge. The method captures relevant features of the native state also observed in experiment for 20-30 amino-acid long peptides. 
A final method is proposed that implements a similar exploration but for longer proteins and employing only amino-acid sequence. In its first stage, the method explores the entire conformational space at a coarse-grained level of detail. A second stage focuses the exploration to low-energy regions in more detail. All-atom conformational ensembles are obtained for proteins that populate various functional states through large-scale concerted motions. These ensembles capture well the populated functional states of proteins up to 214 amino-acids long.

M.S. Theses

  • MS8: Snehal Sambare.

    Structure- and Energy-based Analysis of FGFR2 Kinase Mutations Revealing Differences in Cancer and Syndrome Mutations.

    M.S. Thesis, George Mason University, May 2019.

    Committee: Don Seto (chair), Amarda Shehu (thesis advisor), and Dmitri Klimov.

    @mastersthesis{sambare2019:MSThesis, author = {Sambare, S.}, school = {George Mason University}, title = {Structure- and Energy-based Analysis of FGFR2 Kinase Mutations Revealing Differences in Cancer and Syndrome Mutations}, year = 2019 }
  • MS7: David Morris.

    Snapshots and Springs: Analyzing and Reproducing the Motions of Molecules.

    M.S. Thesis, George Mason University, August 2017.

    Committee: Amarda Shehu (chair), Zoran Duric, and Kevin Molloy.

    @mastersthesis{morris2017:MSThesis, author = {Morris, D.}, school = {George Mason University}, title = {Snapshots and Springs: Analyzing and Reproducing the Motions of Molecules}, year = 2017 }
  • MS6: Amr Z. A. Majul.

    Comparative Molecular Dynamic Simulations of 2 Helical AMPs Found in Snakes ATRA-1 and ATRA-2.

    M.S. Thesis, George Mason University, July 2015.

    Committee: Barney Bishop (thesis director), Amarda Shehu (thesis advisor), and Paige Mikell.

    @mastersthesis{majul2015:MSThesis, author = {Majul, A. Z.}, school = {George Mason University}, title = {Comparative Molecular Dynamic Simulations of 2 Helical AMPs Found in Snakes ATRA-1 and ATRA-2}, year = 2015 }
    This thesis proposes the use of Molecular Dynamics (MD) simulations to study two synthetic cationic antimicrobial peptides (CAMPs), ATRA-1 and ATRA-2. The peptides are based on a natural antimicrobial peptide found in the elapid snake Naja atra [de Latour et al., 2010]. Natural AMPs can potentially serve as templates for engineering novel antibiotics. With this goal in mind, MD simulations provide a valuable resource to supplement existing experimental and database-aided prediction data. Specifically, the interaction of the two aforementioned peptides with a model lipid bilayer membrane is studied. The antimicrobial potencies of the peptides differ appreciably, yet their amino acid sequences differ from each other at only 2 positions. This thesis proposes the use of MD to run simulations to extract qualitative and quantitative information on each peptide. Most other similar computational studies on peptides are done on a single type of peptide. Simulations comparing ATRA-1 and ATRA-2 are analyzed, focusing on elucidating the mechanism of action and finding various physical and chemical parameters that differentiate the two peptides. The simulations employ all-atomistic explicit models, with a realistic non-anchored membrane. In addition, the feasibility of categorizing the effectiveness of proposed rationally designed novel peptides will be evaluated.
  • MS5: Daniel P. Veltri.

    Physicochemical Feature Selection for Cathelicidin Antimicrobial Peptides.

    M.S. Thesis, George Mason University, April 2013.

    Committee: Amarda Shehu (chair), Iosif Vaisman, Barney Bishop.

    @mastersthesis{veltri2013:MSThesis, author = {Veltri, D. P.}, school = {George Mason University}, title = {Physicochemical Feature Selection for Cathelicidin Antimicrobial Peptides}, year = 2013 }
    Due to recent attention on antimicrobial peptides (AMPs) as targets for antibacterial drug research, many machine learning methods are now turning their attention to AMP recognition. Approaches that rely on whole-peptide properties for recognition are challenged by the great sequence diversity among AMPs for effective feature construction. This thesis proposes a novel and complementary method for feature construction which relies on an extensive list of position-based amino acid physicochemical properties. These features are shown effective in the context of classification by support vector machine (SVM), both in comparison to related work in recognition of AMPs and in a novel study on the cathelicidin family. A detailed analysis and careful construction of a decoy dataset allows for the highlighting of antimicrobial activity-related features in cathelicidins. Special attention is also given to residue positions involved with enzymatic cleavage. The method presented in this thesis is a first step towards understanding what confers to cathelicidins their activity at the physicochemical level and may prove useful for future AMP design efforts.
  • MS4: Irina Hashmi.

    A Probabilistic Search Algorithm for Protein-Protein Docking.

    M.S. Thesis, George Mason University, November 2012.

    Committee: Amarda Shehu (chair), Kenneth A. De Jong, and Jyh-Ming Lien.

    @mastersthesis{hashmi2012:MSThesis, address = {Fairfax, Virginia}, author = {Hashmi, I.}, school = {George Mason University}, title = {A Probabilistic Search Algorithm for Protein-Protein Docking}, year = {2012} }
    Computational methods able to assist or complement wet-laboratory experiments in structural characterization of molecular assemblies promise to provide detailed insight into molecular interactions, drug design, and biological function in the living and diseased cell. Methods that predict three-dimensional structures of protein-protein assemblies are abundant in computational structural biology. However, challenges remain in accurately detecting the interacting interface between participating units in an assembly. For search algorithms, the task of predicting the biologically-active structure of an assembly poses particular challenges due to the high dimensionality of the search space where potentially relevant assembly configurations lie. The work presented in this thesis is a step towards developing a new set of computational techniques and algorithms for structural characterization of protein-protein assemblies. Specifically, the work here focuses on modeling the three-dimensional quaternary structure of a protein dimer, a complex formed by interactions between two participating protein chains. This problem is commonly known as protein-protein docking. This work addresses the problem of rigid protein-protein docking, where the given unbound structures of the protein units about to dimerize are expected to be the same as the bound ones after dimerization. In addition to techniques proposed to alleviate certain computational aspects related to finding the right docking interface in protein dimers, this thesis proposes a new probabilistic search algorithm that employs both geometry and energy to sample low-energy configurations of a protein dimer. Analysis of evolutionary conservation and a geometric treatment of the molecular surface are combined in order to identify potentially-relevant contact interfaces between the two units in the dimer.
Docking is focused only on evolutionarily conserved, geometrically complementary regions between the units' molecular surfaces, resulting in a narrower search space of rigid-body motions matching only such regions. This treatment is the first contribution of this work. The second contribution is a probabilistic search algorithm that efficiently explores the space of rigid-body motions corresponding to local minima in an energy function capturing interactions in a dimeric configuration. The proposed algorithm is an adaptation of the Basin Hopping (BH) framework. The work presented in this thesis details implementation and careful analysis of the components that result in an effective BH algorithm for rigid protein-protein docking. Application on a diverse list of proteins shows that the algorithm is able to recover the native dimeric configuration as well as produce other relevant minima near the native configuration of a given dimer. A detailed analysis is presented that shows the algorithm reproduces known properties of the BH framework in other contexts and applications, most notably the relationship between adjacency between consecutively-obtained local minima and proximity to the known native dimeric configuration. Taken together, the results presented show that the algorithm can be employed as a first stage in a computational docking protocol to sample low-energy near-native dimeric configurations that can then be further refined and discriminated with more computationally-intensive optimization protocols.
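For readers unfamiliar with Basin Hopping, the general framework named in this abstract can be sketched in a few lines. This is a generic illustration under stated assumptions, not the docking algorithm of the thesis: the one-dimensional `energy` function is a made-up toy, a single coordinate stands in for a rigid-body motion, and the names `local_minimize` and `basin_hopping` and all parameter values are hypothetical.

```python
import math
import random


def energy(x):
    """Toy rugged 1-D energy with several local minima (NOT a docking energy)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, step=0.1, tol=1e-6):
    """Greedy descent: move left or right while it helps, otherwise halve the step."""
    e = energy(x)
    while step > tol:
        for candidate in (x - step, x + step):
            e_candidate = energy(candidate)
            if e_candidate < e:
                x, e = candidate, e_candidate
                break
        else:
            step *= 0.5
    return x, e


def basin_hopping(x0, n_hops=200, jump=1.5, temperature=0.5, seed=0):
    """Walk over local minima: perturb, minimize, accept by the Metropolis criterion."""
    rng = random.Random(seed)
    x, e = local_minimize(x0)
    best_x, best_e = x, e
    for _ in range(n_hops):
        # Perturb the current minimum, then project the result to a nearby minimum.
        trial_x, trial_e = local_minimize(x + rng.uniform(-jump, jump))
        # Always accept downhill hops; accept uphill hops with Boltzmann probability.
        if trial_e <= e or rng.random() < math.exp(-(trial_e - e) / temperature):
            x, e = trial_x, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

A call such as `basin_hopping(4.0)` returns the lowest minimum visited. The consecutive accepted minima form the trajectory whose adjacency properties the abstract refers to; in this toy, `jump` controls how far apart consecutive minima tend to be.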
  • MS3: Brian Olson.

    Local Minima Hopping along the Protein Energy Surface.

    M.S. Thesis, George Mason University, November 2011.

    Committee: Amarda Shehu (chair), Jana Kosecka, and Jyh-Ming Lien.

    @mastersthesis{olson2011:MSThesis, address = {Fairfax, Virginia}, author = {Olson, B.}, school = {George Mason University}, title = {Local Minima Hopping along the Protein Energy Surface}, year = {2011} }
    Modeling of protein molecules in silico for the purpose of elucidating the three-dimensional structure where the protein is biologically active employs the knowledge that the protein conformational space has an underlying funnel-like energy surface. The biologically-active structure, also referred to as the native structure, resides at the basin or global minimum of the energy surface. A common approach among computational methods that seek the protein native structure is to search for local minima in the energy surface, with the hope that one of the local minima corresponds to the global minimum. Typical stochastic search methods, however, fail to explicitly sample local minima. This thesis proposes a novel algorithm to directly sample local minima at a coarse-grained level of detail. The Protein Local Optima Walk (PLOW) algorithm combines a memetic approach from evolutionary computation with cutting-edge structure prediction protocols in computational biophysics. PLOW explores the space of local minima by explicitly projecting each move at the global level to a nearby local minimum. This allows PLOW to jump over local energy barriers and more effectively sample near-native conformations. An additional contribution of this thesis is that the memetic approach in PLOW is applied to FeLTr, a tree-based search framework which ensures geometric diversity of computed conformations through projections of the conformational space. Analysis across a broad range of proteins shows that PLOW and memetic FeLTr outperform the original FeLTr framework and compare favorably against state-of-the-art ab-initio structure prediction algorithms.
  • MS2: Kevin Molloy.

    Variable-Length Fragment Assembly within a Probabilistic Protein Structure Prediction Framework.

    M.S. Thesis, George Mason University, June 2011.

    Committee: Amarda Shehu (chair), Zoran Duric, and Jyh-Ming Lien.

    @mastersthesis{molloy2011:MSThesis, address = {Fairfax, Virginia}, author = {Molloy, K.}, school = {George Mason University}, title = {Variable-Length Fragment Assembly within a Probabilistic Protein Structure Prediction Framework}, year = {2011} }
    It is widely accepted that a protein’s biological function is highly correlated to the three-dimensional shape the protein assumes under physiological/native conditions. Predicting this three-dimensional structure, known as the native structure, from the protein’s amino-acid sequence is known as the protein structure prediction problem. This problem is regarded by many to be one of the grand challenges of computational biology. Fragment-based assembly is a widely used technique in ab-initio structure prediction methods that seek to predict structure from sequence. Essentially, a protein structure is pieced together with configurations of fragments extracted from databases of deposited protein native structures. Fragment length is an important consideration. The shorter the fragment, the more complex the protein conformational space where the native structure resides and the more rugged the energy surface associated with that space. The longer the fragment, the simpler the conformational space and the smoother the energy surface; hence, the higher the risk of missing important regions of space that may lead to the native structure. In this thesis, we explore the idea of varying the employed fragment lengths to alter the protein conformational space explored during a probabilistic search for the native structure. Varying fragment lengths allows for manipulating the dimensions of the search space during the process of sampling protein conformations. Essentially, longer fragments are used in early stages of the search to simplify the search space and smooth the energy surface. Shorter fragments are then utilized in later stages to provide visibility to the more complex and realistic conformational space. This approach is validated on four protein systems of diverse sizes and native topologies. 
The results show that employing variable-length fragments enhances the sampling of the conformational space for each protein, producing higher-quality native-like structures as compared to using a single fragment length. These promising results lay the foundations for exploring additional research directions in equipping a probabilistic search framework with the ability to make on-the-fly decisions and adaptively change the dimensionality of the conformational space it explores.
  • MS1: Amarda Shehu.

    Sampling Biomolecular Conformations with Spatial and Energetic Constraints.

    M.S. Thesis, Rice University, December 2004.

    Committee: Lydia E. Kavraki (chair), Cecilia Clementi, Ron Goldman, and Luay Nakhleh.

    @mastersthesis{shehu2005:MSThesis, address = {Houston, Texas}, author = {Shehu, A.}, school = {Rice University}, title = {Sampling Biomolecular Conformations with Spatial and Energetic Constraints}, year = {2005} }
    This work extends cyclic coordinate descent to efficiently satisfy multiple spatial constraints, respect the secondary structure of proteins, and work with reduced backbone protein models. Reduced models allow us to treat large systems that are intractable under all-atom models. In addition, this thesis combines the satisfaction of multiple spatial constraints with conformational sampling and energy minimization techniques to generate spatially constrained biomolecular structures that are energetically stable under physiological conditions. The experiments in this thesis demonstrate the relevance and robustness of our method in three areas of application: loop closure, backbone reconstruction, and physical trajectory recovery. Addressing the problem of loop closure, we obtain ensembles of spatially constrained conformations whose energy landscape is in agreement with laboratory experimental results on the energetic stability of the proteins at hand. Our experiments on backbone reconstruction agree with results from statistical approaches to this problem, but in addition guarantee the energetic feasibility of the completed models.
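As background for the abstract above, plain cyclic coordinate descent (CCD) can be sketched on a planar chain with a single end-point constraint. This is the textbook baseline only, under assumptions chosen for illustration: a 2-D chain of rigid links stands in for a protein backbone, there is one target position rather than multiple spatial constraints, and the names `forward`, `end_distance`, and `ccd` are hypothetical. The extensions described in the thesis (multiple constraints, secondary structure, reduced models) are not reproduced here.

```python
import math


def forward(angles, lengths):
    """Joint positions of a planar chain; each angle is relative to the previous link."""
    points, x, y, total = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for angle, length in zip(angles, lengths):
        total += angle
        x += length * math.cos(total)
        y += length * math.sin(total)
        points.append((x, y))
    return points


def end_distance(angles, lengths, target):
    """Distance between the chain's free end and the target position."""
    end = forward(angles, lengths)[-1]
    return math.hypot(end[0] - target[0], end[1] - target[1])


def ccd(angles, lengths, target, sweeps=100, tol=1e-6):
    """Adjust one joint at a time so the free end points at the target from that joint."""
    angles = list(angles)
    for _ in range(sweeps):
        for i in reversed(range(len(angles))):
            points = forward(angles, lengths)
            pivot, end = points[i], points[-1]
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotating joint i by this difference minimizes the end-to-target
            # distance over that single degree of freedom.
            angles[i] += to_target - to_end
        if end_distance(angles, lengths, target) < tol:
            break
    return angles
```

For example, `ccd([0.3, 0.3, 0.3], [1.0, 1.0, 1.0], (1.5, 1.0))` returns joint angles that bring the free end of a three-link chain close to the target. Each single-joint update can only reduce the end-to-target distance, which is what makes the method attractive as a fast closure step before any energy-based refinement.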