Publications

Refereed Journal Articles

  • Articles are listed in reverse chronological order. Impact factors (IFs) at the time of publication are provided; when these are not available, 5-year average or most recent IFs are given instead. Links to publishers are provided for each article. Local copies are made available with the caveat that they are provided under copyright permission for noncommercial dissemination of academic work.
    The per-paper citation counts below are slightly outdated. As of July 19, 2017, the total citation count per Google Scholar is 1114 (951 of which since 2012), the h-index is 20, and the i10-index is 40.
  • Shehu's advisees indicated by: undergraduate (u), graduate (g), and postdoctoral (p) students.
    Corresponding authors are indicated by (*).
  • J40: Tatiana Maximova (p), Zijing Zhang, Daniel B Carr, Erion Plaku, and Amarda Shehu*.

    Sample-based Models of Protein Energy Landscapes and Slow Structural Rearrangements.

    J Comput Biol (JCB)

    2017 (in press).

  • J39: Emmanuel Sapin (p), Kenneth De Jong*, and Amarda Shehu*.

    From Optimization to Mapping: An Evolutionary Algorithm for Protein Energy Landscapes.

    IEEE/ACM Trans Comp Biol and Bioinf (TCBB)

    2017, (doi: 10.1109/TCBB.2016.2628745, in press).

  • J38: Tatiana Maximova (p), Erion Plaku*, and Amarda Shehu*.

    Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.

    IEEE/ACM Trans Comp Biol and Bioinf (TCBB) 2017, (doi: 10.1109/TCBB.2016.2586044, in press).

    2016.

    @article{MaximovaShehuTCBB16, author = {Maximova, T. AND Plaku, E. AND Shehu, A.}, journal = {IEEE/ACM Trans Comput Biol and Bioinf (TCBB)}, title = {Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm}, year = 2017, note = {in press}, }
    Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to address the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new venues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
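The graph-query idea in the abstract above (samples embedded in a nearest-neighbor graph, with lowest-cost paths obtained by queries) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the two-basin `energy` function, the regular grid of samples, the edge cost (energy increase plus a small per-step penalty), and the use of Dijkstra's algorithm for the query.

```python
import heapq
import math


def energy(point):
    """Toy two-basin energy on the plane (hypothetical, not a force field)."""
    x, y = point
    return (x * x - 1.0) ** 2 + y * y


def build_knn_graph(samples, k):
    """Connect each sample to its k nearest neighbors, in both directions."""
    graph = {i: {} for i in range(len(samples))}
    for i, p in enumerate(samples):
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(samples) if j != i
        )
        for _, j in by_distance[:k]:
            # Assumed edge cost: uphill energy change plus a small step penalty.
            graph[i][j] = max(energy(samples[j]) - energy(p), 0.0) + 1e-3
            graph[j][i] = max(energy(p) - energy(samples[j]), 0.0) + 1e-3
    return graph


def lowest_cost_path(graph, start, goal):
    """Dijkstra's algorithm; returns (cost, path) or (inf, []) if unreachable."""
    best = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, edge_cost in graph[node].items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    return math.inf, []


# Samples on a regular grid stand in for sampled conformations.
samples = [(x / 4.0, y / 4.0) for x in range(-6, 7) for y in range(-2, 3)]
graph = build_knn_graph(samples, k=4)
start = samples.index((-1.0, 0.0))  # bottom of the left basin
goal = samples.index((1.0, 0.0))    # bottom of the right basin
cost, path = lowest_cost_path(graph, start, goal)
barrier = max(energy(samples[i]) for i in path)
```

In this toy setting the query returns the path that crosses the barrier at its lowest point; a real roadmap would use sampled conformations and a physical energy model in place of the grid and the toy function.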
  • J37: Daniel Veltri (g), Uday Kamath, and Amarda Shehu*.

    Improving Recognition of Antimicrobial Peptides and Target Selectivity through Machine Learning and Genetic Programming.

    IEEE/ACM Trans Comp Biol and Bioinf (TCBB), 14(2): 300-313,

    2017.

    @article{VeltriKamathShehuTCBB15, author = {Veltri, D. AND Kamath, U. AND Shehu, A.}, journal = {IEEE/ACM Trans Comput Biol and Bioinf}, title = {Improving Recognition of Antimicrobial Peptides and Target Selectivity through Machine Learning and Genetic Programming}, year = 2017, volume = {14}, number = {2}, pages = {300-313} }
    Growing bacterial resistance to antibiotics is spurring research on utilizing naturally-occurring antimicrobial peptides (AMPs) as templates for novel drug design. While experimentalists mainly focus on systematic point mutations to measure the effect on antibacterial activity, the computational community seeks to understand what determines such activity in a machine learning setting. The latter seeks to identify the biological signals or features that govern activity. In this paper, we advance research in this direction through a novel method that constructs and selects complex sequence-based features which capture information about distal patterns within a peptide. Comparative analysis with state-of-the-art methods in AMP recognition reveals our method is not only among the top performers, but it also provides transparent summarizations of antibacterial activity at the sequence level. Moreover, this paper demonstrates for the first time the capability not only to recognize that a peptide is an AMP or not but also to predict its target selectivity based on models of activity against only Gram-positive, only Gram-negative, or both types of bacteria. The work described in this paper is a step forward in computational research seeking to facilitate AMP design or modification in the wet laboratory.
  • J36: Amarda Shehu* and Erion Plaku*.

    A Survey of Computational Treatments of Biomolecules by Robotics-inspired Methods Modeling Equilibrium Structure and Dynamics.

    J Artif Intel Res (JAIR) 57:509-572,

    2016.

    @article{ShehuPlakuJAIR16, author = {Shehu, A. AND Plaku, E.}, journal = {J Artif Intel Res}, title = {A Survey of Computational Treatments of Biomolecules by Robotics-Inspired Methods Modeling Equilibrium Structure and Dynamics}, year = 2016, volume = {57}, pages = {509-572} }
    More than fifty years of research in molecular biology have demonstrated that the ability of small and large molecules to interact with one another and propagate the cellular processes in the living cell lies in the ability of these molecules to assume and switch between specific structures under physiological conditions. Elucidating biomolecular structure and dynamics at equilibrium is therefore fundamental to furthering our understanding of biological function, molecular mechanisms in the cell, our own biology, disease, and disease treatments. By now, there is a wealth of methods designed to elucidate biomolecular structure and dynamics contributed from diverse scientific communities. In this survey, we focus on recent methods contributed from the Robotics community that promise to address outstanding challenges regarding the disparate length and time scales that characterize dynamic molecular processes in the cell. In particular, we survey robotics-inspired methods designed to obtain efficient representations of structure spaces of molecules in isolation or in assemblies for the purpose of characterizing equilibrium structure and dynamics. While an exhaustive review is an impossible endeavor, this survey balances the description of important algorithmic contributions with a critical discussion of outstanding computational challenges. The objective is to spur further research to address outstanding challenges in modeling equilibrium biomolecular structure and dynamics.
  • J35: Emmanuel Sapin (p), Daniel B Carr, Kenneth A De Jong*, and Amarda Shehu*.

    Computing energy landscape maps and structural excursions of proteins.

    BMC Genomics 17(Suppl 4):546,

    2016.

    @article{SapinShehuBMCGeonmics16, author = {Sapin, E. AND Carr, D. AND {De Jong}, K. A. AND Shehu, A.}, journal = {BMC Genomics}, title = {Computing energy landscape maps and structural excursions of proteins}, year = 2016, volume = {17}, number = {Suppl 4}, pages = {546} }
    Background
    Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics.
    Methods
    We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically-feasible excursions between structures of interest. An evolutionary algorithm builds such maps with a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically-feasible basin-to-basin excursions.
    Results
    Evaluation is conducted on intrinsically-dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, unknown thermodynamically-stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship.
    Conclusions
    Applications show that the proposed methodology is effective at locating basins in complex energy landscapes and computing basin-basin excursions of a protein with a practical computational budget. While the actual temporal scales spanned by a structural excursion cannot be directly obtained due to the foregoing of simulation of dynamics, hypotheses can be formulated regarding the impact of sequence mutations on protein function. These hypotheses are valuable in instigating further research in wet laboratories.
  • J34: Kevin Molloy (g), Rudy Clausen (g), and Amarda Shehu*.

    A Stochastic Roadmap Method to Model Protein Structural Transitions.

    Robotica 34(08):1705-1733,

    2016 (featured on issue cover).

    @article{MolloyShehuRobotica16, author = {Molloy, K. AND Clausen, R. AND Shehu, A.}, journal = {Robotica}, title = {A stochastic roadmap method to model protein structural transitions}, year = 2016, volume = {34}, number = {08}, pages = {1705-1733} }
    Evidence is emerging that the role of protein structure in disease needs to be rethought. Sequence mutations in proteins are often found to affect the rate at which a protein switches between structures. Modeling structural transitions in wildtype and variant proteins is central to understanding the molecular basis of disease. This paper investigates an efficient algorithmic realization of the stochastic roadmap simulation framework to model structural transitions in wildtype and variants of proteins implicated in human disorders. Our results indicate that the algorithm is able to extract useful information on the impact of mutations on protein structure and function.
  • J33: Kevin Molloy (g) and Amarda Shehu*.

    A General, Adaptive, Roadmap-based Algorithm for Protein Motion Computation.

    IEEE Trans NanoBioScience (TNB) 15(2): 158-165,

    2016.

    @article{MolloyShehuTNB16, author = {Molloy, K. AND Shehu, A.}, journal = {IEEE Trans NanoBioScience}, volume = {15}, number = {2}, pages = {158-165}, title = {A General, Adaptive, Roadmap-based Algorithm for Protein Motion Computation}, year = 2016 }
    Precious information on protein function can be extracted from a detailed characterization of protein equilibrium dynamics. This remains elusive in wet and dry laboratories, as function-modulating transitions of a protein between functionally-relevant, thermodynamically-stable and meta-stable structural states often span disparate time scales. In this paper we propose a novel, robotics-inspired algorithm that circumvents time-scale challenges by drawing analogies between protein motion and robot motion. The algorithm adapts the popular roadmap-based framework in robot motion computation to handle the more complex protein conformation space and its underlying rugged energy surface. Given known structures representing stable and meta-stable states of a protein, the algorithm yields a time- and energy-prioritized list of transition paths between the structures, with each path represented as a series of conformations. The algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. Promising results are presented on a variety of proteins that demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
  • J32: Tatiana Maximova (p), Ryan Moffatt (g), Buyong Ma, Ruth Nussinov*, and Amarda Shehu*.

    Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics.

    PLoS Comp Biol 12(4): e1004619,

    2016, (top 50 most downloaded in 2016 and featured on April issue front cover. Also featured in the PLoS Comp Biol blog.)

    @article{MaximovaNussinovShehu15, author = {Maximova, T. AND Moffatt, R. AND Ma, B. AND Nussinov, R. AND Shehu, A.}, journal = {PLoS Comput Biol}, title = {Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics}, year = 2016, volume = {12}, number = {4}, pages = {e1004619} }
    Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
  • J31: Amarda Shehu* and Ruth Nussinov*.

    Computational Methods for Exploration and Analysis of Macromolecular Structure and Dynamics.

    PLoS Comput Biol (PCB)

    11(10): e1004585, 2015 (editorial).

    @article{ShehuNussinov15, author = {Shehu, A. AND Nussinov, R.}, journal = {PLoS Comput Biol}, title = {Computational Methods for Exploration and Analysis of Macromolecular Structure and Dynamics}, year = 2015, volume = {11}, number = {10}, pages = {e1004585} }
    All processes that maintain and replicate a living cell involve fluctuating biological macromolecules. As computational biologists, our aim is to discern the behavior of macromolecules in a way that experimental biology is not able to achieve. No single technique—experimental or computational—can capture all the relevant scales of cellular functional behavior. In principle, computations are the tools that can integrate different kinds of experimental and computational characterizations at different resolutions to obtain a more complete description of the processes of life. Computer simulations can act as a bridge between the microscopic length and time scales, and the macroscopic world of the laboratory. They can start from a macroscopic experiment-based guess of interactions between molecules, and obtain “exact” predictions of bulk and detailed properties subject to limitations. They are able to test a theory by constructing and simulating the model, and comparing the results with experimental measurements; and they are able to provide models that experiments can test. Computations can provide leads by processing large sets of data, predicting molecular behaviors, and supplying the mechanistic underpinning that experiments alone may not be able to achieve.
  • J30: Didier Devaurs, Kevin Molloy, Marc Vaisset, Amarda Shehu, Thierry Simeon, and Juan Cortes*.

    Characterizing Energy Landscapes of Peptides using a Combination of Stochastic Algorithms.

    IEEE Trans NanoBioScience (TNB), 14(5): 545-552,

    2015.

    @article{DevaursCortes15, author = {Devaurs, D. AND Molloy, K. AND Vaisset, M. AND Shehu, A. AND Simeon, T. AND Cortes, J.}, journal = {IEEE Trans NanoBioScience}, title = {Characterizing Energy Landscapes of Peptides Using a Combination of Stochastic Algorithms}, year = 2015, volume = {14}, number = {5}, pages = {545--552} }
    Obtaining accurate representations of energy landscapes of biomolecules such as proteins and peptides is central to structure-function studies. Peptides are particularly interesting, as they exploit structural flexibility to modulate their biological function. Despite their small size, peptide modeling remains challenging due to the complexity of the energy landscape of such highly-flexible dynamic systems. Currently, only stochastic sampling-based methods can efficiently explore the conformational space of a peptide. In this paper, we suggest to combine two such methods to obtain a full characterization of energy landscapes of small yet flexible peptides. First, we propose a simplified version of the classical Basin Hopping algorithm to reveal low-energy basins in the landscape, and thus to identify the corresponding meta-stable structural states of a peptide. Then, we present several variants of a robotics-inspired algorithm, the Transition-based Rapidly-exploring Random Tree, to quickly determine transition path ensembles, as well as transition probabilities between meta-stable states. We demonstrate this combined approach on met-enkephalin.
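For readers unfamiliar with Basin Hopping, the following is a minimal sketch of the classical textbook scheme (perturb, locally minimize, accept or reject with a Metropolis test). It is not the simplified variant proposed in the paper. The one-dimensional tilted double-well `energy`, the fixed-step gradient descent, and all parameter values are assumptions chosen only to keep the example small.

```python
import math
import random


def energy(x):
    """Toy tilted double-well energy (hypothetical stand-in for a peptide energy)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytic derivative of the toy energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_minimize(x, step=0.01, iterations=2000):
    """Plain fixed-step gradient descent toward the bottom of the current basin."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def basin_hopping(x0, iterations=200, jump=1.5, temperature=1.0, seed=0):
    """Return every local minimum visited, accepted or not."""
    rng = random.Random(seed)
    current = local_minimize(x0)
    minima = [current]
    for _ in range(iterations):
        # Perturb the current minimum, then relax into whichever basin we land in.
        candidate = local_minimize(current + rng.uniform(-jump, jump))
        minima.append(candidate)
        # Metropolis criterion applied to the energies of the two minima.
        delta = energy(candidate) - energy(current)
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
    return minima


minima = basin_hopping(x0=1.0)
best = min(minima, key=energy)
```

Keeping the full list of visited minima, rather than only the best one, mirrors the goal stated in the abstract of revealing the low-energy basins of the landscape rather than a single optimum.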
  • J29: Irina Hashmi (g) and Amarda Shehu*.

    idDock+: Integrating Machine Learning in Probabilistic Search for Protein-protein Docking.

    J Computational Biology (JCB), 22(9):806-822,

    2015.

    @article{HashmiShehuJCB15, author = {Hashmi, I. AND Shehu, A.}, journal = {J Comput Biol}, title = {idDock+: Integrating Machine Learning in Probabilistic Search for Protein-protein Docking}, year = 2015, volume = {22}, number = {9}, pages = {806-822} }
    Predicting the three-dimensional native structures of protein dimers, a problem known as protein-protein docking, is key to understanding molecular interactions. Docking is a computationally-challenging problem due to the diversity of interactions and the high dimensionality of the configuration space. Existing methods draw configurations systematically or at random from the configuration space. The inaccuracy of scoring functions used to evaluate drawn configurations presents an additional challenge. Evidence is growing that optimization of a scoring function is an effective technique only once the drawn configuration is sufficiently similar to the native structure. Therefore, in this paper we present a method that employs optimization of a sophisticated energy function, FoldX, only to locally improve a promising configuration. The main question of how promising configurations are identified is addressed through a machine learning method trained a priori on an extensive dataset of functionally-diverse protein dimers. To deal with the vast configuration space, a probabilistic search algorithm operates on top of the learner, feeding to it configurations drawn at random. We refer to our method as idDock+ for informatics-driven Docking. idDock+ is tested on 15 dimers of different sizes and functional classes. Analysis shows that on all systems idDock+ finds a near-native structure and is comparable in accuracy to other state-of-the-art methods. idDock+ represents one of the first highly-efficient hybrid methods that combines fast machine learning models with demanding optimization of sophisticated energy scoring functions. Our results indicate that this is a promising direction to improve both efficiency and accuracy in docking.
  • J28: Rudy Clausen (g) and Amarda Shehu*.

    A Data-driven Evolutionary Algorithm for Mapping Multi-basin Protein Energy Landscapes.

    J Computational Biology (JCB), 22(9): 844-860,

    2015.

    @article{ClausenShehuJCB15, author = {Clausen, R. AND Shehu, A.}, journal = {J Comput Biol}, title = {A Data-driven Evolutionary Algorithm for Mapping Multi-basin Protein Energy Landscapes}, year = 2015, volume = {22}, number = {9}, pages = {844-860} }
    Evidence is emerging that many proteins involved in proteinopathies are dynamic molecules switching between stable and semi-stable structures to modulate their function. A detailed understanding of the relationship between structure and function in such molecules demands a comprehensive characterization of their conformation space. Currently, only stochastic optimization methods are capable of exploring conformation spaces to obtain sample-based representations of associated energy surfaces. These methods have to address the fundamental but challenging issue of balancing computational resources between exploration (obtaining a broad view of the space) and exploitation (going deep in the energy surface). We propose a novel algorithm that strikes an effective balance by employing concepts from evolutionary computation. The algorithm leverages deposited crystal structures of wildtype and variant sequences of a protein to define a reduced, low-dimensional search space from where to rapidly draw samples. A multiscale technique maps samples to local minima of the all-atom energy surface of a protein under investigation. Several novel algorithmic strategies are employed to avoid premature convergence to particular minima and obtain a broad view of a possibly multi-basin energy surface. Analysis of applications on different proteins demonstrates the broad utility of the algorithm to map multi-basin energy landscapes and advance modeling of multi-basin proteins. In particular, applications on wildtype and variant sequences of proteins involved in proteinopathies demonstrate that the algorithm makes an important first step towards understanding the impact of sequence mutations on misfunction by providing the energy landscape as the intermediate explanatory link between protein sequence and function.
  • J27: Rudy Clausen (g), Buyong Ma, Ruth Nussinov, and Amarda Shehu*.

    Mapping the Conformation Space of Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale Evolutionary Algorithm.

    PLoS Computational Biology (PCB)

    11(9): e1004470, 2015.

    @article{ClausenShehuPLoSCB15, author = {Clausen, R. AND Ma, B. AND Nussinov, R. AND Shehu, A.}, journal = {PLoS Comput Biol}, title = {Mapping the Conformation Space of Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale Evolutionary Algorithm}, year = 2015, volume = 11, number = 9, pages = {e1004470} }
    An important goal in molecular biology is to understand functional changes upon single-point mutations in proteins. Doing so through a detailed characterization of structure spaces and underlying energy landscapes is desirable but continues to challenge methods based on Molecular Dynamics. In this paper we propose a novel algorithm, SIfTER, which is based instead on stochastic optimization to circumvent the computational challenge of exploring the breadth of a protein’s structure space. SIfTER is a data-driven evolutionary algorithm, leveraging experimentally-available structures of wildtype and variant sequences of a protein to define a reduced search space from where to efficiently draw samples corresponding to novel structures not directly observed in the wet laboratory. The main advantage of SIfTER is its ability to rapidly generate conformational ensembles, thus allowing mapping and juxtaposing landscapes of variant sequences and relating observed differences to functional changes. We apply SIfTER to variant sequences of the H-Ras catalytic domain, due to the prominent role of the Ras protein in signaling pathways that control cell proliferation, its well-studied conformational switching, and abundance of documented mutations in several human tumors. Many Ras mutations are oncogenic, but detailed energy landscapes have not been reported until now. Analysis of SIfTER-computed energy landscapes for the wildtype and two oncogenic variants, G12V and Q61L, suggests that these mutations cause constitutive activation through two different mechanisms. G12V directly affects binding specificity while leaving the energy landscape largely unchanged, whereas Q61L has pronounced, starker effects on the landscape. An implementation of SIfTER is made available at http://www.cs.gmu.edu/~ashehu/?q=OurTools. 
    We believe SIfTER is useful to the community to answer the question of how sequence mutations affect the function of a protein, when there is an abundance of experimental structures that can be exploited to reconstruct an energy landscape that would be computationally impractical to do via Molecular Dynamics.
  • J26: Uday Kamath (g), Kenneth A De Jong*, and Amarda Shehu*.

    Effective Automated Feature Construction and Selection for Classification of Biological Sequences.

    PLoS One, 9(7): e99982,

    2014.

    @article{KamathDeJongShehuPLoS14, author = {Kamath, U. AND {De Jong}, K. A. AND Shehu, A.}, journal = {PLoS {ONE}}, title = {Effective Automated Feature Construction and Selection for Classification of Biological Sequences}, year = 2014, volume = 9, number = 7, pages = {e99982} }
    Background: Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. Methodology: We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. Results: To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. 
    In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
  • J25: Kevin Molloy (g), M. Jennifer Van (u), Daniel Barbara*, and Amarda Shehu*.

    Exploring Representations of Protein Structure for Automated Remote Homology Detection and Mapping of Protein Structure Space.

    BMC Bioinformatics 15 (Suppl 8):S4,

    2014.

    @article{MolloyBarbaraShehuBMCBioinf14, author = {Molloy, K. AND Min, J. V. AND Barbara, D. AND Shehu, A.}, journal = {BMC Bioinf}, title = {Exploring Representations of Protein Structure for Automated Remote Homology Detection and Mapping of Protein Structure Space}, volume = 15, number = {Suppl 8}, pages = {S4}, year = 2014 }
    Background: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported by analysis in this paper, that such maps preserve functional co-localization of the protein structure space. Methods: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid analysis of topics in the protein structure domain. Results: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership.
    Conclusions: This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organize and mine protein structure space with mature text mining tools.
  • J24: Nadine Kabbani*, Jacob C. Nordman, Brian Corgiat, Daniel Veltri (g), Amarda Shehu, and David J. Adams.

    Are Nicotinic Receptors Coupled to G Proteins?

    BioEssays 35(12): 1025–1034,

    2013, (selected for journal front cover video display. Read the highlight written on our article in same issue by Edward Howrot.)

    @article{KabbaniShehuAdams, author = {Kabbani, N. AND Nordman, J. C. AND Corgiat, B. AND Veltri, D. AND Shehu, A. AND Adams, D. J.}, journal = {BioEssays}, volume = {35}, title = {Are nicotinic receptors coupled to G Proteins?}, number = {12}, pages = {1025-1034}, year = 2013 }
    Nicotinic acetylcholine receptors (nAChRs) constitute an important class of ligand-gated ion channels (LGICs) widely expressed in various organisms. This family of cation conducting channels can rapidly regulate cellular excitability upon ligand activation and contribute to longer lasting intracellular changes. Here we present evidence on direct interactions between nAChRs and heterotrimeric GTP binding proteins (G proteins) in cells. Based on proteomic, biophysical, and functional evidence, we hypothesize that coupling to G proteins modulates the activity and signaling of various nAChRs in cells. It is important to note that while this hypothesis is new for the nAChR, it is consistent with known interactions between G proteins and structurally related LGICs and thus underscores an evolutionarily conserved mechanism of LGIC interaction with G proteins. Coupling to G proteins represents an important metabotropic property of the nAChR in the cell.
  • J23: Abrar Ashoor, Jacob C. Nordman, Daniel Veltri(g), Keun-Hang Susan Yang, Lina Al Kury, Yaroslav Shuba, Mohamed Mahgoub, Frank C. Howarth, Carl Lupica, Amarda Shehu, Nadine Kabbani, and Murat Oz*.

    Menthol Inhibits 5-HT3 Receptor-mediated Currents.

    J of Pharmacology and Experimental Therapeutics (JPET) 347(2):398-409,

    2013, (selected for issue front cover).

    @article{MuratJPET13, author = {Ashoor, A. AND Nordman, J. C. AND Veltri, D. AND Yang, K.-H. S. AND {Al Kury}, L. AND Shuba, Y. AND Mahgoub, M. AND Howarth, F. C. AND Lupica, C. AND Shehu, A. AND Kabbani, N. AND Oz, M.}, journal = {J of Pharmacology and Experimental Therapeutics (JPET)}, volume = {347}, title = {Menthol Inhibits 5-HT3 Receptor-mediated Currents}, number = {2}, pages = {398-409}, year = 2013 }
    The effects of the alcohol monoterpene menthol, a major active ingredient of the peppermint plant, were tested on the function of human 5-hydroxytryptamine type 3 (5-HT3) receptors expressed in Xenopus oocytes. 5-HT (1 μM)-evoked currents recorded by two-electrode voltage clamp technique were reversibly inhibited by menthol in a concentration-dependent (IC50=163 μM) manner. The effects of menthol developed gradually, reaching a steady-state level within 10-15 min, and did not involve G-proteins, since GTP-γ-S activity remained unaltered and the effect of menthol was not sensitive to pertussis toxin pretreatment. The actions of menthol were not stereoselective since (-), (+), and racemic menthol inhibited 5-HT3 receptor mediated currents to the same extent. Menthol inhibition was not altered by intracellular BAPTA injections and trans-membrane potential changes. The maximum inhibition observed for menthol was not reversed by increasing concentrations of 5-HT. Furthermore, specific binding of 5-HT3 antagonist [3H]GR65630 was not altered in the presence of menthol (up to 1 mM), indicating that menthol acts as a noncompetitive antagonist of 5-HT3 receptor. Finally, 5-HT3 receptor mediated currents in acutely dissociated nodose ganglion neurons were also inhibited by menthol (100 μM). These data demonstrate that menthol, at pharmacologically relevant concentrations, is an allosteric inhibitor of 5-HT3 receptors.
  • J22: Abrar Ashoor, Jacob C. Nordman, Daniel Veltri(g), Keun-Hang Susan Yang, Lina Al Kury, Yaroslav Shuba, Mohamed Mahgoub, Frank C. Howarth, Bassem Sadek, Amarda Shehu, Nadine Kabbani, and Murat Oz*.

    Menthol Binding and Inhibition of Alpha7-nicotinic Acetylcholine Receptors.

    PLoS One 8(7):e67674,

    2013.

    @article{MuratPLOSONE13, author = {Ashoor, A. AND Nordman, J. C. AND Veltri, D. AND Yang, K.-H. S. AND {Al Kury}, L. AND Shuba, Y. AND Mahgoub, M. AND Howarth, F. C. AND Sadek, B. AND Shehu, A. AND Kabbani, N. AND Oz, M.}, journal = {{PLoS} One}, volume = {8}, number = {7}, title = {Menthol Binding and Inhibition of Alpha7-nicotinic Acetylcholine Receptors}, pages = {e67674}, year = 2013 }
    Menthol is a common compound in pharmaceutical and commercial products and a popular additive to cigarettes. The molecular targets of menthol remain poorly defined. In this study we show an effect of menthol on the α7 subunit of the nicotinic acetylcholine (nACh) receptor function. Using a two-electrode voltage-clamp technique, menthol was found to reversibly inhibit ACh-induced α7 currents in Xenopus oocytes. Inhibition by menthol was not dependent on the membrane potential and did not involve endogenous Ca2+-dependent Cl- channels, since menthol inhibition remained unchanged by intracellular injection of the Ca2+ chelator BAPTA and perfusion with Ca2+-free bathing solution containing Ba2+. Furthermore, increasing ACh concentrations did not reverse menthol inhibition and the specific binding of [125I] α-bungarotoxin was not attenuated by menthol. Studies of α7-nACh receptors endogenously expressed in neural cells demonstrate that menthol attenuates α7 mediated Ca2+ activity in the cell body and neurites of neural cells. Our results suggest that menthol inhibits α7-nACh receptors via direct binding to the receptor channel.
  • J21: Kevin Molloy(g), Sameh Saleh(u), and Amarda Shehu*.

    Probabilistic Search and Energy Guidance for Biased Decoy Sampling in Ab-initio Protein Structure Prediction.

    IEEE/ACM Trans Comp Biol and Bioinf

    10(5):1162-1175, 2013.

    @article{MolloyShehuTCBB13, author = {Molloy, K. AND Saleh, S. AND Shehu, A.}, journal = {IEEE/ACM Trans Comp Biol and Bioinf}, volume = {10}, title = {Probabilistic Search and Energy Guidance for Biased Decoy Sampling in Ab-initio Protein Structure Prediction}, number = {5}, pages = {1162-1175}, year = 2013 }
    Adequate sampling of the conformational space is a central challenge in ab-initio protein structure prediction. In the absence of a template structure, a conformational search procedure guided by an energy function explores the conformational space, gathering an ensemble of low-energy decoy conformations. If the sampling is inadequate, the native structure may be missed altogether. Even if reproduced, a subsequent stage that selects a subset of decoys for further structural detail and energetic refinement may discard near-native decoys if they are high-energy or insufficiently represented in the ensemble. Sampling should produce a decoy ensemble that facilitates the subsequent selection of near-native decoys. In this paper, we investigate a robotics-inspired framework that allows directly measuring the role of energy in guiding sampling. Testing demonstrates that a soft energy bias steers sampling towards a diverse decoy ensemble less prone to exploiting energetic artifacts and thus more likely to facilitate retainment of near-native conformations by selection techniques. We employ two different energy functions, the Associative Memory Hamiltonian with Water (AMW) and Rosetta. Results show that enhanced sampling provides a rigorous testing of energy functions and exposes different deficiencies in them, thus promising to guide development of more accurate representations and energy functions.
  • J20: Irina Hashmi(g) and Amarda Shehu*.

    HopDock: A Probabilistic Search Algorithm for Decoy Sampling in Protein-protein Docking.

    Proteome Sci

    11(Suppl1):S6, 2013.

    @article{HashmiShehuProteomeSci13, author = {Hashmi, I. AND Shehu, A.}, journal = {Proteome Sci}, volume = {11}, title = {HopDock: A Probabilistic Search Algorithm for Decoy Sampling in Protein-protein Docking}, number = {Suppl1}, pages = {S6}, year = 2013 }
    Background: Elucidating the three-dimensional structure of a higher-order molecular assembly formed by interacting molecular units, a problem commonly known as docking, is central to understanding the molecular basis of biological function in the living and diseased cell. Though protein assemblies are ubiquitous in the cell, it is currently challenging to predict the native structure of a protein assembly in silico. Methods: This work proposes a novel probabilistic search algorithm, HopDock, to efficiently search the interaction space of protein dimers. The goal is to obtain an ensemble of low-energy dimeric configurations, also known as decoys, that can be effectively used by ab-initio docking protocols. HopDock is based on the Basin Hopping (BH) framework and repeatedly follows up a structural perturbation of a dimeric configuration with an energy minimization to explicitly sample configurations that represent local minima of a chosen energy function. HopDock employs both geometry and evolutionary conservation analysis to narrow down the interaction search space of interest for the purpose of efficiently obtaining a diverse decoy ensemble. Results and conclusions: A detailed analysis and a comparative study on seventeen different dimers shows HopDock obtains a broad view of the energy surface near the native dimeric structure and samples many near-native configurations. The results show that HopDock has high sampling capability and can be employed to effectively obtain a large and diverse ensemble of decoy configurations that can then be further refined in greater structural detail in ab-initio docking protocols.
  • J19: Sameh Saleh(u), Brian Olson(g), and Amarda Shehu*.

    A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction.

    BMC Structural Biology J

    13(Suppl1):S4, 2013.

    @article{SalehShehuBMCStructBiol13, author = {Saleh, S. AND Olson, B. AND Shehu, A.}, journal = {BMC Struct Biol}, volume = {13}, title = {A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction}, number = {Suppl1}, pages = {S4}, year = 2013 }
    Background: Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods: We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions: Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. 
The additional minimization is shown to be key to enhancing sampling capability and obtaining a diverse ensemble of decoy conformations, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with proximity that is comparable to state-of-the-art decoy sampling methods. The results are shown to be robust and valid when using two representative state-of-the-art coarse-grained energy functions.
  • J18: Brian Olson(g) and Amarda Shehu*.

    Rapid Sampling of Local Minima in Protein Energy Surface and Effective Reduction through a Multi-objective Filter.

    Proteome Sci 11(Suppl1):S12

    2013.

    @article{OlsonShehuProteomSci13, author = {Olson, B. AND Shehu, A.}, journal = {Proteome Sci}, volume = {11}, title = {Rapid Sampling of Local Minima in Protein Energy Surface and Effective Reduction through a Multi-objective Filter}, number = {Suppl1}, pages = {S12}, year = 2013 }
    Background: Many problems in protein modeling demand obtaining a discrete representation of the protein conformational space in terms of an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations near the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. The framework, where a structural perturbation is followed by an energy minimization to hop between nearby minima in the energy surface, has been shown effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. Methods: Here we conduct a detailed investigation of the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima. 
Results and conclusions: We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and in turn steering the exploration towards conformations near the native structure. In minimization, we show that a simple greedy search is just as effective as Metropolis Monte Carlo-based minimization. Finally, we show that the multi-objective filter is particularly effective at efficiently reducing the ensemble of sampled local minima and obtains a simpler representation of the probed energy surface.
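    The Pareto-front filter mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dominates` and `pareto_front` and the two-objective toy data are assumptions made for illustration, and every objective is assumed to be minimized (for example, separate energy terms of a sampled minimum).

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated points (the Pareto front)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            dominates(q, p) for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical ensemble: each tuple holds two objective values per minimum.
ensemble = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
kept = pareto_front(ensemble)
```

    In this toy ensemble only the third point is discarded, because the second point is better in both objectives. The quadratic-time scan is chosen for clarity; a large ensemble would likely call for a sort-based variant.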
  • J17: Kevin Molloy(g) and Amarda Shehu*.

    Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method.

    BMC Structural Biology J

    13(Suppl1):S8, 2013.

    @article{MolloyShehuBMCStructBiol13, author = {Molloy, K. AND Shehu, A.}, journal = {BMC Struct Biol}, volume = {13}, title = {Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method}, number = {Suppl1}, pages = {S8}, year = 2013 }
    Background: Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories at great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. Methods: We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Results and conclusions: Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other.
Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.
  • J16: Brian Olson(g), Irina Hashmi(g), Kevin Molloy(g), and Amarda Shehu*.

    Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules.

    Advances in Artificial Intelligence J

    2012, 674832 (special issue on Artificial Intelligence Applications in Biomedicine).

    @article{OlsonShehuAdvAI12, author = {Olson, B. AND Hashmi, I. AND Molloy, K. AND Shehu, A.}, journal = {Advances in AI J}, number = {674832}, title = {Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules}, volume = {2012}, year = 2012 }
    Since its introduction, the Basin Hopping (BH) framework has proven useful for hard non-linear optimization problems with multiple variables and modalities. Applications span a wide range, from packing problems in geometry to characterization of molecular states in statistical physics. BH is seeing a re-emergence in computational structural biology due to its ability to obtain a coarse-grained representation of the protein energy surface in terms of local minima. In this paper, we show that the BH framework is general and versatile, allowing to address problems related to the characterization of protein structure, assembly, and motion due to its fundamental ability to sample minima in a high-dimensional variable space. We show how specific implementations of the main components in BH yield algorithmic realizations that attain state-of-the-art results in the context of ab-initio protein structure prediction and rigid protein-protein docking. We also show that BH can map intermediate minima related with motions connecting diverse stable functionally-relevant states in a protein molecule, thus serving as a first step towards the characterization of transition trajectories connecting these states.
  • J15: Brian Olson(g) and Amarda Shehu*.

    Evolutionary-inspired Probabilistic Search for Enhancing Sampling of Local Minima in the Protein Energy Surface.

    Proteome Science

    2012, 10(Suppl1): S5.

    @article{OlsonShehuProtSci12, author = {Olson, B. AND Shehu, A.}, journal = {Proteome Sci}, number = {Suppl1}, pages = {S5}, title = {Evolutionary-inspired probabilistic search for enhancing sampling of local minima in the protein energy surface}, volume = {10}, year = 2012 }
    Background: Despite computational challenges, elucidating conformations that a protein system assumes under physiologic conditions for the purpose of biological activity is a central problem in computational structural biology. While these conformations are associated with low energies in the energy surface that underlies the protein conformational space, few existing conformational search algorithms focus on explicitly sampling local low-energy minima in the protein energy surface. Methods: This work proposes a novel probabilistic search framework, PLOW, that explicitly samples local low-energy minima in the protein energy surface. The framework combines algorithmic ingredients from evolutionary computation and computational structural biology to effectively explore the subspace of local minima. A greedy local search maps a conformation sampled in conformational space to a nearby local minimum. A perturbation move allows jumping out of a local minimum and obtaining a new starting conformation for the greedy local search. The process repeats in an iterative fashion, resulting in a trajectory-based exploration of the subspace of local minima. Results and Conclusions: The analysis of PLOW's performance shows that, by navigating only the subspace of local minima, PLOW is able to sample conformations near a protein's native structure, either more effectively than or as well as state-of-the-art methods that focus on reproducing the native structure for a protein system. Analysis of the actual subspace of local minima shows that PLOW samples this subspace more effectively than a naive sampling approach. Additional theoretical analysis reveals that the perturbation function employed by PLOW is key to its ability to sample a diverse set of low-energy conformations. This analysis also suggests directions for further research and novel applications for the proposed framework.
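    The perturb-then-minimize loop described in the abstract above can be sketched on a toy problem. This is not the PLOW code: the one-dimensional `energy` function, the fixed-step greedy search, the uniform perturbation, and the rule that moves the trajectory only to minima of equal or lower energy are all assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple


def energy(x: float) -> float:
    """Toy rugged 1D surface standing in for a protein energy function."""
    return 0.1 * x * x + math.sin(3.0 * x)


def greedy_local_search(x: float, step: float = 0.01,
                        max_iters: int = 10000) -> Tuple[float, float]:
    """Move to the lower-energy neighbor until neither neighbor improves."""
    e = energy(x)
    for _ in range(max_iters):
        best_x, best_e = x, e
        for cand in (x - step, x + step):
            cand_e = energy(cand)
            if cand_e < best_e:
                best_x, best_e = cand, cand_e
        if best_x == x:  # no improving neighbor: local minimum on this grid
            break
        x, e = best_x, best_e
    return x, e


def hop_between_minima(x0: float, n_hops: int = 50, jump: float = 1.0,
                       seed: int = 0) -> List[Tuple[float, float]]:
    """Alternate perturbation and greedy minimization; record every minimum."""
    rng = random.Random(seed)
    x, e = greedy_local_search(x0)
    minima = [(x, e)]
    for _ in range(n_hops):
        start = x + rng.uniform(-jump, jump)  # perturbation out of the basin
        new_x, new_e = greedy_local_search(start)
        minima.append((new_x, new_e))
        if new_e <= e:  # assumed acceptance rule, not taken from the paper
            x, e = new_x, new_e
    return minima


sampled = hop_between_minima(x0=2.0)
lowest = min(sampled, key=lambda pair: pair[1])
```

    Every entry in `sampled` is a local minimum of the toy surface at the resolution of the search step, so the list plays the role of the sampled ensemble of minima. The `jump` parameter controls how far a perturbation can carry the search from the current minimum.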
  • J14: Irina Hashmi(g), Bahar Akbal-Delibas, Nurit Haspel, and Amarda Shehu*.

    Guiding Protein Docking with Geometric and Evolutionary Information.

    J Bioinf and Comp Biol

    2012, 10(3): 1242002.

    @article{HashmiShehu12, author = {Hashmi, I. AND Akbal-Delibas, B. AND Haspel, N. AND Shehu, A.}, journal = {J Bioinf and Comp Biol}, number = {3}, pages = {1242002}, title = {Guiding Protein Docking with Geometric and Evolutionary Information}, volume = {10}, year = 2012 }
    Structural modeling of molecular assemblies promises to improve our understanding of molecular interactions and biological function. Even when focusing on modeling structures of protein dimers from knowledge of monomeric native structure, docking two rigid structures onto one another entails exploring a large configurational space. This paper presents a novel approach for docking protein molecules and elucidating native-like configurations of protein dimers. The approach makes use of geometric hashing to focus the docking of monomeric units on geometrically complementary regions through rigid-body transformations. This geometry-based approach improves the feasibility of searching the combined configurational space. The search space is narrowed even further by focusing the sought rigid-body transformations around molecular surface regions composed of amino acids with high evolutionary conservation. This condition is based on recent findings, where analysis of protein assemblies reveals that many functional interfaces are significantly conserved throughout evolution. Different search procedures are employed in this work to search the resulting narrowed configurational space. A proof-of-concept energy-guided probabilistic search procedure is also presented. Results are shown on a broad list of 18 protein dimers and additionally compared with data reported by other labs. Our analysis shows that focusing the search around evolutionary-conserved interfaces results in lower lRMSDs.
  • J13: Bahar Akbal-Delibas, Irina Hashmi(g), Amarda Shehu, and Nurit Haspel*.

    An Evolutionary Conservation Based Method for Refining and Reranking Protein Complex Structures.

    J Bioinf and Comp Biol

    2012, 10(3):1242008.

    @article{AkbalHaspel12, author = {Akbal-Delibas, B. AND Hashmi, I. AND Shehu, A. AND Haspel, N.}, journal = {J Bioinf and Comp Biol}, number = {3}, pages = {1242008}, title = {An Evolutionary Conservation Based Method for Refining and Reranking Protein Complex Structures}, volume = {10}, year = 2012 }
    Detection of protein complexes and their structures is crucial for understanding their role in the basic biology of organisms. Computational docking methods can provide researchers with a good starting point for the analysis of protein complexes. However, these methods are often not accurate and their results need to be further refined to improve interface packing. In this paper, we introduce a refinement method that incorporates evolutionary information into a novel scoring function by employing Evolutionary Trace (ET)-based scores. Our method also takes Van der Waals interactions into account to avoid atomic clashes in refined structures. We tested our method on docked candidates of eight protein complexes and the results suggest that the proposed scoring function helps bias the search toward complexes with native interactions. We show a strong correlation between evolutionary-conserved residues and correct interface packing. Our refinement method is able to produce structures with better lRMSD (least RMSD) with respect to the known complexes and lower energies than initial docked structures. It also helps to filter out false-positive complexes generated by docking methods, by detecting little or no conserved residues on false interfaces. We believe this method is a step toward better ranking and prediction of protein complexes.
  • J12: Brian Olson(g), Kevin Molloy(g), S.-Farid Hendi(g), and Amarda Shehu*.

    Guiding Search in the Protein Conformational Space with Structural Profiles.

    J Bioinf and Comp Biol

    2012, 10(3):1242005.

    @article{OlsonMolloyShehu12, author = {Olson, B. S. AND Molloy, K. AND Hendi, S.-F. AND Shehu, A.}, journal = {J Bioinf and Comp Biol}, number = {3}, pages = {1242005}, title = {Guiding Search in the Protein Conformational Space with Structural Profiles}, volume = {10}, year = 2012 }
    The roughness of the protein energy surface poses a significant challenge to search algorithms that seek to obtain a structural characterization of the native state. Recent research seeks to bias search toward near-native conformations through one-dimensional structural profiles of the protein native state. Here we investigate the effectiveness of such profiles in a structure prediction setting for proteins of various sizes and folds. We pursue two directions. We first investigate the contribution of structural profiles in comparison to or in conjunction with physics-based energy functions in providing an effective energy bias. We conduct this investigation in the context of Metropolis Monte Carlo with fragment-based assembly. Second, we explore the effectiveness of structural profiles in providing projection coordinates through which to organize the conformational space. We do so in the context of a robotics-inspired search framework proposed in our lab that employs projections of the conformational space to guide search. Our findings indicate that structural profiles are most effective in obtaining physically realistic near-native conformations when employed in conjunction with physics-based energy functions. Our findings also show that these profiles are very effective when employed instead as projection coordinates to guide probabilistic search toward undersampled regions of the conformational space.
  • J11: Amarda Shehu* and Lydia Kavraki*.

    Modeling Structures and Motions of Loops in Protein Molecules.

    Entropy

    2012, 14(2):252-290 (invited review article, IF 2011: 1.109).

    @article{ShehuKavrakiEntropy12, author = {Shehu, A. AND Kavraki, L. E.}, journal = {Entropy J}, number = {2}, pages = {252-290}, title = {Modeling Structures and Motions of Loops in Protein Molecules}, volume = {14}, year = 2012 }
    Unlike the secondary structure elements that they connect in protein structures, loop fragments in protein chains are often highly mobile even in generally stable proteins. The structural variability of loops is often at the center of a protein’s stability, folding, and even biological function. Loops are found to mediate important biological processes, such as signaling, protein-ligand binding, and protein-protein interactions. Modeling conformations of a loop under physiological conditions remains an open problem in computational biology. This article reviews computational research in loop modeling, highlighting progress and challenges, and identifies potential directions for future research.
  • J10: Uday Kamath(g), Jack Compton(u), Rezarta Islamaj Dogan, Kenneth A. De Jong*, and Amarda Shehu*.

    An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and its Application to DNA Splice-Site Prediction.

    IEEE/ACM Trans Comp Biol and Bioinf

    2012, 9(5):1387-1398 (IF 2011: 2.25).

    @article{KamathShehuTCBB12, author = {Kamath, U. AND Compton, J. AND Islamaj Dogan, R. AND De Jong, K. A. AND Shehu, A.}, journal = {IEEE/ACM Trans Comp Biol and Bioinf}, number = {5}, pages = {1387-1398}, title = {An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and its Application to DNA Splice-Site Prediction}, volume = {9}, year = 2012 }
    Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.
  • J9: Uday Kamath(g), Amarda Shehu*, and Kenneth A. De Jong*.

    A Two-Stage Evolutionary Approach for Effective Classification of Hypersensitive DNA Sequences.

    J Bioinf and Comp Biol

    2011, 9(3): 399-413.

    @article{KamathShehuDeJongJBCB11, author = {Kamath, U. AND Shehu, A. AND De Jong, K.}, journal = {J. Bioinf. and Comp. Biol.}, title = {A Two-Stage Evolutionary Approach for Effective Classification of Hypersensitive DNA Sequences}, number = {3}, pages = {399-413}, volume = {9}, year = 2011 }
    Hypersensitive (HS) sites in genomic sequences are reliable markers of DNA regulatory regions that control gene expression. Annotation of regulatory regions is important in understanding phenotypical differences among cells and diseases linked to pathologies in protein expression. Several computational techniques are devoted to mapping out regulatory regions in DNA by initially identifying HS sequences. Statistical learning techniques like Support Vector Machines (SVM), for instance, are employed to classify DNA sequences as HS or non-HS. This paper proposes a method to automate the basic steps in designing an SVM that improves the accuracy of such classification. The method proceeds in two stages and makes use of evolutionary algorithms. An evolutionary algorithm first designs optimal sequence motifs to associate explicit discriminating feature vectors with input DNA sequences. A second evolutionary algorithm then designs SVM kernel functions and parameters that optimally separate the HS and non-HS classes. Results show that this two-stage method significantly improves SVM classification accuracy. The method promises to be generally useful in automating the analysis of biological sequences, and we post its source code on our website.
  • J8: Brian Olson(g), Kevin Molloy(g), and Amarda Shehu*.

    In Search of the Protein Native State with a Probabilistic Sampling Approach.

    J Bioinf and Comp Biol

    2011, 9(3):383-398.

    @article{OlsonMolloyShehuJBCB11, author = {Olson, B. AND Molloy, K. AND Shehu, A.}, journal = {J. Bioinf. and Comp. Biol.}, title = {In Search of the Protein Native State with a Probabilistic Sampling Approach}, number = {3}, pages = {383-398}, volume = {9}, year = 2011 }
    The three-dimensional structure of a protein is a key determinant of its biological function. Given the cost and time required to acquire this structure through experimental means, computational models are necessary to complement wet-lab efforts. Many computational techniques exist for navigating the high-dimensional protein conformational search space, which is explored for low-energy conformations that comprise a protein’s native states. This work proposes two strategies to enhance the sampling of conformations near the native state. An enhanced fragment library with greater structural diversity is used to expand the search space in the context of fragment-based assembly. To manage the increased complexity of the search space, only a representative subset of the sampled conformations is retained to further guide the search towards the native state. Our results make the case that these two strategies greatly enhance the sampling of the conformational space near the native state. A detailed comparative analysis shows that our approach performs as well as state-of-the-art ab initio structure prediction protocols.
  • J7: Amarda Shehu* and Brian Olson (g).

    Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration.

    Intl J of Robot Res

    2010, 29(8):1106-1127.

    @article{ShehuOlsonIJRR10, author = {Shehu, A. AND Olson, B.}, journal = {Intl. J. Robot. Res.}, title = {Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration}, number = {8}, pages = {1106-1127}, volume = {29}, year = 2010 }
    This paper proposes a robotics-inspired method to enhance sampling of native-like conformations when employing only amino-acid sequence information for a protein at hand. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and the rugged energy surface of the protein conformational space. The contribution of this paper is a novel two-layered method to enhance the sampling of geometrically-distinct low-energy conformations at a coarse-grained level of detail. The method grows a tree in conformational space while reconciling two goals: (i) guiding the tree towards lower energies and (ii) not oversampling geometrically-similar conformations. Discretizations of the energy surface and a low-dimensional projection space are employed so that low-energy conformations in under-explored regions of the conformational space are selected more often for expansion. The tree is expanded with low-energy conformations through a Metropolis Monte Carlo framework that uses a move set of physical fragment configurations. Testing on sequences of eight small-to-medium structurally-diverse proteins shows that the method rapidly samples native-like conformations in a few hours on a single CPU. Analysis shows that computed conformations are good candidates for further detailed energetic refinements by larger studies in protein engineering and design.
  • J6: Joseph A. Hegler, Joachim Laetzer, Amarda Shehu, Cecilia Clementi, and Peter G. Wolynes*.

    Restriction vs. Guidance: Fragment Assembly and Associative Memory Hamiltonians for Protein Structure Prediction.

    Proc. Nat. Acad. Sci. USA

    2009, 106(36):15302-15307.

    @article{HeglerWolynesPNAS09, author = {Hegler, J. A. AND Laetzer, J. AND Shehu, A. AND Clementi, C. AND Wolynes, P. G.}, journal = {Proc. Nat. Acad. Sci. USA}, title = {Restriction vs. Guidance: Fragment Assembly and Associative Memory Hamiltonians for Protein Structure Prediction}, number = {36}, pages = {15302-15307}, volume = {106}, year = 2009, }
    Conformational restriction by fragment assembly and guidance in molecular dynamics are alternate conformational search strategies in protein structure prediction. We examine both approaches using a version of the associative memory Hamiltonian that incorporates the influence of water-mediated interactions (AMW). For short proteins (<70 residues), fragment assembly, while searching a restricted space, compares well to molecular dynamics and is often sufficient to fold such proteins to near-native conformations (4 Å) via simulated annealing. Longer proteins encounter kinetic sampling limitations in fragment assembly not seen in molecular dynamics, which generally samples more native-like conformations. We also present a fragment-enriched version of the standard AMW energy function, AMW-FME, which incorporates the local sequence alignment derived fragment libraries from fragment assembly directly into the energy function. This energy function, in which fragment information acts as a guide, not a restriction, is found by molecular dynamics to improve on both previous approaches.
  • J5: Amarda Shehu, Lydia E. Kavraki*, and Cecilia Clementi*.

    Multiscale Characterization of Protein Conformational Ensembles.

    Proteins: Structure, Function, and Bioinformatics

    2009, 76(4):837-851.

    @article{ShehuKavrakiClementiProteins09, author = {Shehu, A. AND Kavraki, L. E. AND Clementi, C.}, journal = {Proteins: Struct, Funct, and Bioinf}, title = {Multiscale Characterization of Protein Conformational Ensembles}, number = {4}, pages = {837-851}, volume = {76}, year = 2009, }
    We propose a multiscale exploration method to characterize the conformational space populated by a protein at equilibrium. The method efficiently obtains a large set of equilibrium conformations in two stages: first exploring the entire space at a coarse-grained level of detail, then narrowing a refined exploration to selected low-energy regions. The coarse-grained exploration periodically adds all-atom detail to selected conformations to ensure that the search leads to regions which maintain low energies in all-atom detail. The second stage reconstructs selected low-energy coarse-grained conformations in all-atom detail. A low-dimensional energy landscape associated with all-atom conformations allows focusing the exploration to energy minima and their conformational ensembles. The lowest energy ensembles are enriched with additional all-atom conformations through further multiscale exploration. The lowest energy ensembles obtained from the application of the method to three different proteins correctly capture the known functional states of the considered systems.
  • J4: Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi*.

    Unfolding the Fold of Cyclic Cysteine-rich Peptides.

    Protein Science

    2008, 17(3):482-493.

    @article{ShehuKavrakiClementiProtSci08, author = {Shehu, A. AND Kavraki, L. E. AND Clementi, C.}, journal = {Protein Sci}, number = {3}, pages = {482-493}, title = {Unfolding the Fold of Cyclic Cysteine-rich Peptides}, volume = {17}, year = 2008 }
    We propose a method to extensively characterize the native state ensemble of cyclic cysteine-rich peptides. The method uses minimal information, namely, amino acid sequence and cyclization, as a topological feature that characterizes the native state. The method does not assume a specific disulfide bond pairing for cysteines and allows the possibility of unpaired cysteines. A detailed view of the conformational space relevant for the native state is obtained through a hierarchic multi-resolution exploration. A crucial feature of the exploration is a geometric approach that efficiently generates a large number of distinct cyclic conformations independently of one another. A spatial and energetic analysis of the generated conformations associates a free-energy landscape to the explored conformational space. Application to three long cyclic peptides of different folds shows that the conformational ensembles and cysteine arrangements associated with free energy minima are fully consistent with available experimental data. The results provide a detailed analysis of the native state features of cyclic peptides that can be further tested in experiment.
  • J3: Amarda Shehu, Cecilia Clementi, and Lydia E. Kavraki*.

    Sampling Conformation Space to Model Equilibrium Fluctuations in Proteins.

    Algorithmica

    2007, 48(4):303-327.

    @article{ShehuClementiKavrakiAlgo07, author = {Shehu, A. AND Clementi, C. AND Kavraki, L. E.}, journal = {Algorithmica}, number = {4}, pages = {303-327}, title = {Sampling Conformation Space to Model Equilibrium Fluctuations in Proteins}, volume = {48}, year = 2007 }
    This paper proposes the Protein Ensemble Method (PEM) to model equilibrium fluctuations in proteins where fragments of the protein polypeptide chain can move independently of one another. PEM models global equilibrium fluctuations of a polypeptide chain by combining local fluctuations of consecutive overlapping fragments of the chain. Local fluctuations are computed by a probabilistic exploration that exploits analogies between proteins and robots. All generated conformations are subjected to energy minimization and are then weighted according to a Boltzmann distribution. Using the theory of statistical mechanics, the Boltzmann-weighted fluctuations corresponding to each fragment are combined to obtain fluctuations for the entire protein. The agreement obtained among PEM-modeled fluctuations, wet-lab experiments, and guided simulation measurements indicates that PEM is able to reproduce with high accuracy protein equilibrium fluctuations that occur over a broad range of timescales.
  • J2: Amarda Shehu, Lydia E. Kavraki, and Cecilia Clementi*.

    On the Characterization of Protein Native State Ensembles.

    Biophysical Journal

    2007, 92(5):1503-1511.

    @article{ShehuKavrakiClementiBiophysJ07, author = {Shehu, A. AND Kavraki, L. E. AND Clementi, C.}, journal = {Biophys J}, number = {5}, pages = {1503-1511}, title = {On the Characterization of Protein Native State Ensembles}, volume = {92}, year = 2007 }
    Describing and understanding the biological function of a protein requires a detailed structural and thermodynamic description of the protein's native state ensemble. Obtaining such a description often involves characterizing equilibrium fluctuations that occur beyond the nanosecond timescale. Capturing such fluctuations remains nontrivial even for very long molecular dynamics and Monte Carlo simulations. We propose a novel multiscale computational method to exhaustively characterize, in atomistic detail, the protein conformations constituting the native state with no inherent timescale limitations. Applications of this method to proteins of various folds and sizes show that thermodynamic observables measured as averages over the native state ensembles obtained by the method agree remarkably well with nuclear magnetic resonance data that span multiple timescales. By characterizing equilibrium fluctuations at atomistic detail over a broad range of timescales, from picoseconds to milliseconds, our method offers to complement current simulation techniques and wet-lab experiments and can impact our understanding and description of the relationship between protein flexibility and function.
  • J1: Amarda Shehu, Cecilia Clementi*, and Lydia E. Kavraki*.

    Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations.

    Proteins: Structure, Function, and Bioinformatics

    2006, 65(1):164-179.

    @article{ShehuClementiKavrakiProt06, author = {Shehu, A. AND Clementi, C. AND Kavraki, L. E.}, journal = {Proteins: Struct, Funct, and Bioinf}, number = {1}, pages = {164-179}, title = {Modeling Protein Conformational Ensembles: {F}rom Missing Loops to Equilibrium Fluctuations}, volume = {65}, year = 2006 }
    Characterizing protein flexibility is an important goal for understanding the physical-chemical principles governing biological function. This paper presents a Fragment Ensemble Method to capture the mobility of a protein fragment such as a missing loop and its extension into a Protein Ensemble Method to characterize the mobility of an entire protein at equilibrium. The underlying approach in both methods is to combine a geometric exploration of conformational space with a statistical mechanics formulation to generate an ensemble of physical conformations on which thermodynamic quantities can be measured as ensemble averages. The Fragment Ensemble Method is validated by applying it to characterize loop mobility in both instances of strongly stable and disordered loop fragments. In each instance, fluctuations measured over generated ensembles are consistent with data from experiment and simulation. The Protein Ensemble Method captures the mobility of an entire protein by generating and combining ensembles of conformations for consecutive overlapping fragments defined over the protein sequence. This method is validated by applying it to characterize flexibility in ubiquitin and protein G. Thermodynamic quantities measured over the ensembles generated for both proteins are fully consistent with available experimental data. On these proteins, the method recovers nontrivial data such as order parameters, residual dipolar couplings, and scalar couplings. Results presented in this work suggest that the proposed methods can provide insight into the interplay between protein flexibility and function.