Book Chapters

  • Book chapters are listed in reverse chronological order. Links to publishers are provided. Local copies are also made available, with the caveat that they are reproduced under the copyright permission for noncommercial dissemination of academic work.
  • B6: Nasrin Akhter, Liban Hassan, Zahra Rajabi, Daniel Barbara, and Amarda Shehu.

    Learning Organizations of Protein Energy Landscapes: An Application on Decoy Selection in Template-Free Protein Structure Prediction.

    In Methods in Molecular Biology: Protein Supersecondary Structure (Springer), first edition, (Editor: Kister, A.),


    @incollection{ShehuBookChapter18, author = {Akhter, N. AND Hassan, L. AND Rajabi, Z. AND Barbara, D. AND Shehu, A.}, title = {Learning Organizations of Protein Energy Landscapes: An Application on Decoy Selection in Template-Free Protein Structure Prediction}, booktitle = {Methods in Molecular Biology: Protein Supersecondary Structure}, editor = {Kister, A.}, publisher = {Springer}, year = 2018 }
    The protein energy landscape, which lifts the protein structure space by associating energies with structures, has been useful in improving our understanding of the relationship between structure, dynamics, and function. Currently, however, it is challenging to automatically extract and utilize the underlying organization of an energy landscape to link the structural states it houses to biological activity. In this chapter, we first report on two computational approaches that extract such an organization, one that ignores energies and operates directly in the structure space, and another that operates on the energy landscape associated with the structure space. We then describe two complementary approaches, one based on unsupervised learning and another based on supervised learning. Both approaches utilize the extracted organization to address the problem of decoy selection in template-free protein structure prediction. The presented results make the case that learning organizations of protein energy landscapes advances our ability to link structures to biological activity.
  • B5: Uday Kamath, Carlotta Domeniconi, Amarda Shehu, and Kenneth De Jong.

    EML: A Scalable, Transparent Meta-Learning Paradigm for Big Data Applications.

    In Intelligent Systems Reference Library: Innovations in Big Data Mining and Embedded Knowledge (Springer), first edition, (Editors: Anna Esposito, Antonietta M. Esposito, and Lakhmi C. Jain),


    @incollection{DeJongBookChapter18, author = {Kamath, U. AND Domeniconi, C. AND Shehu, A. AND {De Jong}, K. A.}, title = {{EML}: A Scalable, Transparent Meta-Learning Paradigm for Big Data Applications}, booktitle = {Intelligent Systems Reference Library: Innovations in Big Data Mining and Embedded Knowledge}, editor = {Esposito, A. AND Esposito, A. M. AND Jain, L. C.}, publisher = {Springer}, year = 2018 }
    The work presented in this chapter is motivated by two important challenges that arise when applying ML techniques to big data applications: the scalability of an ML technique as the training data increases significantly in size, and the transparency (understandability) of the induced models. To address these issues we describe and analyze a meta-learning paradigm, EML, that combines techniques from evolutionary computation and supervised learning to produce a powerful approach for inducing transparent models for big data ML applications.
  • B4: Amarda Shehu, Daniel Barbara, and K. Molloy.

    A Survey of Computational Methods for Protein Function Prediction.

    In Big Data Analytics in Genomics (Springer), first edition, (Editor: Wong, K. C.),


    @incollection{ShehuBookChapter16, author = {Shehu, A. AND Barbara, D. AND Molloy, K.}, title = {A Survey of Computational Methods for Protein Function Prediction}, booktitle = {Big Data Analytics in Genomics}, editor = {Wong, K. C.}, publisher = {Springer}, year = 2016 }
    Rapid advances in high-throughput genome sequencing technologies have resulted in millions of protein-encoding gene sequences with no functional characterization. Automated protein function annotation or prediction is a prime problem for computational methods to tackle in the post-genomic era of big molecular data. While recent community-driven experiments demonstrate that the accuracy of function prediction methods has significantly improved, challenges remain. The latter are related to the different sources of data exploited to predict function, as well as different choices in representing and integrating heterogeneous data. Current methods predict function from a protein's sequence, often in the context of evolutionary relationships, from a protein's three-dimensional structure or specific patterns in the structure, from neighbors in a protein-protein interaction network, from microarray data, or a combination of these different types of data. Here we review these methods and the state of protein function prediction, emphasizing recent algorithmic developments, remaining challenges, and prospects for future research.
  • B3: Amarda Shehu.

    A Review of Evolutionary Algorithms for Computing Functional Conformations of Protein Molecules.

    In Computer-Aided Drug Discovery (Springer Methods in Pharmacology and Toxicology Series), first edition, (Editor: Wei Zhang),


    @incollection{ShehuBookChapter15, author = {Shehu, A.}, title = {A Review of Evolutionary Algorithms for Computing Functional Conformations of Protein Molecules}, booktitle = {Computer-Aided Drug Discovery}, editor = {Zhang, W.}, publisher = {Springer Methods in Pharmacology and Toxicology Series}, year = 2015 }
    The ubiquitous presence of proteins in chemical pathways in the cell and their key role in many human disorders motivate a growing body of protein modeling studies aimed at unraveling the relationship between protein structure and function. The foundation of such studies is the realization that knowledge of the structures a protein accesses under physiological conditions is key to a detailed understanding of its biological function and the design of therapeutic compounds for the purpose of altering misfunction in aberrant variants of a protein. Dry laboratory investigations promise a holistic treatment of the relationship between protein sequence, structure, and function. Significant efforts are made in the dry laboratory to map protein conformation spaces and underlying energy landscapes of proteins. The majority of such efforts employ well-studied computational templates, such as Molecular Dynamics and Monte Carlo. The focus of this review is on a third emerging template, stochastic optimization under the umbrella of evolutionary computation. Algorithms based on such a template, also known as evolutionary algorithms, are showing promise in addressing fundamental computational challenges in protein structure modeling and are opening up new avenues in protein modeling research. This review summarizes evolutionary algorithms for novice readers, while highlighting recent developments that showcase current, state-of-the-art capabilities for experts.
  • B2: Amarda Shehu.

    Probabilistic Search and Optimization for Protein Energy Landscapes.

    In Handbook of Computational Molecular Biology (Chapman & Hall/CRC Computer & Information Science Series), second edition, (Editors: Srinivas Aluru and Mona Singh),


    @incollection{ShehuBookChapter13, author = {Shehu, A.}, title = {Probabilistic Search and Optimization for Protein Energy Landscapes}, booktitle = {Handbook of Computational Molecular Biology}, editor = {Aluru, S. AND Singh, M.}, publisher = {Chapman \& Hall/CRC Computer \& Information Science Series}, year = 2013 }
    Protein modeling research is becoming increasingly important to complement research in the wet laboratory in improving our understanding of proteins and determinants of their biological function in the healthy and diseased cell. Furthering our knowledge of proteins is central to molecular biology, as virtually all biological mechanisms in the living cell involve protein molecules. Proteins are central components of cellular organization and function. Moreover, many diseases involve misbehaving proteins. Neurodegenerative diseases, such as Alzheimer's, prion, and Huntington's, are increasingly starting to be viewed as proteinopathies involving misfolded proteins unable to perform their normal biological activity [176, 122]. An even broader subset of human diseases, including cancer, are known as protein conformational or misfolding diseases and have at their source a peptide or a protein failing to adopt its native functional conformational state [189]...
  • B1: Amarda Shehu.

    Conformational Search for the Protein Native State.

    In Introduction to Protein Structure Prediction: Methods and Algorithms (eds H. Rangwala and G. Karypis), John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9780470882207.ch19,

    September, 2010.

    @incollection{ShehuBookChapter10, author = {Shehu, A.}, title = {Conformational Search for the Protein Native State}, booktitle = {Introduction to Protein Structure Prediction: Methods and Algorithms}, editor = {Rangwala, H. AND Karypis, G.}, address = {Hoboken, NJ}, publisher = {Wiley Book Series on Bioinformatics}, chapter = {19}, year = 2010 }
    This chapter presents a survey of computational methods that obtain a structural description of the protein native state. This description is important to understand a protein's biological function. The chapter presents the problem of characterizing the native state in conformational detail in terms of the challenges that it raises in computation. Computing the conformations populated by a protein under native conditions is cast as a search problem. Methods such as Molecular Dynamics and Monte Carlo are treated first. Multiscaling, the combination of reduced- and high-complexity models of conformations, is briefly summarized as a powerful strategy to rapidly extract important features of the energy surface associated with the protein conformational space. Other strategies that narrow the search space through information obtained in the wet lab are also presented. The chapter then focuses on enhanced sampling strategies employed to compute native-like conformations given only the amino-acid sequence. Fragment-based assembly methods are analyzed for their success and for what they reveal about the physical process of folding. The chapter concludes with a discussion of future research directions in the computational quest for the protein native state.