Technical Reports

  • Rohan Pandit, Pranay Singh, and Amarda Shehu*.

    Making Sense of Big Molecular Data: Dimensionality Reduction Techniques for Automated Mapping and Analysis of Molecular Structures.

    Technical Report

    Siemens High-school Competition, 2014 (semifinalists).

    @techreport{PanditSinghShehu14, author = {Pandit, R. AND Singh, P. AND Shehu, A.}, institution = {George Mason University}, title = {Making Sense of Big Molecular Data: Dimensionality Reduction Techniques for Automated Mapping and Analysis of Molecular Structures}, year = {2014} }
    Modeling and simulation are now well-established tools for characterizing the biological molecules central to the inner workings of a cell. Molecular simulation packages generate large volumes of molecular data, whether they rely on Newtonian mechanics to simulate molecular dynamics accurately but inefficiently, or on heuristic stochastic optimization to model structures and motions efficiently but less accurately. Many computational techniques have been developed over the years to simplify and reduce such data. Most studies have focused on dimensionality reduction techniques that extract reaction coordinates succinctly describing the essential dynamics hidden in the mass of molecular data. In this project, we compare linear and non-linear dimensionality reduction techniques on their ability to map and elucidate features of peptide and protein energy landscapes. We also investigate a novel application setting, de novo protein structure prediction, in which thousands of protein structures (decoys) are computed from a given protein sequence. The difficulty in this setting is deciding which decoy to propose as the predicted native structure. We demonstrate that dimensionality reduction techniques based on diffusion maps are effective at addressing this open problem of decoy selection for the purpose of making blind predictions.
  • Daniel Veltri and Amarda Shehu*.

    Elucidating Activity-related Physico-chemical Features in Antimicrobial Peptides.

    Technical Report

    GMU-CS-TR-2012-6, 2012.

    @techreport{VeltriShehu12, author = {Veltri, D. AND Shehu, A.}, institution = {George Mason University}, number = {GMU-CS-TR-2012-6}, title = {Elucidating Activity-related Physico-chemical Features in Antimicrobial Peptides}, year = {2012} }
    The rise of drug-resistant bacteria has brought attention to antimicrobial peptides (AMPs) as targets for novel antibacterial drug research. Many machine learning methods aim to improve the recognition of AMPs, often by employing sequence-derived features in supervised learning with Support Vector Machines (SVMs). This is useful for expediently screening databases for AMP-like peptides. However, AMPs are characterized by great sequence diversity. Moreover, biological studies focusing on AMP modification and de novo design stand to benefit from computational methods capable of exposing the features important for activity at the level of individual amino-acid positions. We take the first steps in this direction by considering an extensive list of amino-acid physico-chemical features, which we gradually narrow down to those relevant for SVM classification. We focus on a specific AMP class, the cathelicidins, because of the abundance of documented sequences, and aim to improve their recognition against carefully designed decoy sequences. Analysis of the features important for classification reveals interesting physico-chemical properties to preserve when modifying or designing novel AMPs in the wet laboratory.
  • Christopher Miles, Brian Olson, and Amarda Shehu*.

    Geometry-based Computation of Symmetric Homo-oligomeric Protein Complexes.

    Technical Report

    GMU-CS-TR-2009-2, 2009.

    @techreport{MilesOlsonShehu09, author = {Miles, C. AND Olson, B. AND Shehu, A.}, institution = {George Mason University}, number = {GMU-CS-TR-2009-2}, title = {Geometry-based Computation of Symmetric Homo-oligomeric Protein Complexes}, year = {2009} }
    The need to engineer novel therapeutics and functional materials is driving the in-silico design of molecular complexes. The relationship between the structure of a protein and its biological function brings the in-silico determination of protein structures associated with functional states to the forefront of computational biology. While protein complexes, which arise from associations of protein monomers, are pervasive in every genome, determining their structures remains challenging. Given the difficulty of computing the structure of even a single protein monomer, computing arrangements of monomers in a complex is mostly limited to dimers. The growing number of protein complexes studied in wet labs is now allowing their structures to be classified, and a recent database shows that most naturally occurring protein complexes are symmetric homo-oligomers. This paper proposes a method to compute symmetric homo-oligomeric protein complexes when the structure of the replicated protein monomer is known and rigid. The method exploits this database to propose structures of symmetric homo-oligomers that can accommodate spatial replications of a given protein monomer: it searches the database for documented symmetric homo-oligomers in which the replicated monomer has a structure geometrically similar to that of the input monomer. The proposed method is a first step towards the in-silico design of novel protein complexes that, upon further refinement and characterization, can serve as molecular machines or fundamental units in therapeutics or functional materials.
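The Miles, Olson, and Shehu report above does not state how geometric similarity between the input monomer and a database monomer is scored. One common choice, assumed here purely for illustration, is the root-mean-square deviation (RMSD) after optimal rigid-body superposition, computed with the Kabsch algorithm. The minimal Python/NumPy sketch below further assumes that each monomer has already been reduced to an (N, 3) array of corresponding C-alpha coordinates, so residue correspondence is given. The names `kabsch_rmsd` and `rank_templates` and the template dictionary are hypothetical; this is not the report's actual method.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Assumption (hypothetical setup): P[i] and Q[i] are corresponding atoms,
    e.g. aligned C-alpha atoms of the input monomer and a database monomer.
    """
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets at the origin.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row vectors, hence R.T) and measure the residual deviation.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def rank_templates(query: np.ndarray, templates: dict) -> list:
    """Return (name, rmsd) pairs sorted from most to least geometrically similar."""
    scores = [(name, kabsch_rmsd(query, coords)) for name, coords in templates.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(50, 3))  # stand-in for monomer C-alpha coordinates
    theta = 0.7
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    templates = {
        "rotated_copy": query @ Rz.T + np.array([5.0, -2.0, 1.0]),
        "noisy_copy": query + rng.normal(scale=0.5, size=query.shape),
        "unrelated": rng.normal(size=(50, 3)),
    }
    for name, rmsd in rank_templates(query, templates):
        print(f"{name}: {rmsd:.3f}")
```

A lower RMSD indicates a closer geometric match, so the first entries returned by `rank_templates` would be the candidate templates to examine first. Establishing residue correspondence between monomers of different lengths (for example, by structural alignment) is a separate step that this sketch omits.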