Proteins: Structure, Function, Bioinformatics
Why does mutation of Gln61 in Ras by the nitro analog NGln maintain activity of Ras-GAP in hydrolysis of guanosine triphosphate?
Interpretation of the experiments showing that the Ras-GAP protein complex maintains activity in guanosine triphosphate (GTP) hydrolysis upon replacement of Gln61 in Ras with its unnatural nitro analog, NGln, is an important issue for understanding details of chemical transformations at the enzyme active site. Using molecular modeling, we demonstrate that both glutamine and its nitro analog in the aci-nitro form participate in the reaction of GTP hydrolysis at the stages of proton transfer and formation of inorganic phosphate. The computed structures and the energy profiles for the complete pathway from the enzyme-substrate to enzyme-product complexes for the wild-type and mutated Ras suggest that the reaction mechanism is not affected by this mutation. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Protein loops are essential structural elements that influence not only function but also protein stability and folding rates. It was recently reported that shortening a loop in the AcP protein may increase its native state conformational entropy. This effect on the entropy of the folded state can be much larger than the lower entropic penalty of ordering a shorter loop upon folding, and can therefore result in a more pronounced stabilization than predicted by the polymer model for loop closure entropy. In this study, which aims at generalizing the effect of loop length shortening on native state dynamics, we use all-atom molecular dynamics simulations to study how gradually shortening a very long or solvent-exposed loop region in four different proteins can affect their stability. For two proteins, AcP and Ubc7, we show an increase in native state entropy in addition to the known effect of the loop length on the unfolded state entropy. However, for two permutants of the SH3 domain, shortening a loop results only in the expected change in the entropy of the unfolded state, which nicely reproduces the observed experimental stabilization. Here, we show that an increase in the native state entropy following loop shortening is not unique to the AcP protein, but neither is it a general rule that applies to all proteins following the truncation of any loop. When shortening a loop affects both the folded state and the unfolded state, the effect on protein stability may be greater. Proteins 2015. © 2015 Wiley Periodicals, Inc.
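The polymer-model baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the common logarithmic form of the loop closure entropy, with the ideal-chain exponent c = 3/2 as the default. The function name, the default temperature, and the example loop lengths are illustrative assumptions, not values from the study; other exponents are used for chains with excluded volume.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def loop_shortening_ddg(n_long, n_short, temperature=298.0, c=1.5):
    """Free energy change (kcal/mol) predicted from loop closure entropy alone.

    Assumes the loop closure entropy scales as -c * R * ln(n), so shortening
    a loop from n_long to n_short residues lowers the entropic penalty of
    folding by c * R * ln(n_long / n_short). A negative return value means
    the shorter loop is predicted to stabilize the folded state.
    """
    if n_long <= 0 or n_short <= 0:
        raise ValueError("loop lengths must be positive")
    delta_entropy = c * R_KCAL * math.log(n_long / n_short)
    return -temperature * delta_entropy


# Hypothetical example: halving a 20-residue loop at 298 K.
print(round(loop_shortening_ddg(20, 10), 3))
```

Any stabilization beyond this estimate would, in the terms of the abstract, point to an additional contribution such as a change in native state entropy.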
Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11
Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods in identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2015. © 2015 Wiley Periodicals, Inc.
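One simple way to combine several quality assessment scores into a single ranking is to average the rank each method assigns to each model. The sketch below illustrates that generic idea only; the abstract above does not specify how MULTICOM combines its 14 methods, and the function name, method names, and scores here are hypothetical.

```python
def consensus_rank(scores_by_method):
    """Order models by their mean rank across quality assessment methods.

    scores_by_method maps a method name to a dict of model -> score, where
    a higher score means a better model. Every method must score the same
    set of models. Ties are broken alphabetically by model name.
    """
    models = sorted(next(iter(scores_by_method.values())))
    rank_sum = {model: 0 for model in models}
    for scores in scores_by_method.values():
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for rank, model in enumerate(ordered, start=1):
            rank_sum[model] += rank
    n_methods = len(scores_by_method)
    return sorted(models, key=lambda m: (rank_sum[m] / n_methods, m))


# Hypothetical scores from three methods for three candidate models.
example_scores = {
    "qa1": {"model_a": 0.9, "model_b": 0.5, "model_c": 0.1},
    "qa2": {"model_a": 0.6, "model_b": 0.7, "model_c": 0.2},
    "qa3": {"model_a": 0.8, "model_b": 0.4, "model_c": 0.3},
}
print(consensus_rank(example_scores))
```

Rank averaging avoids having to put scores from different methods on a common scale, at the cost of discarding the size of the score differences.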
Proteins are essential elements of biological systems, and their function typically relies on their ability to successfully bind to specific partners. Recently, an emphasis of study into protein interactions has been on hot spots, or residues in the binding interface that make a significant contribution to the binding energetics. In this study, we investigate how conservation of hot spots can be used to guide docking prediction. We show that the use of evolutionary data combined with hot spot prediction highlights near-native structures across a range of benchmark examples. Our approach explores various strategies for using hot spots and evolutionary data to score protein complexes, using both absolute and chemical definitions of conservation along with refinements to these strategies that look at windowed conservation and filtering to ensure a minimum number of hot spots in each binding partner. Finally, structure-based models of orthologs were generated for comparison with sequence-based scoring. Using two data sets of 22 and 85 examples, a high rate of top 10 and top 1 predictions is observed, with up to 82% of examples returning a top 10 hit and 35% returning a top 1 hit depending on the data set and strategy applied; upon inclusion of the native structure among the decoys, up to 55% of examples yielded a top 1 hit. The 20 examples common to both data sets show that more carefully curated interolog data yields better predictions, particularly in achieving top 1 hits. Proteins 2015. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
The flexibility of HIV protease (HIVp) plays a critical role in enabling enzymatic activity and is required for substrate access to the active site. While the importance of flexibility in the flaps that cover the active site is well known, flexibility in other parts of the enzyme is also critical for function. One key region is a loop containing Thr80, which forms the walls of the active site. Although not situated within the active site, Thr80 is absolutely conserved. The mutation T80N preserves the structure of the enzyme, but catalytic activity is completely lost. To investigate the potential influence of the T80N mutation on HIVp flexibility, wide-angle X-ray scattering (WAXS) data were measured for a series of HIVp variants. Starting with a calculated WAXS pattern from a rigid atomic model, the modulations in the intensity distribution caused by structural fluctuations in the protein were predicted by simple analytic methods and compared with the experimental data. An analysis of T80N WAXS data shows that this variant is significantly more rigid than the wild type (WT) across all length scales. The effects of this single point mutation extend throughout the protein, altering the mobility of amino acids in the enzymatic core. These results support the contention that significant protein flexibility extends throughout HIVp and is critical to catalytic function. Proteins 2015. © 2014 Wiley Periodicals, Inc.
Methods of model accuracy estimation can help selecting the best models from decoy sets: Assessment of model accuracy estimations in CASP11
The article presents an assessment of the model accuracy estimation methods participating in CASP11. The results of the assessment are expected to be useful both to developers of the methods and to users, who far too often are presented with structural models without annotations of accuracy. The main emphasis is placed on the ability of techniques to identify the best models from among several available. Bivariate descriptive statistics and ROC analysis are used to additionally assess the overall correctness of the predicted model accuracy scores, the correlation between the predicted and observed accuracy of models, the effectiveness in distinguishing between good and bad models, the ability to discriminate between reliable and unreliable regions in models, and the accuracy of the coordinate error self-estimates. A rigid-body measure (GDT_TS) and three local-structure-based scores (LDDT, CADaa, and SphereGrinder) are used as reference measures for evaluating methods' performance. Consensus methods, taking advantage of the availability of several models for the same target protein, perform well on the majority of tasks. Methods that predict accuracy on the basis of a single model perform comparably to consensus methods in picking the best models and in estimating the accuracy of the local structure. More groups than in previous experiments submitted reasonable error estimates of their own models, most likely in response to a recommendation from CASP and the increasing demand from users. Proteins 2015. © 2015 Wiley Periodicals, Inc.
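The ROC analysis of how well predicted accuracy scores separate good from bad models can be illustrated with a short calculation of the area under the ROC curve. This is a sketch under stated assumptions: the function name is hypothetical, the labels marking a model as "good" are supplied by the caller (the threshold used in the assessment is not given in the abstract above), and the AUC is computed through its pairwise-comparison definition rather than by tracing the curve.

```python
def roc_auc(predicted_scores, is_good):
    """Area under the ROC curve for separating good from bad models.

    Computed as the fraction of (good, bad) model pairs in which the good
    model receives the higher predicted score; ties count as half.
    """
    good = [s for s, g in zip(predicted_scores, is_good) if g]
    bad = [s for s, g in zip(predicted_scores, is_good) if not g]
    if not good or not bad:
        raise ValueError("need at least one good and one bad model")
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


# Hypothetical predicted scores; the first two models are labeled good.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [True, True, False, False]))
```

An AUC of 1.0 means every good model is scored above every bad one, and 0.5 corresponds to no discrimination.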
The Mutation-Minimization Method (MuMi), for studying the local response of proteins to point mutations, is introduced here. The heat shock protein Hsp70 is used as the test system because it displays features that have been studied in great detail: it has many conserved residues, serves several different functions on each of its domains, and displays interdomain allostery. To analyze the spatial arrangement of residues within the protein, the network properties of the wild-type (WT) protein and of all its single alanine mutants, generated using MuMi, are investigated. Measures are proposed to express the amount of change from the WT structure upon mutation, and these deviations are compared to find potential critical sites. The functional significance of the potential sites is mapped to the parameter that uncovers them. It was found that sites directly involved in binding were sensitive to mutations and were characterized by large displacements. On the other hand, sites that steer large conformational changes typically had increased reachability upon alanine mutations occurring elsewhere in the protein. Finally, residues that control communication within and between domains reside on the largest number of paths connecting pairs of residues in the protein. Proteins 2015. © 2015 Wiley Periodicals, Inc.
For many membrane proteins, the determination of their topology remains a challenge for methods like X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Electron paramagnetic resonance (EPR) spectroscopy has evolved as an alternative technique to study structure and dynamics of membrane proteins. The present study demonstrates the feasibility of membrane protein topology determination using limited EPR distance and accessibility measurements. The BCL::MP-Fold (BioChemical Library membrane protein fold) algorithm assembles secondary structure elements (SSEs) in the membrane using a Monte Carlo Metropolis (MCM) approach. Sampled models are evaluated using knowledge-based potential functions and agreement with the EPR data. Twenty-nine membrane proteins of up to 696 residues are used to test the algorithm. The RMSD100 value of the most accurate model is better than 8 Å for 27, better than 6 Å for 22, and better than 4 Å for 15 of the 29 proteins, demonstrating the algorithm's ability to sample the native topology. The average enrichment improved from 1.3 to 2.5, showing the improved discrimination power obtained by using EPR data. Proteins 2015. © 2015 Wiley Periodicals, Inc.
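Two generic ingredients named in the abstract above, the Metropolis acceptance rule and the length-normalized RMSD100, can be sketched as follows. This is not the BCL::MP-Fold implementation: the function names and the temperature parameter are illustrative assumptions, and RMSD100 is written in the commonly used form RMSD / (1 + ln sqrt(N/100)), which rescales an RMSD to that expected for a 100-residue protein.

```python
import math
import random


def metropolis_accept(delta_score, temperature, rng=random):
    """Generic Metropolis criterion for a proposed Monte Carlo move.

    Moves that do not worsen the score (delta_score <= 0) are always
    accepted; worse moves are accepted with probability
    exp(-delta_score / temperature).
    """
    if delta_score <= 0:
        return True
    return rng.random() < math.exp(-delta_score / temperature)


def rmsd100(rmsd, n_residues):
    """Normalize an RMSD (in Angstrom) to a 100-residue reference length."""
    denominator = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denominator <= 0:
        raise ValueError("normalization is undefined for very short chains")
    return rmsd / denominator


# Hypothetical example: an 8 Angstrom RMSD on a 400-residue protein.
print(round(rmsd100(8.0, 400), 2))
```

The normalization makes accuracy thresholds such as 8, 6, and 4 Å comparable across proteins of very different sizes.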
Prediction of the substrate for nonribosomal peptide synthetase (NRPS) adenylation domains by virtual screening
Nonribosomal peptide synthetases (NRPSs) synthesize a diverse array of bioactive small peptides, many of which are used in medicine. There is considerable interest in predicting NRPS substrate specificity in order to facilitate investigation of the many “cryptic” NRPS genes that have not been linked to any known product. However, the current sequence similarity-based methods are unable to produce reliable predictions when there is a lack of prior specificity data, which is a particular problem for fungal NRPSs. We conducted virtual screening on the specificity-determining domain of NRPSs, the adenylation domain, and found that virtual screening using experimentally determined structures results in good enrichment of the cognate substrate. Our results indicate that the conformation of the adenylation domain and in particular the conformation of a key conserved aromatic residue is important in determining the success of the virtual screening. When homology models of NRPS adenylation domains of known specificity, rather than experimentally determined structures, were built and used for virtual screening, good enrichment of the cognate substrate was also achieved in many cases. However, the accuracy of the models was key to the reliability of the predictions and there was a large variation in the results when different models of the same domain were used. This virtual screening approach is promising and is able to produce enrichment of the cognate substrates in many cases, but improvements in building and assessing homology models are required before the approach can be reliably applied to these models. Proteins 2015. © 2015 Wiley Periodicals, Inc.
pKa Predictions for Proteins, RNAs and DNAs with the Gaussian Dielectric Function Using DelPhiPKa
We developed a Poisson-Boltzmann based approach to calculate the pKa values of protein ionizable residues (Glu, Asp, His, Lys and Arg) and of nucleotides of RNA and single-stranded DNA. Two novel features were utilized: the dielectric properties of the macromolecules and water phase were modeled via the smooth Gaussian-based dielectric function in DelPhi, and the corresponding electrostatic energies were calculated without defining the molecular surface. We tested the algorithm by calculating pKa values for more than 300 residues from 32 proteins from the PPD dataset and achieved an overall RMSD of 0.77. In particular, an RMSD of 0.55 was achieved for surface residues and an RMSD of 1.1 for buried residues. The approach was also found capable of capturing the large pKa shifts of various single point mutations in staphylococcal nuclease (SNase) from the pKa-cooperative dataset, resulting in an overall RMSD of 1.6 for this set of pKa's. Investigations showed that predictions for most of the buried mutant residues of SNase could be improved by using higher dielectric constant values. Furthermore, an option to generate different hydrogen positions also improved pKa predictions for buried carboxyl residues. Finally, the pKa calculations on two RNAs demonstrated the capability of this approach for other types of biomolecules. This article is protected by copyright. All rights reserved.
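The RMSD figures quoted in the abstract above compare predicted with experimental pKa values. A minimal sketch of that error measure is shown below; the function name and the example values are hypothetical and are not taken from the PPD or pKa-cooperative datasets.

```python
import math


def pka_rmsd(predicted, experimental):
    """Root-mean-square deviation, in pK units, between two pKa lists."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical predicted and measured pKa values for two residues.
print(pka_rmsd([4.0, 7.0], [5.0, 6.0]))
```

Because errors are squared before averaging, a few large misses, as is typical for buried residues, raise the RMSD more than many small ones.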
Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long-range (contact) interactions among amino acids in natively folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of fewer than 10 letters) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches, including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs, fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. This article is protected by copyright. All rights reserved.
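The idea of reducing an alphabet while losing little mutual information can be illustrated on a toy contact table. The sketch below is a simplification and not the Information Maximization Device: it assumes the quantity of interest is the mutual information between the two residue types in a contact, and the four-letter alphabet, counts, and function names are all hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(pair_counts):
    """Mutual information (bits) between the two members of a contact pair.

    pair_counts maps (first_type, second_type) to a contact count.
    """
    total = sum(pair_counts.values())
    p_first = defaultdict(float)
    p_second = defaultdict(float)
    for (a, b), count in pair_counts.items():
        p_first[a] += count / total
        p_second[b] += count / total
    info = 0.0
    for (a, b), count in pair_counts.items():
        if count > 0:
            p_ab = count / total
            info += p_ab * math.log2(p_ab / (p_first[a] * p_second[b]))
    return info


def reduce_alphabet(pair_counts, mapping):
    """Merge residue types according to mapping and pool their counts."""
    reduced = defaultdict(float)
    for (a, b), count in pair_counts.items():
        reduced[(mapping[a], mapping[b])] += count
    return dict(reduced)


# Toy table: contacts within a class (hydrophobic h or polar p) are common.
hp_map = {"I": "h", "L": "h", "K": "p", "E": "p"}
toy_counts = {}
for a in hp_map:
    for b in hp_map:
        toy_counts[(a, b)] = 20 if hp_map[a] == hp_map[b] else 5

full_mi = mutual_information(toy_counts)
reduced_mi = mutual_information(reduce_alphabet(toy_counts, hp_map))
print(round(full_mi, 4), round(reduced_mi, 4))
```

In this toy table the contact preferences depend only on the hydrophobic or polar class, so merging four letters into two loses no information; merging letters with different contact preferences would lower it.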
Extension of a Protein Docking Algorithm to Membranes and Applications to Amyloid Precursor Protein Dimerization
Novel adjustments are introduced to the docking algorithm, DOCK/PIERR, for the purpose of predicting structures of transmembrane protein complexes. Incorporating knowledge about the membrane environment is shown to significantly improve docking accuracy. The extended version of DOCK/PIERR is shown to perform comparably to other leading docking packages. This membrane version of DOCK/PIERR is applied to the prediction of coiled-coil homodimer structures of the transmembrane region of the C-terminal peptide of amyloid precursor protein (C99). Results from MD simulation of the C99 homodimer in a POPC bilayer and from docking are compared. Docking results are found to capture key aspects of the homodimer ensemble, including the existence of three topologically distinct conformers. Furthermore, the extended version of DOCK/PIERR is successful in capturing the effects of solvation in membrane and micelle environments. Specifically, DOCK/PIERR reproduces essential differences in the homodimer ensembles simulated in a POPC bilayer and a DPC micelle, where configurational entropy and surface curvature effects bias the handedness and topology of the homodimer ensemble. This article is protected by copyright. All rights reserved.
We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. For five free-modeling (FM) targets, QUARK successfully constructed a model with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score of 0.736 and an RMSD of 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance profiles; the predicted contacts are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline may stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the last of these was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, and structure assembly to final model selection. Significant efforts are needed to solve these problems before real progress on FM can be made. Proteins 2015. © 2015 Wiley Periodicals, Inc.
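The TM-score threshold of 0.4 used in the abstract above refers to a length-normalized similarity measure. The sketch below evaluates the standard TM-score sum for one fixed superposition; it assumes the per-residue distances between aligned Cα atoms have already been computed, whereas the full TM-score additionally maximizes this sum over superpositions. The function names and example inputs are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (Angstrom) of the TM-score."""
    if target_length < 19:
        # Below this length the formula gives a non-positive d0; reference
        # implementations treat short chains specially (not reproduced here).
        raise ValueError("formula is not usable for chains under 19 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(aligned_distances, target_length):
    """TM-score contribution of one superposition.

    aligned_distances holds the distances (Angstrom) between aligned residue
    pairs; unaligned target residues contribute zero because the sum is
    divided by the full target length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


# Hypothetical example: 100-residue target, all residues aligned at 2 Angstrom.
print(round(tm_score([2.0] * 100, 100), 3))
```

Because each term is bounded by 1 and the sum is divided by the target length, the score lies between 0 and 1 regardless of protein size, which is what makes a fixed cutoff such as 0.4 meaningful across targets.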
Different combinations of atomic interactions predict protein-small molecule and protein-DNA/RNA affinities with similar accuracy
Interactions between proteins and other molecules play essential roles in all biological processes. Although it is widely held that a protein's ligand specificity is determined primarily by its three-dimensional structure, the general principles by which structure determines ligand binding remain poorly understood. Here we use statistical analyses of a large number of protein−ligand complexes with associated binding-affinity measurements to quantitatively characterize how combinations of atomic interactions contribute to ligand affinity. We find that there are significant differences in how atomic interactions determine ligand affinity for proteins that bind small chemical ligands, those that bind DNA/RNA and those that interact with other proteins. Although protein-small molecule and protein-DNA/RNA binding affinities can be accurately predicted from structural data, models predicting one type of interaction perform poorly on the others. Additionally, the particular combinations of atomic interactions required to predict binding affinity differed between small-molecule and DNA/RNA data sets, consistent with the conclusion that the structural bases determining ligand affinity differ among interaction types. In contrast to what we observed for small-molecule and DNA/RNA interactions, no statistical models were capable of predicting protein−protein affinity with >60% correlation. We demonstrate the potential usefulness of protein-DNA/RNA binding prediction as a possible tool for high-throughput virtual screening to guide laboratory investigations, suggesting that quantitative characterization of diverse molecular interactions may have practical applications as well as fundamentally advancing our understanding of how molecular structure translates into function. Proteins 2015. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Protein structure prediction using residue- and fragment-environment potentials in CASP11
An accurate scoring function that can select near-native structure models from a pool of alternative models is key for successful protein structure prediction. For the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11), we have built a protein structure prediction protocol that has novel coarse-grained scoring functions for selecting decoys at the heart of its pipeline. The score named PRESCO (Protein Residue Environment SCOre), developed recently by our group, evaluates the native-likeness of the local structural environment of residues in a structure decoy, considering the positions and depths of side chains of spatially neighboring residues. We also introduced a helix interaction potential as an additional scoring function for selecting decoys. The best models selected by PRESCO and the helix interaction potential underwent structure refinement, which includes side-chain modeling and relaxation with a short molecular dynamics simulation. Our protocol was successful, achieving the top rank in the free modeling category with a significant margin in accumulated Z-score over the next-ranked groups when the top 1 models were considered. Proteins 2015. © 2015 Wiley Periodicals, Inc.
The determinants of bond angle variability in protein/peptide backbones: A comprehensive statistical/quantum mechanics analysis
The elucidation of the mutual influence between peptide bond geometry and local conformation has important implications for protein structure refinement, validation, and prediction. To gain insights into the structural determinants and the energetic contributions associated with protein/peptide backbone plasticity, we here report an extensive analysis of the variability of the peptide bond angles by combining statistical analyses of protein structures and quantum mechanics calculations on small model peptide systems. Our analyses demonstrate that all the backbone bond angles strongly depend on the peptide conformation and unveil the existence of regular trends as a function of ψ and/or φ. The excellent agreement of the quantum mechanics calculations with the statistical surveys of protein structures validates the computational scheme employed here and demonstrates that the valence geometry of the protein/peptide backbone is primarily dictated by local interactions. Notably, for the first time we show that the position of the Hα hydrogen atom, which is an important parameter in NMR structural studies, is also dependent on the local conformation. Most of the trends observed may be satisfactorily explained by invoking steric repulsive interactions; in some specific cases the valence bond variability is also influenced by hydrogen-bond-like interactions. Moreover, we can provide a reliable estimate of the energies involved in the interplay between geometry and conformations. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Sucrose prevents protein fibrillation through compaction of the tertiary structure but hardly affects the secondary structure
Amyloid fibers, implicated in a wide range of diseases, are formed when proteins misfold and stick together in long rope-like structures. As a natural mechanism, osmolytes can be used to modulate protein aggregation pathways without interfering with other cellular functions. The osmolyte sucrose delays fibrillation of the ribosomal protein S6, leading to softer fibrils with a less defined shape. The molecular mechanism used by sucrose to delay S6 fibrillation was studied based on the two-state unfolding kinetics of the secondary and tertiary structures. It was concluded that the delay in S6 fibrillation results from stabilization and compaction of the slightly expanded tertiary native structure formed under fibrillation conditions. Interestingly, this compaction extends to almost all of the S6 tertiary structure but hardly affects its secondary structure. The part of the S6 tertiary structure that is most compacted by sucrose is known to be the first part to unfold, indicating that the native S6 has entered the unfolding pathway under fibrillation conditions. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Coiled-coil length: Size does matter
Protein evolution is governed by processes that alter not only the primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif that is present in all domains of life and frequently plays a structural role in the cell. We observed that, despite the repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. This article is protected by copyright. All rights reserved.
The value of protein structure classification information—Surveying the scientific literature
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and to guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012–2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings. Proteins 2015. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Pressure-induced structural transition of mature HIV-1 Protease from a combined NMR/MD simulation approach
We investigate the pressure-induced structural changes in the mature human immunodeficiency virus type 1 protease dimer (HIV-1 PR), using residual dipolar coupling (RDC) measurements in a weakly oriented solution. ¹D_NH RDCs were measured under high-pressure conditions for an inhibitor-free PR and an inhibitor-bound complex, as well as for an inhibitor-free multidrug resistant protease bearing 20 mutations (PR20). While PR20 and the inhibitor-bound PR were little affected by pressure, inhibitor-free PR showed significant differences in the RDCs measured at 600 bar compared to 1 bar. The structural basis of such changes was investigated by MD simulations using the experimental RDC restraints, revealing substantial conformational perturbations, specifically a partial opening of the flaps and the penetration of water molecules into the hydrophobic core of the subunits at high pressure. This study highlights the exquisite sensitivity of RDCs to pressure-induced conformational changes and illustrates how RDCs combined with MD simulations can be used to determine the structural properties of metastable intermediate states on the folding energy landscape. This article is protected by copyright. All rights reserved.