Proteins: Structure, Function, Bioinformatics
For the template-based modeling (TBM) of CASP11 targets, we have developed three new protein modeling protocols (nns for server prediction, and LEE and LEER for human prediction) by improving upon our previous CASP protocols (CASP7 through CASP10). We applied the powerful global optimization method of conformational space annealing to three stages of optimization: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain remodeling. For more successful fold recognition, a new alignment method called CRFalign was developed. It can incorporate sensitive positional and environmental dependence in alignment scores as well as strong nonlinear correlations among various features. Modifications and adjustments were made to the form of the energy function and weight parameters pertaining to the chain building procedure. For the side-chain remodeling step, residue-type dependence was introduced to the cutoff value that determines the entry of a rotamer to the side-chain modeling library. The good performance of the nns server method is attributed to two factors: successful fold recognition, achieved by combining methods including CRFalign, and the current modeling formulation, which incorporates accurate structural aspects collected from multiple templates to realize a greatly improved model. The LEE protocol is identical to the nns one except that CASP11-released server models are used as templates. The success of LEE in utilizing CASP11 server models indicates that proper template screening and template clustering, assisted by appropriate cluster ranking, promise a new direction to enhance protein 3D modeling. This article is protected by copyright. All rights reserved.
We examine the utility of informatics-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
Escherichia coli ClpB is a heat shock protein that belongs to the AAA+ protein superfamily. Studies have shown that ClpB and its homologue in yeast, Hsp104, can disrupt protein aggregates in vivo. It is thought that ClpB requires binding of nucleoside triphosphate to assemble into hexameric rings with protein binding activity. In addition, it is widely assumed that ClpB is uniformly hexameric in the presence of nucleotides. Here we report that, in the absence of nucleotide, increasing ClpB concentration leads to ClpB hexamer formation, decreasing NaCl concentration stabilizes ClpB hexamers, and the ClpB assembly reaction is best described by a monomer, dimer, tetramer equilibrium under the three salt concentrations examined here. Further, we found that ClpB oligomers exhibit relatively fast dissociation on the time scale of sedimentation. We anticipate our studies on ClpB assembly to be a starting point to understand how ClpB assembly is linked to the binding and disaggregation of denatured proteins.
The value of protein structure classification information—surveying the scientific literature
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and to guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Structure of Ctk3, a subunit of the RNA polymerase II CTD kinase complex, reveals a noncanonical CTD-interacting domain fold
CTDK-I is a yeast kinase complex that phosphorylates the C-terminal repeat domain (CTD) of RNA polymerase II (Pol II) to promote transcription elongation. CTDK-I contains the cyclin-dependent kinase Ctk1 (homologous to human CDK9/CDK12), the cyclin Ctk2 (human cyclin K), and the yeast-specific subunit Ctk3, which is required for CTDK-I stability and activity. Here we predict that Ctk3 consists of an N-terminal CTD-interacting domain (CID) and a C-terminal three-helix bundle domain. We determine the X-ray crystal structure of the N-terminal domain of the Ctk3 homologue Lsg1 from the fission yeast Schizosaccharomyces pombe at 2.0 Å resolution. The structure reveals eight helices arranged into a right-handed superhelical fold that resembles the CID domain present in transcription termination factors Pcf11, Nrd1, and Rtt103. Ctk3, however, shows different surface properties and no binding to CTD peptides. Together with the known structures of Ctk1 and Ctk2 homologues, our results lead to a molecular framework for analyzing the structure and function of the CTDK-I complex. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Large oligomeric complex structures can be computationally assembled by efficiently combining docked interfaces
Macromolecular oligomeric assemblies are involved in many biochemical processes of living organisms. The benefits of such assemblies in crowded cellular environments include increased reaction rates, efficient feedback regulation, cooperativity and protective functions. However, an atom-level structural determination of large assemblies is challenging due to the size of the complex and the difference in binding affinities of the involved proteins. In this study, we propose a novel combinatorial greedy algorithm for assembling large oligomeric complexes from information on the approximate position of interaction interfaces of pairs of monomers in the complex. Prior information on complex symmetry is not required; rather, the symmetry is inferred during assembly. We implement an efficient geometric score, the transformation match score, that bypasses the model ranking problems of state-of-the-art scoring functions by scoring the similarity between the inferred dimers of the same monomer simultaneously with different binding partners in a (sub)complex with a set of pregenerated docking poses. We compiled a diverse benchmark set of 308 homo- and heteromeric complexes containing 6 to 60 monomers. To explore the applicability of the method, we considered 48 sets of parameters and selected the three sets for which the algorithm correctly reconstructs the maximum number of complexes, namely 252 (81.8%), in at least one of the respective three runs. The crossvalidation coverage, that is, the mean fraction of correctly reconstructed benchmark complexes during crossvalidation, was 78.1%, which demonstrates the ability of the presented method to correctly reconstruct the topology of a large variety of biological complexes. Proteins 2015. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs. Proteins 2015. © 2015 Wiley Periodicals, Inc.
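As an illustration of the kind of search this abstract refers to, the following is a minimal sketch of A* enumeration over rotamer assignments for a toy pairwise energy model. It is not the authors' implementation: the data layout (`E1` for self energies, `E2` for pair energies), the function names, and the static `order` argument are all assumptions made for this example, and the lower bound is the simple "best remaining rotamer" bound commonly used with A* in protein design, not the improved bounds or the dynamic residue ordering proposed in the paper.

```python
import heapq
from itertools import count


def pair(E2, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j.

    E2 is keyed by (i, j) with i < j; the lookup is made symmetric here.
    """
    if i < j:
        return E2[(i, j)][r][s]
    return E2[(j, i)][s][r]


def total_energy(E1, E2, conf):
    """Exact energy of a full assignment conf[i] = rotamer index at position i."""
    n = len(conf)
    e = sum(E1[i][conf[i]] for i in range(n))
    e += sum(pair(E2, i, conf[i], j, conf[j])
             for i in range(n) for j in range(i + 1, n))
    return e


def f_score(E1, E2, order, assigned):
    """g + h for a partial assignment; h never overestimates the remaining energy."""
    done = list(zip(order, assigned))   # (position, rotamer) pairs fixed so far
    todo = order[len(assigned):]        # positions still unassigned
    g = sum(E1[i][r] for i, r in done)
    g += sum(pair(E2, i, r, j, s)
             for a, (i, r) in enumerate(done) for (j, s) in done[a + 1:])
    h = 0.0
    for a, j in enumerate(todo):
        best = float("inf")
        for s in range(len(E1[j])):
            e = E1[j][s]
            e += sum(pair(E2, i, r, j, s) for i, r in done)
            # Optimistic interaction with each later unassigned position.
            e += sum(min(pair(E2, j, s, k, t) for t in range(len(E1[k])))
                     for k in todo[a + 1:])
            best = min(best, e)
        h += best
    return g + h


def astar_enumerate(E1, E2, order, n_best=1):
    """Return up to n_best (energy, conformation) pairs in nondecreasing energy,
    plus the number of nodes popped from the queue."""
    order = list(order)
    n = len(E1)
    tie = count()                       # tie-breaker so tuples never compare by payload
    heap = [(0.0, next(tie), ())]       # root; its score is a placeholder
    popped = 0
    results = []
    while heap and len(results) < n_best:
        f, _, assigned = heapq.heappop(heap)
        popped += 1
        if len(assigned) == n:          # leaf: f equals the exact energy
            conf = [None] * n
            for pos, rot in zip(order, assigned):
                conf[pos] = rot
            results.append((f, tuple(conf)))
            continue
        pos = order[len(assigned)]
        for rot in range(len(E1[pos])):
            child = assigned + (rot,)
            heapq.heappush(heap, (f_score(E1, E2, order, child), next(tie), child))
    return results, popped
```

Because the bound is admissible, full conformations leave the queue in nondecreasing energy order, which is what makes a gap-free list possible. Calling `astar_enumerate` with different `order` arguments and comparing the returned node counts gives a rough feel for why residue ordering matters.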
Chemokines form a family of signaling proteins mainly responsible for directing the traffic of leukocytes, where their biological activity can be modulated by their oligomerization state. We characterize the dynamics and thermodynamic stability of monomer and homodimer structures of CXCL7, one of the most abundant platelet chemokines, using experimental methods that include Circular Dichroism (CD) and Nuclear Magnetic Resonance (NMR) spectroscopy, and computational methods that include the Anisotropic Network Model (ANM), Molecular Dynamics (MD) simulations and the Distance Constraint Model (DCM). A consistent picture emerges for the effects of dimerization and of Cys5-Cys31 and Cys7-Cys47 disulfide bond formation. The presence of disulfide bonds is not critical for maintaining structural stability in the monomer or dimer, but the monomer is destabilized more than the dimer upon removal of disulfide bonds. Disulfide bonds play a key role in shaping the characteristics of native state dynamics. The combined analysis shows that upon dimerization flexibly correlated motions are induced between the 30s and 50s loops within each monomer and across the dimer interface. Interestingly, the greatest gain in flexibility upon dimerization occurs when both disulfide bonds are present, and the homodimer is least stable relative to its two monomers. These results suggest that the highly conserved disulfide bonds in chemokines facilitate a structural mechanism that is tuned to optimally distinguish functional characteristics between monomer and dimer.
Age-related cleavages of crystallins in human lens cortical fiber cells generate a plethora of endogenous peptides and high molecular weight complexes
Low molecular weight (LMW) peptides derived from the breakdown of crystallins have been reported in adult human lenses. The proliferation of these LMW peptides coincides with the earliest stages of cataract formation, suggesting that the protein cleavages involved may contribute to the aggregation and insolubilization of crystallins. This study reports the identification of 238 endogenous LMW crystallin peptides from the cortical extracts of four human lenses representing young, middle, and old age. Analysis of the peptide terminal amino acids showed that Lys and Arg were situated at the C-terminus with significantly higher frequency compared to other residues, suggesting that trypsin-like proteolysis may be active in the lens cortical fiber cells. Selected reaction monitoring analysis of an endogenous αA-crystallin peptide (αA57-65) showed that the concentration of this peptide in the human lens increased gradually to middle age, after which the rate of αA57-65 formation escalated significantly. Using 2D gel electrophoresis/nanoLC-ESI-MS/MS, 12 protein complexes of 40–150 kDa consisting of multiple crystallin components were characterized from the water-soluble cortical extracts of an adult human lens. The detection of these protein complexes suggested the possibility of crystallin cross-linking, with these complexes potentially acting to stabilize degraded crystallins by sequestration into water-soluble complexes. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Topological and sequence information predict that foldons organize a partially overlapped and hierarchical structure
It has been suggested that proteins have substructures, called foldons, which can cooperatively fold into the native structure. However, prior investigations have defined foldons in various ways, citing different foldon characteristics, which makes the concept of a foldon ambiguous. In this study, we perform a Gō model simulation and analyze the characteristics of substructures that cooperatively fold into the native-like structure. Although some results do not agree well with the experimental evidence due to the simplicity of our coarse-grained model, our results strongly suggest that cooperatively folding units sometimes organize a partially overlapped and hierarchical structure. This view makes it easy to interpret the differing proposals about foldons as reflecting different levels of the hierarchical structure. On the basis of this finding, we present a new method to assign foldons and their hierarchy, using structural and sequence information. The results show that the foldons assigned by our method correspond to the intermediate structures identified by some experimental techniques. The new method makes it easy to predict whether a protein folds sequentially into the native structure or whether some foldons fold into the native structure in parallel. Proteins 2015. © 2015 Wiley Periodicals, Inc.
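For readers unfamiliar with Gō models, the sketch below shows the core idea in its simplest form: only contacts present in the native structure are rewarded, so the energy of a conformation is set by how many native contacts it has formed. Everything specific here is an assumption for illustration (Cα-only coordinates, an 8 Å contact cutoff, a minimum sequence separation of 3, a 1.2 tolerance factor, and a square-well reward of depth `epsilon`); the abstract does not state the functional form used in the study.

```python
import math


def native_contacts(native_coords, cutoff=8.0, min_sep=3):
    """List (i, j, d0) for residue pairs in contact in the native structure.

    native_coords: one (x, y, z) tuple per residue (e.g. C-alpha positions).
    """
    n = len(native_coords)
    contacts = []
    for i in range(n):
        for j in range(i + min_sep, n):
            d0 = math.dist(native_coords[i], native_coords[j])
            if d0 < cutoff:
                contacts.append((i, j, d0))
    return contacts


def go_energy(coords, contacts, epsilon=1.0, tolerance=1.2):
    """Return (energy, Q) for a conformation.

    A native contact counts as formed when its current distance is within
    tolerance * d0. Energy is -epsilon per formed contact; Q is the fraction
    of native contacts formed.
    """
    formed = sum(1 for i, j, d0 in contacts
                 if math.dist(coords[i], coords[j]) <= tolerance * d0)
    q = formed / len(contacts) if contacts else 0.0
    return -epsilon * formed, q
```

Restricting `contacts` to pairs within one candidate substructure gives a per-substructure Q, which is one simple way to follow whether a group of residues forms its native contacts together during a trajectory.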
Protein structure refinement via molecular-dynamics simulations: What works and what does not?
Protein structure refinement during CASP11 by the Feig group is described. Molecular dynamics simulations were used in combination with an improved selection and averaging protocol. On average, modest refinement was achieved, with some targets improved significantly. Analysis of the CASP submission from our group focused on refinement success versus amount of sampling, refinement of different secondary structure elements, and whether refinement varied as a function of which group provided initial models. The refinement of local stereochemical features was examined via the MolProbity score, and an updated protocol was developed that can generate high-quality structures with very low MolProbity scores for most starting structures with modest computational effort. Proteins 2015. © 2015 Wiley Periodicals, Inc.
CASP11 refinement experiments with ROSETTA
We report new Rosetta-based approaches to tackling the major issues that confound protein structure refinement, and the testing of these approaches in the CASP11 experiment. Automated refinement protocols were developed that integrate a range of sampling methods using parallel computation and multiobjective optimization. In CASP11, we used a more aggressive large-scale structure rebuilding approach for poor starting models, and a less aggressive local rebuilding plus core refinement approach for starting models likely to be closer to the native structure. The more incorrectly modeled a structure was predicted to be, the more it was allowed to vary during refinement. The CASP11 experiment revealed strengths and weaknesses of the approaches: the high-resolution strategy incorporating local rebuilding with core refinement consistently improved starting structures, while the low-resolution strategy incorporating the reconstruction of large parts of the structures improved starting models in some cases but often considerably worsened them, largely because of model selection issues. Overall, the results suggest the high-resolution refinement protocol is a promising method orthogonal to other approaches, while the low-resolution refinement method clearly requires further development. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Here we present the results of residue–residue contact predictions achieved in CASP11 by the CONSIP2 server, which is based around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top-L/5 long-range contact precision of 27%. The MetaPSICOV method is based on a combination of classical contact prediction features, enhanced with three distinct covariation methods embedded in a two-stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generate the deepest possible multiple-sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline and our results, and show that where the method underperformed, the major factor was relying on a fixed set of parameters for the initial sequence alignments and not attempting to perform domain splitting as a preprocessing step. Proteins 2015. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
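The headline metric in this abstract, top-L/5 long-range contact precision, can be sketched as follows. The function name, the input layout, and the default sequence-separation threshold of 24 residues for "long-range" are assumptions for this example (the threshold follows the usual CASP convention, which the abstract itself does not restate). When fewer than L/5 long-range predictions are available, this sketch divides by the number actually evaluated, which is one of several possible conventions.

```python
def top_l5_long_range_precision(predicted, native_contacts, seq_len, min_sep=24):
    """Precision of the L/5 highest-scoring long-range contact predictions.

    predicted:       iterable of (i, j, score) residue-pair predictions
    native_contacts: set of (i, j) pairs with i < j that are true contacts
    seq_len:         length L of the target domain
    min_sep:         minimum |i - j| for a pair to count as long-range
    """
    long_range = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_sep]
    long_range.sort(key=lambda p: p[2], reverse=True)   # most confident first
    n_top = max(1, seq_len // 5)
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top
               if (min(i, j), max(i, j)) in native_contacts)
    return hits / len(top)
```

Averaging this value over all target domains gives the kind of summary figure quoted above.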
Molecular dynamics simulations indicate that tyrosineB10 limits motions of distal histidine to regulate CO binding in soybean leghemoglobin
Myoglobin (Mb) uses a strong electrostatic interaction in its distal heme pocket to regulate ligand binding. The mechanism of regulation of ligand binding in soybean leghemoglobin a (Lba) has been enigmatic, all the more so because no atomic-resolution three-dimensional structure of the plant globin with a bound gaseous ligand is available. While the 20-fold higher oxygen affinity of Lba compared with Mb is required for its dual physiological function, the mechanism by which this high affinity is achieved is only emerging. Extensive mutational analysis combined with kinetic and CO-FT-IR spectroscopic investigation led to the hypothesis that Lba depends on a weakened electrostatic interaction between distal HisE7 and the bound ligand. This weakening is achieved by invoking B10Tyr, which itself hydrogen bonds with HisE7, thus restricting it to a single conformation detrimental to an Mb-like strong electrostatic interaction. This hypothesis is re-assessed here using an in silico model of CO-Lba and molecular dynamics simulation. The investigation supports the presence of at least two major conformations of HisE7 in Lba brought about by imidazole ring flip, one of which hydrogen bonds effectively with B10Tyr, affecting the former's ability to stabilize bound ligand, while the other does not. However, HisE7 in Lba has limited conformational freedom, unlike the high frequency of imidazole ring flips observed in Mb and in the TyrB10Leu mutant of Lba. Thus, it appears that TyrB10 limits the conformational freedom of the distal His in Lba, tuning down the ligand dissociation rate constant by reducing the strength of hydrogen bonding to the bound ligand, which the freedom of the distal His of Mb allows. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Crystal structure of the PepSY-containing domain of the YpeB protein involved in germination of Bacillus spores
The crystal structure of the C-terminal domain of the Bacillus megaterium YpeB protein has been solved by X-ray crystallography to 1.80-Å resolution. The full-length protein is essential in stabilising the SleB cortex lytic enzyme in Bacillus spores, and may have a role in regulating SleB activity during spore germination. The YpeB-C crystal structure comprises three tandemly repeated PepSY domains, which are aligned to form an extended laterally compressed molecule. A predominantly positively charged region located in the second PepSY domain may provide a site for protein interactions that are important in stabilising SleB and YpeB within the spore. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Thermodynamics of Aβ16-21 dissociation from a fibril: Enthalpy, entropy, and volumetric properties
Here, we provide insights into the thermodynamic properties of Aβ16-21 dissociation from an amyloid fibril using all-atom molecular dynamics simulations in explicit water. An umbrella sampling protocol is used to compute potentials of mean force (PMFs) as a function of the distance ξ between the centers of mass of the Aβ16-21 peptide and the preformed fibril at nine temperatures. Changes in the enthalpy and the entropic energy are determined from the temperature dependence of these PMFs, and the average volume of the simulation box is computed as a function of ξ. We find that the PMF at 310 K is dominated by enthalpy, while the entropic energy does not change significantly during dissociation. The volume of the system decreases during dissociation. Moreover, the magnitude of this volume change also decreases with increasing temperature. By defining dock and lock states using the solvent accessible surface area (SASA), we find that the behavior of the electrostatic energy is different in these two states. It increases (unfavorable) and decreases (favorable) during dissociation in lock and dock states, respectively, while the energy due to Lennard-Jones interactions increases continuously in these states. Our simulations also highlight the importance of hydrophobic interactions in accounting for the stability of Aβ16-21.
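The step from temperature-dependent PMFs to enthalpy and entropic energy rests on standard thermodynamic relations: treating the PMF value at a given ξ as a free energy ΔG(T), the entropy change is ΔS = -∂ΔG/∂T, the entropic energy is -TΔS, and the enthalpy is ΔH = ΔG + TΔS. The sketch below applies these at a single ξ using central finite differences over neighbouring temperatures. The finite-difference scheme and the function name are assumptions for illustration; the abstract does not say how the temperature derivative was evaluated.

```python
def decompose_free_energy(temps, dG):
    """Split ΔG(T) into enthalpy and entropic energy at interior temperatures.

    temps: increasing list of temperatures (K)
    dG:    PMF / free-energy values at one fixed ξ, one per temperature
    Returns a list of (T, dH, minus_TdS) tuples in the same energy units as dG.
    """
    if len(temps) != len(dG):
        raise ValueError("temps and dG must have the same length")
    out = []
    for k in range(1, len(temps) - 1):
        # Central difference estimate of d(ΔG)/dT at temps[k].
        slope = (dG[k + 1] - dG[k - 1]) / (temps[k + 1] - temps[k - 1])
        dS = -slope                        # ΔS = -∂ΔG/∂T
        minus_TdS = -temps[k] * dS         # entropic energy
        dH = dG[k] + temps[k] * dS         # ΔH = ΔG + TΔS
        out.append((temps[k], dH, minus_TdS))
    return out
```

By construction dH + minus_TdS reproduces the input ΔG at each temperature, so the decomposition can be checked against the original PMF. Repeating the calculation for every ξ along the dissociation coordinate yields enthalpy and entropic-energy profiles.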
The determinants of bond angle variability in protein/peptide backbones: A comprehensive statistical/quantum mechanics analysis
The elucidation of the mutual influence between peptide bond geometry and local conformation has important implications for protein structure refinement, validation, and prediction. To gain insights into the structural determinants and the energetic contributions associated with protein/peptide backbone plasticity, we here report an extensive analysis of the variability of the peptide bond angles by combining statistical analyses of protein structures and quantum mechanics calculations on small model peptide systems. Our analyses demonstrate that all the backbone bond angles strongly depend on the peptide conformation and unveil the existence of regular trends as a function of ψ and/or φ. The excellent agreement of the quantum mechanics calculations with the statistical surveys of protein structures validates the computational scheme employed here and demonstrates that the valence geometry of the protein/peptide backbone is primarily dictated by local interactions. Notably, for the first time we show that the position of the Hα hydrogen atom, which is an important parameter in NMR structural studies, is also dependent on the local conformation. Most of the trends observed may be satisfactorily explained by invoking steric repulsive interactions; in some specific cases the valence bond variability is also influenced by hydrogen-bond-like interactions. Moreover, we can provide a reliable estimate of the energies involved in the interplay between geometry and conformations.
In recent years, in silico protein structure prediction has reached a level where fully automated servers can generate large pools of near-native structures. However, the identification and further refinement of the best structures from the pool of models remain problematic. To address these issues, we have developed (i) a target-specific selective refinement (SR) protocol; and (ii) a molecular dynamics (MD) simulation-based ranking (SMDR) method. In SR, the all-atom refinement of structures is accomplished via the Rosetta Relax protocol, subject to specific constraints determined by the size and complexity of the target. The best-refined models are selected with SMDR by testing their relative stability against gradual heating through all-atom MD simulations. Through extensive testing, we have found that Mufold-MD, our fully automated protein structure prediction server updated with the SR and SMDR modules, consistently outperformed its previous versions. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Evolutionary and structural analyses of heterodimeric proteins composed of subunits with same fold
Heterodimeric proteins with homologous subunits of the same fold are involved in various biological processes. The objective of this study is to understand the evolution of structural and functional features of such heterodimers. Using a non-redundant dataset of 70 such heterodimers of known 3D structure and an independent dataset of 173 heterodimers from yeast, we note that the mean sequence identity between interacting homologous subunits is only 23–24%, suggesting that, generally, highly diverged paralogues assemble to form such a heterodimer. We also note that the functional roles of interacting subunits/domains are generally quite different. This suggests that, though the interacting subunits/domains are homologous, their high evolutionary divergence characterizes their high functional divergence, which contributes to a gross function for the heterodimer considered as a whole. The interacting homologues in heterodimers do not follow the expected inverse relationship between sequence identity and RMSD. We also addressed the question of whether the subunits of heterodimers can form homodimers by generating models of fictitious homodimers on the basis of the 3D structures of the heterodimers. Interaction energies associated with these homodimers suggest that, in the overwhelming majority of cases, such homodimers are unlikely to be stable. The majority of the homologues of heterodimers of known structure form heterodimers (51.8%), and a small proportion (14.6%) form homodimers. Comparison of 3D structures of heterodimers with homologous homodimers suggests that the interfacial nature of residues is not well conserved. In over 90% of the cases, we note that the interacting subunits of heterodimers are co-localized in the cell. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Effect of intrinsic and extrinsic factors on the simulated D-band length of type I collagen
A signature feature of collagen is its axial periodicity, visible in TEM as alternating dark and light bands. In mature type I collagen, this repeating unit, D, is 67 nm long. This periodicity reflects an underlying packing of constituent triple-helix polypeptide monomers wherein the dark bands represent gaps between axially adjacent monomers. This organization is visible distinctly in the microfibrillar model of collagen obtained from fiber diffraction. However, to date, no atomistic simulations of this diffraction model under zero-stress conditions have reported a preservation of this structural feature. Such a demonstration is important as it provides the baseline to infer response functions of physiological stimuli. In contrast, simulations predict a considerable shrinkage of the D-band (11–19%). Here we evaluate systematically the effect of several factors on D-band shrinkage. Using force fields employed in previous studies, we find that irrespective of the temperature/pressure coupling algorithms, assumed salt concentration or hydration level, and whether or not the monomers are cross-linked, the D-band shrinks considerably. This shrinkage is associated with the bending and widening of individual monomers, but employing a force field whose backbone dihedral energy landscape matches more closely with our computed CCSD(T) values produces a small D-band shrinkage of < 3%. Since this force field also performs better against other experimental data, it appears that the large shrinkage observed in earlier simulations is a force-field artifact. The residual shrinkage could be due to the absence of certain atomic-level details, such as glycosylation sites, for which we do not yet have suitable data. Proteins 2015. © 2015 Wiley Periodicals, Inc.
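To put the quoted percentages in absolute terms, the small sketch below converts between a fractional shrinkage and a simulated D-band length, taking the 67 nm experimental value from the abstract as the reference. The function names are hypothetical. With these numbers, 11–19% shrinkage corresponds to simulated D-bands of roughly 59.6 to 54.3 nm, while a shrinkage under 3% keeps the D-band above about 65.0 nm.

```python
def shrunken_length(d_ref_nm, shrink_fraction):
    """D-band length (nm) after shrinking the reference by the given fraction."""
    return d_ref_nm * (1.0 - shrink_fraction)


def shrinkage_percent(d_ref_nm, d_sim_nm):
    """Percent shrinkage of a simulated D-band relative to the reference length."""
    return 100.0 * (d_ref_nm - d_sim_nm) / d_ref_nm
```

The second function is the one that would be applied to a simulation: measure the axial repeat in the equilibrated model and compare it against the reference.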