Proteins: Structure, Function, Bioinformatics
Structure of the yeast Bre1 RING domain
Monoubiquitination of histone H2B at Lys123 in yeast plays a critical role in regulating transcription, mRNA export, DNA replication and the DNA damage response. The RING E3 ligase Bre1 catalyzes monoubiquitination of H2B in concert with the E2 ubiquitin-conjugating enzyme Rad6. The crystal structure of a C-terminal fragment of Bre1 shows that the catalytic RING domain is preceded by an N-terminal helix that mediates coiled-coil interactions with a crystallographically related monomer. Homology modeling suggests that the human homologue of Bre1, RNF20/RNF40, heterodimerizes through similar coiled-coil interactions. This article is protected by copyright. All rights reserved.
In Pseudomonas aeruginosa, the algH gene regulates the cellular concentrations of a number of enzymes and the production of several virulence factors, and has been suggested to serve a global regulatory function. The precise mechanism by which the algH gene product, the AlgH protein, functions is unknown, as is the case for AlgH family members from other bacteria. To lay the groundwork for understanding the physical basis of AlgH function, we examined the structure and physical properties of AlgH in solution. Under reducing conditions, NMR, electrophoretic mobility, and sedimentation equilibrium experiments indicate that AlgH is predominantly monomeric and monodisperse in solution. Under non-reducing conditions, intra- and intermolecular disulfide bonds form, the latter promoting AlgH oligomerization. The high-resolution solution structure of AlgH reveals an α/β-sandwich architecture built from ten β-strands and seven α-helices. Comparison with available structures of orthologues indicates that the overall structural topology is conserved. The region of the protein that is most strongly conserved structurally also shows the highest amino acid sequence conservation and, as revealed by hydrogen-deuterium exchange studies, is also the most stable. In this region, evolutionary trace analysis identifies two clusters of residues with the highest evolutionary importance relative to all other AlgH residues. These frame a partially solvent-exposed, shallow hydrophobic cleft, which may be a site for intermolecular interactions. The results establish a physical foundation for understanding the structure and function of AlgH and AlgH family proteins and should be of general importance for further investigations of these and related proteins.
Computational characterization of the chemical step in the GTP hydrolysis by Ras-GAP for the wild-type and G13V mutated Ras
The free energy profiles for the chemical step of guanosine triphosphate hydrolysis, GTP + H2O → GDP + Pi, by Ras-GAP for the wild-type and G13V mutated Ras were computed using molecular dynamics protocols with QM(ab initio)/MM potentials. The results are consistent with recent kinetic measurements on Ras-GAP showing a reduction of about two orders of magnitude in the rate constant upon the G13V mutation in Ras: the computed activation barrier on the free energy profile increases by 3 kcal/mol upon the G13V replacement. The main reason for the higher energy barrier is a shift of the “arginine finger” (R789 of GAP) away from its favorable position in the active site. The simulation results support a mechanism for the reference reaction in which the Q61 side chain participates directly in the chemical transformations at the proton-transfer stage. Proteins 2015. © 2015 Wiley Periodicals, Inc.
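As a rough check on how a 3 kcal/mol barrier increase maps onto the measured rate reduction, transition-state (Eyring) theory gives k ∝ exp(−ΔG‡/RT), so k_wt/k_mut = exp(ΔΔG‡/RT) when both reactions share the same prefactor. The sketch below assumes that shared prefactor and a temperature of 300 K; the function names are illustrative only.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def rate_ratio(ddg_barrier: float, temperature: float = 300.0) -> float:
    """Fold reduction k_wt / k_mut for a barrier increase ddg_barrier (kcal/mol).

    Assumes both reactions share the same Eyring prefactor, so only the
    Boltzmann factor exp(-dG/RT) differs between them.
    """
    return math.exp(ddg_barrier / (R_KCAL * temperature))


def barrier_shift(fold_reduction: float, temperature: float = 300.0) -> float:
    """Inverse relation: barrier increase (kcal/mol) implied by a fold reduction."""
    return R_KCAL * temperature * math.log(fold_reduction)


if __name__ == "__main__":
    ratio = rate_ratio(3.0)
    print(f"+3 kcal/mol -> {ratio:.0f}-fold slower (log10 = {math.log10(ratio):.2f})")
    print(f"100-fold slower -> +{barrier_shift(100.0):.2f} kcal/mol")
```

At 300 K, RT ≈ 0.596 kcal/mol, so a 3 kcal/mol increase corresponds to roughly a 150-fold slowdown, in line with the reported two-order reduction; conversely, an exact 100-fold reduction corresponds to about 2.7 kcal/mol.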
Deformability in the cleavage site of primary MicroRNA is not sensed by the double-stranded RNA binding domains in the microprocessor component DGCR8
The prevalence of double-stranded RNA (dsRNA) in eukaryotic cells has only recently been appreciated. Of interest here, RNA silencing begins with dsRNA substrates that are bound by the double-stranded RNA binding domains (dsRBDs) of their processing proteins. Specifically, processing of microRNA (miRNA) in the nucleus minimally requires the enzyme Drosha and its dsRBD-containing cofactor protein, DGCR8. The smallest recombinant construct of DGCR8 that is sufficient for in vitro dsRNA binding, referred to as DGCR8-Core, consists of its two dsRBDs and a C-terminal tail. Because dsRBDs rarely recognize the nucleotide sequence of dsRNA, it is reasonable to hypothesize that DGCR8 function depends on recognition of specific structural features in the miRNA precursor. Previously, we demonstrated that non-canonical structural elements that promote RNA flexibility within the stem of miRNA precursors are necessary for efficient in vitro cleavage by reconstituted Microprocessor complexes. Here we combine gel shift assays with in vitro processing assays to demonstrate that neither the N-terminal dsRBD of DGCR8 in isolation nor the DGCR8-Core construct is sensitive to the presence of non-canonical structural elements within the stem of miRNA precursors, or to single-stranded segments flanking the stem. Extending DGCR8-Core to include an N-terminal heme-binding region does not change these conclusions. Thus, our data suggest that although the DGCR8-Core region is necessary for dsRNA binding and recruitment to the Microprocessor, it is not sufficient to establish the previously observed connection between RNA flexibility and processing efficiency.
Structural mapping of the coiled-coil domain of a bacterial condensin and comparative analyses across all domains of life suggest conserved features of SMC proteins
The structural maintenance of chromosomes (SMC) proteins form the cores of multisubunit complexes that are required for the segregation and global organization of chromosomes in all domains of life. These proteins share a common domain structure in which the N- and C-terminal regions pack against one another to form a globular ATPase domain. This “head” domain is connected to a central, globular “hinge” or dimerization domain by a long, antiparallel coiled coil. To date, most efforts at structural characterization of SMC proteins have focused on the globular domains. Recently, however, we developed a method to map interstrand interactions in the 50-nm coiled-coil domain of MukB, the divergent SMC protein found in γ-proteobacteria. Here, we apply that technique to map the structure of the Bacillus subtilis SMC (BsSMC) coiled-coil domain. We find that, in contrast to the relatively complicated coiled-coil domain of MukB, the BsSMC domain is nearly continuous, with only two detectable coiled-coil interruptions. Near the middle of the domain is a break in the coiled-coil structure at which the C-terminal strand has three more residues than the N-terminal strand. Close to the head domain, there is a second break with a significantly longer insertion on the same strand. These results provide an empirical basis for interpreting the output of coiled-coil prediction algorithms for this family of proteins. A comparison of such predictions suggests that these coiled-coil deviations are highly conserved across SMC types in a wide variety of organisms, including humans. Proteins 2015. © 2015 Wiley Periodicals, Inc.
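The interruptions described above are the kind of feature one looks for when scanning the per-residue output of a coiled-coil predictor. A minimal sketch, assuming the predictor reports one probability per residue and using a hypothetical cutoff of 0.5, flags low-probability stretches that are flanked by coiled coil on both sides; it does not reproduce the interstrand mapping method used in the study.

```python
from typing import List, Tuple


def find_interruptions(probs: List[float], threshold: float = 0.5,
                       min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) ranges where the coiled-coil
    probability drops below `threshold` between two coiled-coil segments.

    Low-probability stretches at either end of the sequence are treated as
    ends of the coiled coil, not as interruptions.
    """
    in_coil = [p >= threshold for p in probs]
    breaks: List[Tuple[int, int]] = []
    seen_coil = False
    i, n = 0, len(probs)
    while i < n:
        if in_coil[i]:
            seen_coil = True
            i += 1
            continue
        start = i
        while i < n and not in_coil[i]:
            i += 1
        end = i - 1
        # Keep the gap only if coil precedes it and coil follows it (i < n)
        if seen_coil and i < n and end - start + 1 >= min_gap:
            breaks.append((start, end))
    return breaks
```

For example, find_interruptions([0.1, 0.9, 0.9, 0.2, 0.3, 0.9, 0.9, 0.1, 0.9, 0.2]) returns [(3, 4), (7, 7)]; the leading and trailing low values are treated as the ends of the coil rather than as breaks.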
Human odorant-binding protein, OBPIIa, is expressed by nasal epithelia to facilitate transport of hydrophobic odorant molecules across the aqueous mucus. Here, we report its crystallographic analysis at 2.6 Å resolution. OBPIIa is a monomeric protein that exhibits the classical lipocalin fold, with a conserved eight-stranded β-barrel harboring a remarkably large hydrophobic pocket. Basic residues within the four loops that shape the entrance to this ligand-binding site generate a positive electrostatic potential. Human OBPIIa shows distinct features compared with other mammalian OBPs, including a potentially reactive Cys side chain within its pocket, similar to human tear lipocalin. Proteins 2015. © 2015 Wiley Periodicals, Inc.
Accurate prediction of protein function in humans is important for understanding biological processes at the molecular level in biomedicine and drug design. Over a third of proteins are commonly held to bind metal, and ∼10% of human proteins to bind zinc. Therefore, an initial step in protein function prediction frequently involves predicting metal ion binding. In recent years, methods have been developed to predict, often with a high degree of accuracy, the set of residues in 3D space that form a metal ion binding site. Here, using extensions of these methods and translated gene sequences derived from the complete, resolved human genome, we provide an extensive list of human proteins and their putative metal ion binding site residues. At ∼90% selectivity, over 900 new putative human metal ion binding proteins are identified. A statistical analysis of resolved metal ion binding sites in the human metalloproteome is provided, and the importance of remote homology analysis is demonstrated. As an example, a novel metal ion binding site in a complex of a botulinum substrate with its inhibitor is presented. Based on the location of the predicted site and the interactions of the contacting residues at the complex interface, we postulate that metal ion binding in this region could influence complex formation and, consequently, the function of the protein. Thus, this work provides testable hypotheses about novel functions of known proteins. Proteins 2015; 83:931–939. © 2015 Wiley Periodicals, Inc.
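A simple way to picture "a set of residues in 3D space forming a metal ion binding site" is a geometric filter: residues that commonly coordinate zinc (Cys, His, Asp, Glu) whose ligating atoms lie close together. The sketch below illustrates only that filter, not the authors' extended prediction method; the 6.5 Å cutoff, the residue set, and the hypothetical 'CYS12'-style keys are assumptions chosen for illustration.

```python
import itertools
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# Residue types that commonly coordinate Zn2+ (an illustrative choice).
LIGAND_RESIDUES = {"CYS", "HIS", "ASP", "GLU"}


def candidate_sites(ligand_atoms: Dict[str, Coord], max_dist: float = 6.5,
                    site_size: int = 3) -> List[Tuple[str, ...]]:
    """Return groups of `site_size` residues whose putative ligating atoms
    are pairwise within `max_dist` angstroms.

    Keys are hypothetical labels like 'CYS12' (residue type + number), and
    each value is the coordinate of that residue's ligating atom.
    """
    eligible = sorted(r for r in ligand_atoms if r[:3] in LIGAND_RESIDUES)
    sites = []
    for combo in itertools.combinations(eligible, site_size):
        # Every pair within the group must be close enough to share a metal
        if all(math.dist(ligand_atoms[a], ligand_atoms[b]) <= max_dist
               for a, b in itertools.combinations(combo, 2)):
            sites.append(combo)
    return sites
```

Real predictors add much more (side-chain flexibility, coordination geometry, homology to known sites); this filter only shows why a site is defined by a spatial cluster of residues rather than by a sequence motif.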
Crystal structure of a novel two-domain GH78 family α-rhamnosidase from Klebsiella oxytoca with rhamnose bound
The crystal structure of the GH78 family α-rhamnosidase from Klebsiella oxytoca (KoRha) has been determined at 2.7 Å resolution with rhamnose bound in the active site of the catalytic domain. Curiously, the putative catalytic acid, Asp 222, is preceded by an unusual non-proline cis-peptide bond, which helps to project its carboxyl group into the active centre. This homodimeric KoRha structure is significantly smaller than those of the other previously determined GH78 structures. Nevertheless, the enzyme displays α-rhamnosidase activity when assayed in vitro, suggesting that the additional structural domains found in the related enzymes are dispensable for function.
The conservation profile of a protein bears the imprint of the molecule that is evolutionarily coupled to the protein
The conservation profile of a protein is a curve of the conservation levels of its amino acids along the sequence. Biologists are usually more interested in individual points on the curve (namely, the conserved amino acids) than in the overall shape of the curve. Here, we show that the conservation curves of proteins bear the imprints of molecules that are evolutionarily coupled to them. Our method builds on recent studies showing that a sequence conservation profile is quantitatively linked to its structural packing profile. We find that the conservation profiles of nucleic acid (NA) binding proteins correlate better with the packing profiles of the protein-NA complexes than with those of the proteins alone. This indicates that a nucleic acid binding protein evolves to accommodate the nucleic acid in such a way that the conservation levels of the residues involved in binding are closely coupled with the specific nucleotides.
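One common way to quantify a packing profile is the weighted contact number (WCN), the sum of 1/r² over all other atoms. Assuming WCN as the packing measure (the study's exact definition is not stated here), the comparison described above amounts to correlating a conservation profile with WCN computed with and without the bound nucleic acid. Coordinates and scores in the usage note are made up; higher conservation scores are assumed to mean more conserved.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def wcn(residues: Sequence[Coord], partners: Sequence[Coord] = ()) -> List[float]:
    """Weighted contact number per residue: sum of 1/r^2 over every other
    residue, plus atoms of an optional bound partner (e.g. nucleic acid)."""
    others = list(residues) + list(partners)
    profile = []
    for i, ri in enumerate(residues):
        total = 0.0
        for j, rj in enumerate(others):
            if j == i:  # skip self; partner indices start after len(residues)
                continue
            total += 1.0 / math.dist(ri, rj) ** 2
        profile.append(total)
    return profile


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With four residues spaced 4 Å apart on a line, a single partner atom near the last residue, and conservation scores rising toward that end, pearson(conservation, wcn(residues, partner)) exceeds pearson(conservation, wcn(residues)), which is the signature the abstract describes.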
RAS subfamily proteins regulate signaling processes that promote cell growth by cycling between active (GTP-bound) and inactive (GDP-bound) states. Different RAS isoforms, though structurally similar, exhibit functional specificity and are associated with different types of cancers and developmental disorders. Understanding the dynamical differences between the isoforms is crucial for the design of inhibitors that can selectively target a particular malfunctioning isoform. In this study, we provide a comprehensive comparison of the dynamics of all three RAS isoforms (HRAS, KRAS, and NRAS) using extensive molecular dynamics (MD) simulations in both the GDP-bound (3.06 μs in total) and GTP-bound (2.4 μs in total) states. We observed significant differences in the dynamics of the isoforms, which, interestingly, varied depending on the type of nucleotide bound and the simulation temperature. Both Switch I (residues 25–40) and Switch II (residues 59–75) differ significantly in flexibility among the three isoforms. Furthermore, principal component analysis showed differences in the conformational space sampled by the GTP-bound RAS isoforms. We also identified a previously unreported pocket, which opens transiently during the MD simulations and could be targeted to regulate the nucleotide exchange reaction or possibly to interfere with membrane localization. Further, we present the first simulation study showing GDP destabilization in the wild-type RAS protein. Destabilization of GDP/GTP occurred in only 1 of 50 simulations, emphasizing the need for guanine nucleotide exchange factors (GEFs) to accelerate such an extremely unfavorable process. This observation, along with the other results presented in this paper, further supports our previously hypothesized mechanism of GEF-assisted nucleotide exchange.
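Principal component analysis of an MD trajectory diagonalizes the covariance matrix of atomic coordinates, and projecting frames onto the leading eigenvectors shows which regions of conformational space each isoform samples. This dependency-free sketch assumes the frames are already superposed on a reference and flattened into vectors (for example, Cα coordinates) and extracts only the first component by power iteration; production analyses would normally use a full eigendecomposition from a numerical library.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(frames: Matrix) -> Tuple[List[float], Matrix]:
    """Subtract the mean structure from each (pre-superposed, flattened) frame."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    return mean, [[f[k] - mean[k] for k in range(d)] for f in frames]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of the centered coordinates."""
    n, d = len(centered), len(centered[0])
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_pc(cov: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Dominant eigenvalue and unit eigenvector of cov by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]  # random start, unlikely to be orthogonal to PC1
    for _ in range(iters):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured along v
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(len(v)))
                 for a in range(len(v)))
    return eigval, v


def project(centered: Matrix, pc: List[float]) -> List[float]:
    """Score of each frame along a principal component."""
    return [sum(c[k] * pc[k] for k in range(len(pc))) for c in centered]
```

Comparing isoforms would then mean projecting each isoform's frames onto a common set of components and comparing the resulting score distributions.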
Flexibility and dynamics are important for protein function and for a protein's ability to accommodate amino acid substitutions. However, when computational protein design algorithms search over protein structures, the allowed flexibility is often reduced to a relatively small set of discrete side-chain and backbone conformations. While simplifications in scoring functions and protein flexibility are currently necessary to search the vast protein sequence and conformational space computationally, a rigid representation of a protein makes the search brittle and causes it to miss low-energy structures. Continuous rotamers more closely represent the allowed movement of a side chain within its torsional well and have been successfully incorporated into the protein design framework to design biomedically relevant protein systems. Using continuous rotamers in protein design enables algorithms to search a larger conformational space than previously possible, but adds complexity to the design search. To design large, complex systems with continuous rotamers, new algorithms are needed to increase the efficiency of the search. We present two methods, PartCR and HOT, that greatly increase the speed and efficiency of protein design with continuous rotamers. These methods specifically target the large errors in the energy terms used to bound pairwise energies during the design search. By tightening the energy bounds, additional pruning of the conformation space can be achieved, and the number of conformations that must be enumerated to find the global minimum energy conformation is greatly reduced.
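To see why tighter energy bounds enable more pruning, consider Goldstein-style dead-end elimination generalized to energy intervals, where each continuous rotamer contributes a lower and an upper bound. The sketch below is that textbook criterion, not PartCR or HOT, and its dictionary layout is hypothetical: rotamer r at position i is eliminated in favor of t if even the most favorable case for r is worse than the least favorable case for t.

```python
from typing import Dict, List, Tuple

Bound = Tuple[float, float]  # (lower, upper) energy over a continuous rotamer's voxel


def can_prune(i: int, r: str, t: str,
              single: Dict[Tuple[int, str], Bound],
              pair: Dict[Tuple[int, str, int, str], Bound],
              rotamers: Dict[int, List[str]]) -> bool:
    """Interval version of Goldstein's dead-end elimination criterion.

    Rotamer r at position i cannot be in the global minimum if
        E_lo(i_r) - E_hi(i_t) + sum_j min_s [E_lo(i_r, j_s) - E_hi(i_t, j_s)] > 0,
    i.e. swapping r for t lowers the energy in every conformation.
    """
    gap = single[(i, r)][0] - single[(i, t)][1]
    for j, rots in rotamers.items():
        if j == i:
            continue
        # Worst case for the swap over all rotamers s at position j
        gap += min(pair[(i, r, j, s)][0] - pair[(i, t, j, s)][1] for s in rots)
    return gap > 0
```

Because lower bounds enter with a plus sign and upper bounds with a minus sign, loose bounds shrink the gap: with tight bounds a poor rotamer is eliminated, while widening its self-energy interval to (−1, 10) can leave it in play, forcing more conformations to be enumerated.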
Polo-like kinases (Plks) are key regulators of cell cycle progression; their members share a kinase domain and a polo-box domain (PBD) that serves as a protein-binding module. While Plk1 is a promising target for antitumor therapy, Plk2 is regarded as a tumor suppressor, even though both Plks recognize the S-pS/T-P motif through their PBDs. Herein, we report the crystal structure of the PBD of Plk2 at 2.7 Å resolution. Despite the overall structural similarity to the PBD of Plk1, reflecting their high sequence homology, the structure has its own distinctive features, including a highly ordered loop connecting the two subdomains and, unlike the PBD of Plk1, the absence of 3₁₀-helices in the N-terminal region. Based on the three-dimensional structure, we could further model its interaction with two types of phosphopeptides, one of which was previously identified in a screen as the optimal peptide for the PBD of Plk2.
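Since both PBDs bind the S-pS/T-P motif, a first step in looking for candidate PBD docking sites is a sequence scan for S-[S/T]-P. The sketch below uses a regular-expression lookahead so that overlapping motifs are reported; it says nothing about whether the central residue is actually phosphorylated, and the example sequence is made up.

```python
import re
from typing import List, Tuple

# S-[S/T]-P; the middle residue is the one that must be phosphorylated.
# The lookahead lets overlapping matches (e.g. SSTP) be reported separately.
MOTIF = re.compile(r"(?=(S[ST]P))")


def candidate_pbd_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the central S/T, motif) pairs."""
    return [(m.start() + 2, m.group(1)) for m in MOTIF.finditer(seq.upper())]
```

For the made-up sequence "MASSPKLSTPSSTP", candidate_pbd_motifs returns [(4, 'SSP'), (9, 'STP'), (13, 'STP')].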
RBRIdent: An algorithm for improved identification of RNA-binding residues in proteins from primary sequences
Rapid and accurate identification of RNA-binding residues from protein primary sequences is of great importance. In most prevalent machine-learning-based identification methods, however, either some features are inefficiently represented or the redundancy between features is not effectively removed. Both problems can weaken the performance of a classifier and increase its computational complexity. Here, we addressed these problems and developed an improved classifier, RBRIdent, to identify RNA-binding residues. In an independent benchmark test, RBRIdent achieved an accuracy of 76.79%, a Matthews correlation coefficient of 0.3819 and an F-measure of 75.58%, markedly outperforming all prevalent methods. These results point to the necessity of proper feature description and the essential role of feature selection in this task. All source data and code are freely available at http://126.96.36.199/RBRIdent.
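The reported accuracy, Matthews correlation coefficient, and F-measure all derive from the confusion matrix of residue-level predictions. The sketch below uses the standard binary definitions; the study may compute the F-measure differently (for example, averaged over classes), and the counts in the usage note are hypothetical.

```python
import math
from typing import Dict


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f_measure = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, so it stays informative on imbalanced data
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc}
```

For example, metrics(tp=40, fp=10, tn=40, fn=10) gives an accuracy, precision, recall and F-measure of 0.8 and an MCC of 0.6. Because MCC penalizes errors in both classes, it can be much lower than accuracy when binding residues are rare.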
Complete characterization of the mutation landscape reveals the effect on amylin stability and amyloidogenicity
Type II diabetes is believed to be partially aggravated by the emergence of toxic amylin protein deposits in the extracellular space of the pancreatic β-cells. Amylin, a regulatory hormone co-secreted with insulin, has been observed to misfold into toxic structures. Pramlintide, an FDA-approved injectable amylin analog mutated at positions 25, 28, and 29, was therefore developed as a more stable, soluble, less aggregation-prone, and equipotent peptide, and is used as an adjunctive therapy for diabetes. However, because Pramlintide is not ideal, researchers have been exploring other amylin analogs as therapeutic replacements. In this work, we assist the search for optimal analogs by computationally revealing the mutational landscape of amylin. We computed the structure energies of all possible single-point mutations and studied their effect on amylin stability and amyloidogenicity. Each of the 37 amylin residues was mutated in silico into each of the other 19 canonical amino acids, and an energy function computing Lennard–Jones, Coulomb and solvation energies was used to analyze changes in stability. The mutation landscape identified amylin's conserved stable regions, residues that can be tweaked to further stabilize the structure, regions that are susceptible to mutations, and mutations that are amyloidogenic. We used the single-point mutational landscape data to estimate higher-order multiple-point mutational landscapes and discovered millions of three-point mutations that are more stable and less amyloidogenic than Pramlintide. The landscapes provided an explanation for the effects of the S20G and Q10R mutations on the onset of diabetes in the Chinese and Maori populations, respectively. Proteins 2015. © 2015 Wiley Periodicals, Inc.
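The size of the landscapes described above follows from simple counting: 37 positions × 19 substitutions gives 703 single-point mutants, and choosing 3 positions with 19 options each gives C(37, 3) × 19³ = 53,294,430 three-point mutants. The sketch below enumerates single mutants of the human amylin sequence and estimates a multi-point ΔΔG by summing single-point values; that additive (no-epistasis) model is an assumption for illustration, not necessarily how the study extrapolated.

```python
from math import comb
from typing import Dict, Iterator, Tuple

# Human amylin (IAPP), 37 residues; C-terminal amidation ignored here.
AMYLIN = "KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq: str) -> Iterator[str]:
    """Yield labels such as 'S20G' (wild type, 1-based position, mutant)."""
    for pos, wt in enumerate(seq, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{pos}{aa}"


def n_multi_mutants(length: int, k: int) -> int:
    """Number of distinct k-point mutants: choose k positions, 19 options each."""
    return comb(length, k) * 19 ** k


def additive_ddg(mutations: Tuple[str, ...], single_ddg: Dict[str, float]) -> float:
    """Estimate a multi-point ddG as the sum of single-point values.

    Assumes the mutations do not interact (no epistasis), which is an
    assumption made for this sketch.
    """
    positions = [int(m[1:-1]) for m in mutations]
    if len(set(positions)) != len(positions):
        raise ValueError("mutations must hit distinct positions")
    return sum(single_ddg[m] for m in mutations)
```

The labels follow the usual wild type/position/mutant convention, so Pramlintide's proline substitutions appear as A25P, S28P and S29P, and the population-associated variants as S20G and Q10R.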
The unique N-terminal insert in the ribosomal protein, phosphoprotein P0, of Tetrahymena thermophila: Bioinformatic evidence for an interaction with 26S rRNA
Phosphoprotein P0 (P0) is part of the stalk complex of the eukaryotic large ribosomal subunit, a complex that is necessary for recruiting elongation factors. While the P0 sequence is highly conserved, our group noted a 15–16-residue insert exclusive to the P0s of ciliated protists, including Tetrahymena thermophila. We hypothesized that this insert may have a function unique to ciliated protists, such as stalk regulation via phosphorylation of the insert. The insert is almost never mentioned in the literature, and although the T. thermophila ribosome has been crystallized, structural data for Tetrahymena's P0 (TtP0) and its insert are limited. To investigate the structure and function of the TtP0 insert, we performed in silico analyses. The TtP0 sequence was scanned with phosphorylation site prediction tools to assess the likelihood of phosphorylation in the insert. The TtP0 sequence was also used to build a homology model of the N-terminal domain of TtP0, including the insert. When the insert was modeled in the context of the 26S rRNA, it associated with a region identified as expansion segment 7B (ES7B), suggesting a potential functional interaction between ES7B and the insert in T. thermophila. We could not obtain sufficient data to determine whether a similar relationship exists in other ciliated protists. This study lays the groundwork for future experimental studies to verify the presence of TtP0 insert/ES7 interactions in Tetrahymena and to explore their functional significance during protein synthesis.
For many membrane proteins, determining their topology remains a challenge for methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Electron paramagnetic resonance (EPR) spectroscopy has evolved as an alternative technique to study the structure and dynamics of membrane proteins. The present study demonstrates the feasibility of membrane protein topology determination using limited EPR distance and accessibility measurements. The BCL::MP-Fold (BioChemical Library membrane protein fold) algorithm assembles secondary structure elements (SSEs) in the membrane using a Monte Carlo Metropolis (MCM) approach. Sampled models are evaluated using knowledge-based potential functions and their agreement with the EPR data. Twenty-nine membrane proteins of up to 696 residues are used to test the algorithm. The RMSD100 value of the most accurate model is better than 8 Å for twenty-seven, better than 6 Å for twenty-two, and better than 4 Å for fifteen of the twenty-nine proteins, demonstrating the algorithm's ability to sample the native topology. Using the EPR data improved the average enrichment from 1.3 to 2.5, showing improved discrimination power.
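Two ingredients named above can be written down compactly: the RMSD100 normalization of Carugo and Pongor, RMSD100 = RMSD / (1 + ln √(N/100)), which makes RMSD values comparable across proteins of different length N, and the Metropolis acceptance rule used in Monte Carlo sampling. The sketch below assumes energies and temperature in the same arbitrary units and does not reproduce the BCL::MP-Fold move set or scoring.

```python
import math
import random


def rmsd100(rmsd: float, n_residues: int) -> float:
    """Length-normalized RMSD: RMSD / (1 + ln(sqrt(N / 100))).

    Equals the raw RMSD at N = 100; the denominator reaches zero near
    N = 100 / e**2 (about 13.5), so it is only meaningful for larger N.
    """
    return rmsd / (1.0 + math.log(math.sqrt(n_residues / 100.0)))


def metropolis_accept(delta_e: float, temperature: float,
                      rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

For a 400-residue protein, a raw RMSD of 5 Å normalizes to 5 / (1 + ln 2) ≈ 2.95 Å, which is why RMSD100 thresholds such as 8, 6 and 4 Å can be applied uniformly across proteins of very different sizes.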
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, which increases the importance of good access to these data, ranging from the supplemental material of individual papers all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialised approaches that are not feasible in the major reference databases. Many are labours of love, run by a single lab with little or no dedicated funding, and building and maintaining them poses many challenges. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on the 11th and 12th of August 2014. During this meeting, some common key challenges in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues affect our resources and to illustrate ways in which working together could enhance their accuracy, currency, and overall value.
Method for identification of rigid domains and hinge residues in proteins based on exhaustive enumeration
Many proteins undergo large-scale motions in which relatively rigid domains move against each other. Identifying rigid domains, as well as the hinge residues important for their relative movements, is important for various applications, including flexible docking simulations. In this work, we develop a method for protein rigid domain identification based on an exhaustive enumeration of maximal rigid domains, that is, rigid domains not fully contained within other domains. The computation is performed by mapping the problem to that of finding maximal cliques in a graph. A minimal set of rigid domains is then selected that covers most of the protein with minimal overlap. In contrast to existing methods, which partition a protein into non-overlapping domains using approximate algorithms, the rigid domains obtained from exact enumeration naturally contain overlapping regions, which correspond to the hinges of the inter-domain bending motion. The performance of the algorithm is demonstrated on several proteins.
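The mapping to maximal cliques can be illustrated with the Bron–Kerbosch algorithm. In this sketch, residues are nodes and two residues are joined if their distance changes by less than a tolerance between two conformations, so every clique is a set of mutually rigid residues; that graph construction and the 0.5 Å tolerance are assumptions, and the study's selection of a minimal covering set is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def rigidity_graph(conf_a: Sequence[Coord], conf_b: Sequence[Coord],
                   tol: float = 0.5) -> Dict[int, Set[int]]:
    """Connect residues i, j whose distance differs by < tol between conformations."""
    n = len(conf_a)
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d_a = math.dist(conf_a[i], conf_a[j])
            d_b = math.dist(conf_b[i], conf_b[j])
            if abs(d_a - d_b) < tol:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def maximal_cliques(graph: Dict[int, Set[int]]) -> List[Set[int]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex covering most candidates to skip redundant branches
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques
```

In a toy five-residue chain that bends at residue 2 (residues 3 and 4 swing from the x-axis onto a perpendicular line), the maximal cliques are {0, 1, 2} and {2, 3, 4}; their overlap, residue 2, is the hinge, mirroring how overlapping rigid domains mark hinge regions in the abstract.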
Structure and dynamics of DRD4 bound to an agonist and an antagonist using in silico approaches
Human dopamine receptor D4 (DRD4), a member of the G protein-coupled receptor (GPCR) family, plays a central role in cell signaling and trafficking. Dysfunctional DRD4 activity can lead to several psychiatric conditions, and the receptor therefore represents a target for many neurological disorders. However, the lack of an atomic structure limits our understanding of the mechanism regulating its activity. Here, we report the modeled structure of DRD4 alone and in complex with dopamine and spiperone, its natural agonist and an antagonist, respectively. To assess the conformational dynamics induced upon ligand binding, all-atom explicit-solvent molecular dynamics simulations in a membrane environment were performed. Comprehensive analyses of the simulations reveal that agonist binding triggers a series of conformational changes in the transmembrane region, including residue rearrangements characteristic of the transmission and tyrosine toggle molecular switches. Further, the trajectories indicate that the third intracellular loop (ICL3) is highly dynamic, mainly owing to side-chain movements of conserved proline residues in SH3-binding regions. Interestingly, in the dopamine-bound receptor simulation, ICL3 adopts an open conformation ideal for G protein binding. The structural and dynamical information presented here suggests a mode of DRD4 activation upon ligand binding. Our study should help further the understanding of receptor activation, since such structural information is crucial for the design of highly selective DRD4 ligands. Proteins 2014; 83:867–880. © 2014 Wiley Periodicals, Inc.