Proteins: Structure, Function, Bioinformatics
An analysis approach to identify specific functional sites in orthologous proteins using sequence and structural information: Application to neuroserpin reveals regions that differentially regulate inhibitory activity
The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution-matrix-based conservation score. It then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family, and on structural information, to identify surface-exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site-directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins. Proteins 2014. © 2014 Wiley Periodicals, Inc.
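The first step described above, scoring each alignment column with a weighted, substitution-matrix-based conservation score, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the weighted sum-of-pairs form used here is an assumption, and the names `TOY_SCORES`, `pair_score`, `column_conservation`, and `conserved_sites` are hypothetical. The substitution values are toy numbers chosen for illustration, not taken from BLOSUM or any published matrix. The family-wide filtering and surface-exposure steps are not shown.

```python
from itertools import combinations

# Hypothetical toy substitution scores for four residue types.
# These are illustrative values only, NOT a published matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6,
    ("K", "R"): 2, ("A", "K"): -1, ("A", "R"): -1,
    ("A", "D"): -2, ("K", "D"): -1, ("R", "D"): -2,
}


def pair_score(a, b):
    """Look up a symmetric substitution score for residues a and b."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))


def column_conservation(column, weights):
    """Weighted sum-of-pairs score for one alignment column.

    Each pair of sequences (i, j) contributes its substitution score,
    weighted by weights[i] * weights[j]; the sum is normalized by the
    total pair weight so that columns are comparable.
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(column)), 2):
        w = weights[i] * weights[j]
        numerator += w * pair_score(column[i], column[j])
        denominator += w
    return numerator / denominator


def conserved_sites(alignment, weights, threshold):
    """Return 0-based column indices whose score meets the threshold."""
    return [
        idx
        for idx, column in enumerate(zip(*alignment))
        if column_conservation(column, weights) >= threshold
    ]


if __name__ == "__main__":
    orthologs = ["KAD", "KKD", "RAD"]   # three aligned toy sequences
    uniform = [1.0, 1.0, 1.0]           # equal sequence weights (assumption)
    print(conserved_sites(orthologs, uniform, threshold=3.0))
```

In practice the sequence weights would down-weight closely related orthologs so that a densely sampled clade does not dominate the score; uniform weights are used here only to keep the example small.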
E. coli ClpB is a molecular chaperone that belongs to the Clp/Hsp100 family of AAA+ proteins. ClpB is able to form a hexameric ring structure to catalyze protein disaggregation with the assistance of the DnaK chaperone system. Our knowledge of the mechanism by which ClpB recognizes its substrates is still limited. In this work, we have quantitatively investigated ClpB binding to a number of unstructured polypeptides using steady-state anisotropy titrations. To precisely determine the binding affinity for the interaction between ClpB hexamers and polypeptide substrates, the titration data were subjected to global non-linear least squares analysis incorporating the dynamic equilibrium of ClpB assembly. Our results show that ClpB hexamers bind tightly to unstructured polypeptides with binding affinities in the range of ~3–16 nM. ClpB exhibits a modest preference for binding to Peptide B1, with a binding affinity of (1.7 ± 0.2) nM. Interestingly, we found that ClpB binds to unstructured polypeptide substrates of 40 and 50 amino acids containing the SsrA sequence at the C-terminus with affinities of (12 ± 3) nM and (4 ± 2) nM, respectively. In contrast, ClpB binds the 11-amino acid SsrA sequence with an affinity of (140 ± 20) nM, which is significantly weaker than for the other polypeptide substrates that we tested here. We hypothesize that ClpB, like ClpA, requires substrates with a minimum length for optimal binding. Finally, we present evidence showing that multiple ClpB hexamers are involved in binding to polypeptides ≥ 152 amino acids. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Computational enzyme design is an emerging field that has yielded promising success stories, but where numerous challenges remain. Accurate methods to rapidly evaluate possible enzyme design variants could provide significant value when combined with experimental efforts by reducing the number of variants that need to be synthesized and shortening the time to reach the desired endpoint of the design. To that end, it is essential to extend our computational methods to model the fundamental physical–chemical principles that regulate activity, in a protocol that is automated and accessible to a broad population of enzyme design researchers. Here, we apply a physics-based implicit solvent MM-GBSA scoring approach to enzyme design and benchmark the computational predictions against experimentally determined activities. Specifically, we evaluate the ability of MM-GBSA to predict changes in affinity for a steroid binder protein, catalytic turnover for a Kemp eliminase, and catalytic activity for α-Gliadin peptidase variants. Using the enzyme design framework developed here, we accurately rank the most experimentally active enzyme variants, suggesting that this approach could provide enrichment of active variants in real-world enzyme design applications. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Detecting local residue environment similarity for recognizing near-native structure models
We developed a new representation of local amino acid environments in protein structures called the Side-chain Depth Environment (SDE). An SDE defines the local structural environment of a residue by considering the coordinates and the depth of the amino acids located in the vicinity of the side-chain centroid of the residue. SDEs are general enough that similar SDEs are found in protein structures with globally different folds. Using SDEs, we developed a procedure called PRESCO (Protein Residue Environment SCOre) for selecting native or near-native models from a pool of computational models. The procedure searches a set of representative native protein structures for residue environments similar to those observed in a query model, to quantify how native-like the SDEs in the model are. When benchmarked on commonly used computational model datasets, PRESCO compared favorably with other existing scoring functions in selecting native and near-native models. Proteins 2014. © 2014 Wiley Periodicals, Inc.
The near-symmetry of proteins
The majority of protein oligomers form clusters which are nearly symmetric. Understanding that imperfection, its origins, and perhaps also its advantages requires converting the vague, qualitative language currently used to describe near-symmetry into an accurate quantitative measure that can answer questions such as: ‘What is the degree of symmetry deviation of the protein?’, ‘How do these deviations compare within a family of proteins?’, and so on. We developed quantitative methods to answer questions of this type. The methods are capable of analyzing the whole protein, its backbone, or selected portions of it, down to the comparison of specific symmetry-related amino acids, and of visualizing the various levels of symmetry deviation in the form of symmetry maps. We applied these methods to an extensive list of homomers and heteromers and found that apparently no protein reaches perfect symmetry. Strikingly, even homomeric protein clusters are never ideally symmetric. Among other findings, the main burden of symmetry distortion falls on the amino acids near the symmetry axis, and it is mainly the more hydrophilic amino acids that take part in symmetry-distortive interactions. The remarkable ability of heteromers to preserve near-symmetry, despite their different sequences, was also shown and analyzed. The comprehensive literature on the suggested advantages of symmetric oligomerization raises a yet-unsolved key question: If symmetry is so advantageous, why do proteins stop shy of perfect symmetry? Some tentative answers, to be tested in further studies, are suggested in a concluding outlook. Proteins 2014. © 2014 Wiley Periodicals, Inc.
The modulation of ALDH activity has been suggested as a promising option for the prevention or treatment of many diseases. To date, only a few activating compounds of ALDHs have been described. In this regard, ALDA-1 has been used to protect the heart against ischemia/reperfusion damage. In the search for new ALDH-modulating molecules, the binding capability of different compounds to the active site of human ALDH1A1 was analyzed by molecular docking, and their ability to modulate the activity of the enzyme was tested. Surprisingly, tamoxifen, an estrogen receptor antagonist used for breast cancer treatment, increased the activity and decreased the Km for NAD+ by about two-fold in ALDH1A1. No drug effect on human ALDH2 or ALDH3A1 was observed, showing that tamoxifen was specific for ALDH1A1. Protection against thermal denaturation and competition with daidzin suggested that tamoxifen binds to the aldehyde site of ALDH1A1, resembling the interaction of ALDA-1 with ALDH2. Further kinetic analysis indicated that tamoxifen activation may be related to an increase in the Kd for NADH, favoring a more rapid release of the coenzyme, which is the rate-limiting step of the reaction for this isozyme. Therefore, tamoxifen might improve the antioxidant response, which is compromised in some diseases. Proteins 2014. © 2014 Wiley Periodicals, Inc.
DNA repair is fundamental to genome stability and is found in all three domains of life. However, many archaeal species, such as Methanopyrus kandleri, contain only a subset of the eukaryotic nucleotide excision repair (NER) homologues, and those present often contain significant differences compared to their eukaryotic homologues. To clarify the role of the NER XPG-like protein Mk0566 from M. kandleri, its biochemical activity and three-dimensional structure were investigated. Both were found to be more similar to human FEN-1 than human XPG, suggesting a biological role in replication and long-patch base excision repair rather than in NER. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Role of active-site residues Tyr55 and Tyr114 in catalysis and substrate specificity of Corynebacterium diphtheriae C-S lyase
In recent years there has been increased interest in bacterial methionine biosynthesis enzymes as antimicrobial targets because of their pivotal role in cell metabolism. C-S lyase from Corynebacterium diphtheriae is a pyridoxal 5′-phosphate-dependent enzyme in the transsulfuration pathway that catalyzes the α,β-elimination of sulfur-containing amino acids, such as L-cystathionine, to generate ammonia, pyruvate, and homocysteine, the immediate precursor of L-methionine. In order to gain deeper insight into the functional and dynamic properties of the enzyme, mutants of two highly conserved active-site residues, Y55F and Y114F, were characterized by UV-visible absorbance, fluorescence, and CD spectroscopy in the absence and presence of substrate and substrate analogs, as well as by steady-state kinetic studies. Substitution of Tyr55 with Phe apparently causes a 130-fold decrease in KdPLP at pH 8.5, providing evidence that Tyr55 plays a role in cofactor binding. Moreover, spectral data show that the mutant accumulates the external aldimine intermediate, suggesting that the absence of interaction between the hydroxyl moiety and the PLP-binding residue Lys222 causes a decrease in the rate of substrate deprotonation. Mutation of Tyr114 to Phe slightly influences hydrolysis of L-cystathionine and causes a change in substrate specificity towards L-serine and O-acetyl-L-serine compared to the wild-type enzyme. These findings, together with computational data, provide useful insights into the substrate specificity of C-S lyase, which seems to be regulated by active-site architecture and by the specific conformation in which substrates are bound, and will aid in the development of inhibitors. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Computational modeling of membrane proteins
The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the Protein Data Bank (PDB) has been at a constant 1-2% for the last decade. In contrast, over half of all drugs target MPs, which only highlights how little we understand about drug-specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans-membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or the precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α-helical MPs and β-barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase in the number of MP structures has (1) facilitated comparative modeling due to the availability of more and better templates, and (2) improved the statistics for knowledge-based scoring functions. Moreover, de novo methods have benefitted from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Structure of a cupin protein Plu4264 from Photorhabdus luminescens subsp. laumondii TTO1 at 1.35 Å resolution
Proteins belonging to the cupin superfamily have a wide range of catalytic and non-catalytic functions. Cupin proteins commonly have the capacity to bind a metal ion, with the metal frequently determining the function of the protein. We have been investigating the function of homologous cupin proteins that are conserved in more than 40 species of bacteria. To gain insights into the potential function of these proteins, we have solved the structure of Plu4264 from Photorhabdus luminescens TTO1 at a resolution of 1.35 Å and identified manganese as the likely natural metal ligand of the protein. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Ankyrins (Ank) are a ubiquitously expressed family of multifunctional membrane adapter proteins. Ankyrin G (AnkG) is critical for the assembly and maintenance of the axon initial segment. Here we present the 2.1 Å crystal structure of the human AnkG death domain (hAnkG-DD). The core death domain is composed of six α-helices and three 3₁₀-helices. It forms a hydrophobic pocket on the surface of the molecule. The C-terminal tail of the hAnkG-DD curves back so that the aromatic ring of a phenylalanine residue, Phe100, inserts into this pocket, which anchors the flexible tail onto the core domain. Related DDs were selected for structure comparison. The major variations are in the C-terminal region, including α6 and the long C-terminal extension. The results of size exclusion chromatography and analytical ultracentrifugation suggest that hAnkG-DD exists as a monomer in solution. Our work should aid future investigation of the structure–function relationship of AnkG. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Carbohydrate binding module recognition of xyloglucan defined by polar contacts with branching xyloses and CH-π interactions
Engineering of novel carbohydrate-binding proteins that can be utilized in various biochemical and biotechnical applications would benefit from a deeper understanding of the biochemical interactions that determine protein-carbohydrate specificity. In an effort to understand further the basis for specificity we present the crystal structure of the multi-specific carbohydrate-binding module (CBM) X-2 L110F bound to a branched oligomer of xyloglucan (XXXG). X-2 L110F is an engineered CBM that can recognize xyloglucan, xylans and β-glucans. The structural observations of the present study compared with previously reported structures of X-2 L110F in complex with linear oligomers, show that the π-surface of a phenylalanine, F110, allows for interactions with hydrogen atoms on both linear (xylopentaose and cellopentaose) and branched ligands (XXXG). Furthermore, X-2 L110F is shown to have a relatively flexible binding cleft, as illustrated in binding to XXXG. This branched ligand requires a set of reorientations of protein side chains Q72, N31, and R142, although these residues have previously been determined as important for binding to xylose oligomers by mediating polar contacts. The loss of these polar contacts is compensated for in binding to XXXG by polar interactions mediated by other protein residues, T74, R115, and Y149, which interact mainly with the branching xyloses of the xyloglucan oligomer. Taken together, the present study illustrates in structural detail how CH-π interactions can influence binding specificity and that flexibility is a key feature for the multi-specificity displayed by X-2 L110F, allowing for the accommodation of branched ligands. © 2014 Wiley Periodicals, Inc.
Brugia malayi is a parasitic nematode that causes lymphatic filariasis in humans. Here the solution structure of the forkhead DNA binding domain of Brugia malayi DAF-16a, a putative ortholog of Caenorhabditis elegans DAF-16, is reported. It is believed to be the first structure of a forkhead or winged helix domain from an invertebrate. C. elegans DAF-16 is involved in the insulin/IGF-I signaling pathway and helps control metabolism, longevity, and development. Conservation of sequence and structure with human FOXO proteins suggests that B. malayi DAF-16a is a member of the FOXO family of forkhead proteins. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Application of information theory to a three-body coarse-grained representation of proteins in the PDB: Insights into the structural and evolutionary roles of residues in protein structure
Knowledge-based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information-theoretic method for detecting and quantifying distance-dependent through-space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico-chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge-based tools for the evaluation of protein structure. Proteins 2014. © 2014 Wiley Periodicals, Inc.
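One generic way to quantify a statistical relationship of the kind described above is the mutual information between two discretized variables, for example the distance bin of a residue pair and the distance bin of a third residue to that pair. The abstract does not state which estimator the authors use, so the plug-in estimator below is an assumption and is not presented as their method; the function name `mutual_information` and the toy observations are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of mutual information (in bits).

    `pairs` is a list of (x, y) observations of two discrete
    variables, e.g. two binned inter-residue distances.
    """
    n = len(pairs)
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    marginal_y = Counter(y for _, y in pairs)

    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (n_x * n_y)
        ratio = count * n / (marginal_x[x] * marginal_y[y])
        mi += p_xy * math.log2(ratio)
    return mi


if __name__ == "__main__":
    # Toy data: distance bins 0 ("near") and 1 ("far").
    coupled = [(0, 0), (1, 1)] * 50                      # bins always agree
    independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # no relationship
    print(mutual_information(coupled))
    print(mutual_information(independent))
```

A plug-in estimate of this kind is biased upward for small samples, which is consistent with the abstract's remark that convergent results require a sufficiently large database of structures.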
Many mutations in the N-terminal arm of AraC result in constitutive behavior in which transcription of the araBAD genes occurs even in the absence of arabinose. To begin to understand the mechanism underlying this class of mutations, we used molecular dynamics with self-guided Langevin dynamics to simulate (1) wild-type (WT) AraC, (2) known constitutive mutants resulting from alterations in the regulatory arm, particularly alanine and glycine substitutions at residue 8 because P8G is constitutive, whereas P8A behaves like wild type, and (3) selected variant AraC proteins containing alterations in the dimerization core. In all of the constitutive arm mutants, but not the WT protein, residues 37–42, which are located in the core of the dimerization domain, became restructured. This raised the question of whether or not these structural changes are an obligatory component of constitutivity. Using molecular dynamics, we identified alterations in the core that produced a similar restructuring. The corresponding mutants were constructed and their ara constitutivity status was determined experimentally. Because the core mutants were not found to be constitutive, we conclude that restructuring of core residues 37–42 does not, itself, lead to constitutivity of AraC. The available data lead to the hypothesis that the interaction of the N-terminal arm with something other than the front lip is the primary determinant of the inducing versus repressing state of AraC. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Dual effects of familial Alzheimer's disease mutations (D7H, D7N, and H6R) on amyloid β peptide: Correlation dynamics and zinc binding
Although the N-terminal region of Amyloid β (Aβ) peptides plays dual roles as metal-coordinating sites and conformational modulator, few studies have been performed to explore the effects of mutations at this region on the overall conformational ensemble of Aβ and the binding propensity of metal ions. In this work, we focus on how three familial Alzheimer's disease mutations (D7H, D7N, and H6R) alter the structural characteristics and thermodynamic stabilities of Aβ42 using molecular dynamics simulations. We observe that each mutation displays increased β-sheet structures in both N and C termini. In particular, both the N terminus and central hydrophobic region of D7H can form stable β-hairpin structures with its C terminus. The conserved turn structure at Val24–Lys28 in all peptides and Zn2+-bound Aβ42 is confirmed as the common structural motif to nucleate folding of Aβ. Each mutant can significantly increase the solvation free energy and thus enhance the aggregation of Aβ monomers. The correlation dynamics between Aβ(1–16) and Aβ(17–42) fragments are elucidated by linking the domain motions with the corresponding structured conformations. We characterize the different populations of correlated domain motions for each mutant from a more macroscopic perspective, and unexpectedly find that Zn2+-bound Aβ42 ensemble shares the same populations as Aβ42, indicating that the binding of Zn2+ to Aβ follows the conformational selection mechanism, and thus is independent of domain motions, even though the structures of Aβ have been modified at a residue level. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Binding mode analysis of a major T3SS translocator protein PopB with its chaperone PcrH from Pseudomonas aeruginosa
Pseudomonas aeruginosa, a Gram-negative pathogen, uses a specialized set of Type III secretion system (T3SS) translocator proteins to establish virulence in the host cell. An understanding of the factors that govern translocation by the translocator protein–chaperone complex is thus of immense importance. In this work, experimental and computational techniques were used to probe the structure of the major translocator protein PopB from P. aeruginosa and to identify the important regions involved in the functioning of the translocator protein. This study reveals that the binding sites of the common chaperone PcrH, needed for maintenance of the translocator PopB within the bacterial cytoplasm, are primarily localized within the N-terminal domain. However, disordered and flexible residues located in both the N- and C-terminal domains are also observed to be involved in association with the chaperone. This intrinsic disorder of the terminal domains is conserved in all the major T3SS translocator proteins and is functionally important to maintain the intrinsically disordered state of the translocators. Our experimental and computational analyses suggest that a “disorder-to-order” transition of the PopB protein might take place upon PcrH binding. The long helical coiled-coil part of the PopB protein perhaps helps in pore formation, while the flexible apical region is involved in chaperone interaction. Thus, our computational model of the translocator protein PopB and its binding analyses provide crucial functional insights into the T3SS translocation mechanism. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Density functional theory calculations on entire proteins for free energies of binding: Application to a model polar binding site
In drug optimization calculations, the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method can be used to compute free energies of binding of ligands to proteins. The method involves the evaluation of the energy of configurations in an implicit solvent model. One source of errors is the force field used, which can potentially lead to large errors due to the restrictions in accuracy imposed by its empirical nature. To assess the effect of the force field on the calculation of binding energies, in this article we use large-scale density functional theory (DFT) calculations as an alternative method to evaluate the energies of the configurations in a “QM-PBSA” approach. Our DFT calculations are performed with a near-complete basis set and a minimal parameter implicit solvent model, within the self-consistent calculation, using the ONETEP program on protein–ligand complexes containing more than 2600 atoms. We apply this approach to the T4-lysozyme double mutant L99A/M102Q protein, which is a well-studied model of a polar binding site, using a set of eight small aromatic ligands. We observe that there is very good correlation between the MM and QM binding energies in vacuum but less so in the solvent. The relative binding free energies from DFT are more accurate than the ones from the MM calculations, and give markedly better agreement with experiment for six of the eight ligands. Furthermore, in contrast to MM-PBSA, QM-PBSA is able to correctly predict a nonbinder. Proteins 2014. © 2014 Wiley Periodicals, Inc.
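For orientation, the end-point estimate that underlies MM-PBSA-type methods can be written in its commonly used general form as follows. This is a textbook-style sketch rather than the exact expression used in the article; the symbols are introduced here for illustration, and the entropy term is often approximated or omitted in practice.

```latex
\Delta G_{\mathrm{bind}} \approx
  \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{protein}} \rangle
  - \langle G_{\mathrm{ligand}} \rangle ,
\qquad
G_X = E_X + G_X^{\mathrm{solv}} - T S_X ,
\quad X \in \{\mathrm{complex}, \mathrm{protein}, \mathrm{ligand}\}
```

Here the angle brackets denote an average over sampled configurations, $E_X$ is the gas-phase energy of species $X$, and $G_X^{\mathrm{solv}}$ is its implicit-solvent solvation free energy. In MM-PBSA, $E_X$ comes from the force field. In the QM-PBSA approach described above, the energies of the configurations are instead evaluated with DFT, with the implicit solvent included in the self-consistent calculation.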
Sequence and conformational preferences at termini of α-helices in membrane proteins: Role of the helix environment
α-helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Alanine and proline content modulate global sensitivity to discrete perturbations in disordered proteins
Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins (IDPs), mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline (PRO) and alanine (ALA) to glycine (GLY) substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (Rh) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the GLY substitutions decreased polyproline II (PPII) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in Rh were not associated with folding. The experiments showed that changes in local PPII structure cause changes in Rh that are variable and that depend on the intrinsic chain propensities of PRO and ALA residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed PRO and alanine effects on the structures of IDPs. Proteins 2014. © 2014 Wiley Periodicals, Inc.