Chapter 15: Disease Gene Prioritization
by Yana Bromberg
Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cures for many diseases.
The Interaction of Vinculin with Actin
by Javad Golji, Mohammad R. K. Mofrad
Vinculin can interact with F-actin both in recruitment of actin filaments to the growing focal adhesions and in capping of actin filaments to regulate actin dynamics. Both interactions are simulated with molecular dynamics using different vinculin conformations. Vinculin is simulated in four forms: with only its vinculin tail domain (Vt), with all residues in its closed conformation, with all residues in an open I conformation, or with all residues in an open II conformation. The open I conformation results from movement of domain 1 away from Vt; the open II conformation results from complete dissociation of Vt from the vinculin head domains. Simulation of vinculin binding along the actin filament showed that Vt alone can bind along the actin filaments, that vinculin in its closed conformation cannot bind along the actin filaments, and that vinculin in its open I conformation can bind along the actin filaments. The simulations confirm that movement of domain 1 away from Vt in formation of the open I conformation is sufficient for allowing Vt to bind along the actin filament. Simulations of Vt capping actin filaments probe six possible bound structures and suggest that vinculin would cap actin filaments by interacting with both S1 and S3 of the barbed end, using the surface of Vt normally occluded by D4 and nearby vinculin head domain residues. Simulation of D4 separation from Vt after D1 separation formed the open II conformation. Binding of open II vinculin to the barbed end suggests this conformation allows for vinculin capping. Three binding sites on F-actin are suggested as regions that could link to vinculin. Vinculin is suggested to function as a variable switch at the focal adhesions. The conformation of vinculin and the precise F-actin binding conformation depend on the level of mechanical load on the focal adhesion.
Distinct Types of Disorder in the Human Proteome: Functional Implications for Alternative Splicing
by Recep Colak, TaeHyung Kim, Magali Michaut, Mark Sun, Manuel Irimia, Jeremy Bellay, Chad L. Myers, Benjamin J. Blencowe, Philip M. Kim
Intrinsically disordered regions have been associated with various cellular processes and are implicated in several human diseases, but their exact roles remain unclear. We previously defined two classes of conserved disordered regions in budding yeast, referred to as “flexible” and “constrained” conserved disorder. In flexible disorder, the property of disorder has been positionally conserved during evolution, whereas in constrained disorder, both the amino acid sequence and the property of disorder have been conserved. Here, we show that flexible and constrained disorder are widespread in the human proteome, and are particularly common in proteins with regulatory functions. Both classes of disordered sequences are highly enriched in regions of proteins that undergo tissue-specific (TS) alternative splicing (AS), but not in regions of proteins that undergo general (i.e., not tissue-regulated) AS. Flexible disorder is more highly enriched in TS alternative exons, whereas constrained disorder is more highly enriched in exons that flank TS alternative exons. These latter regions are also significantly more enriched in potential phosphosites and other short linear motifs associated with cell signaling. We further show that cancer driver mutations are significantly enriched in regions of proteins associated with TS and general AS. Collectively, our results point to distinct roles for TS alternative exons and flanking exons in the dynamic regulation of protein interaction networks in response to signaling activity, and they further suggest that alternatively spliced regions of proteins are often functionally altered by mutations responsible for cancer.
by Fuhai Li, Zheng Yin, Guangxu Jin, Hong Zhao, Stephen T. C. Wong
Recent advances in automated high-resolution fluorescence microscopy and robotic handling have made the systematic and cost-effective study of diverse morphological changes within a large population of cells possible under a variety of perturbations, e.g., drugs, compounds, metal catalysts, RNA interference (RNAi). Cell population-based studies deviate from conventional microscopy studies on a few cells, and could provide stronger statistical power for drawing experimental observations and conclusions. However, it is challenging to manually extract and quantify phenotypic changes from the large amounts of complex image data generated. Thus, bioimage informatics approaches are needed to rapidly and objectively quantify and analyze the image data. This paper provides an overview of the bioimage informatics challenges and approaches in image-based studies for drug and target discovery. The concepts and capabilities of image-based screening are first illustrated by a few practical examples investigating different kinds of phenotypic changes caused by drugs, compounds, or RNAi. The bioimage analysis approaches, including object detection, segmentation, and tracking, are then described. Subsequently, the quantitative features, phenotype identification, and multidimensional profile analysis for profiling the effects of drugs and targets are summarized. Moreover, a number of publicly available software packages for bioimage informatics are listed for further reference. It is expected that this review will help readers, including those without bioimage informatics expertise, understand the capabilities, approaches, and tools of bioimage informatics and apply them to advance their own studies.
Bayesian Computation Emerges in Generic Cortical Microcircuits through Spike-Timing-Dependent Plasticity
by Bernhard Nessler, Michael Pfeiffer, Lars Buesing, Wolfgang Maass
The principles by which networks of neurons compute, and how spike-timing-dependent plasticity (STDP) of synaptic weights generates and maintains their computational function, are unknown. Preceding work has shown that soft winner-take-all (WTA) circuits, where pyramidal neurons inhibit each other via interneurons, are a common motif of cortical microcircuits. We show through theoretical analysis and computer simulations that Bayesian computation is induced in these network motifs through STDP in combination with activity-dependent changes in the excitability of neurons. The fundamental components of this emergent Bayesian computation are priors that result from adaptation of neuronal excitability and implicit generative models for hidden causes that are created in the synaptic weights through STDP. In fact, a surprising result is that STDP is able to approximate a powerful principle for fitting such implicit generative models to high-dimensional spike inputs: Expectation Maximization. Our results suggest that the experimentally observed spontaneous activity and trial-to-trial variability of cortical neurons are essential features of their information processing capability, since their functional role is to represent probability distributions rather than static neural codes. Furthermore, they suggest networks of Bayesian computation modules as a new model for distributed information processing in the cortex.
by K. Bretonnel Cohen, Lawrence E. Hunter
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
by Jordan R. Willis, Bryan S. Briney, Samuel L. DeLuca, James E. Crowe, Jens Meiler
Structural flexibility in germline gene-encoded antibodies allows promiscuous binding to diverse antigens. The binding affinity and specificity for a particular epitope typically increase as antibody genes acquire somatic mutations in antigen-stimulated B cells. In this work, we investigated whether germline gene-encoded antibodies are optimal for polyspecificity by determining the basis for recognition of diverse antigens by antibodies encoded by three VH gene segments. Panels of somatically mutated antibodies encoded by a common VH gene, but each binding to a different antigen, were computationally redesigned to predict antibodies that could engage multiple antigens at once. The Rosetta multi-state design process predicted antibody sequences for the entire heavy chain variable region, including framework, CDR1, and CDR2 mutations. The predicted sequences matched the germline gene sequences to a remarkable degree, revealing by computational design the residues that are predicted to enable polyspecificity, i.e., binding of many unrelated antigens with a common sequence. The process thereby reverses antibody maturation in silico. In contrast, when designing antibodies to bind a single antigen, a sequence similar to that of the mature antibody sequence was returned, mimicking natural antibody maturation in silico. We demonstrated that the Rosetta computational design algorithm captures important aspects of antibody/antigen recognition. While the hypervariable region CDR3 often mediates much of the specificity of mature antibodies, we identified key positions in the VH gene encoding CDR1, CDR2, and the immunoglobulin framework that are critical contributors for polyspecificity in germline antibodies. Computational design of antibodies capable of binding multiple antigens may allow the rational design of antibodies that retain polyspecificity for diverse epitope binding.
Localization of Protein Aggregation in Escherichia coli Is Governed by Diffusion and Nucleoid Macromolecular Crowding Effect
by Anne-Sophie Coquel, Jean-Pascal Jacob, Mael Primet, Alice Demarez, Mariella Dimiccoli, Thomas Julou, Lionel Moisan, Ariel B. Lindner, Hugues Berry
Aggregates of misfolded proteins are a hallmark of many age-related diseases. Recently, they have been linked to aging of Escherichia coli (E. coli) where protein aggregates accumulate at the old pole region of the aging bacterium. Because of the potential of E. coli as a model organism, elucidating aging and protein aggregation in this bacterium may pave the way to significant advances in our global understanding of aging. A first obstacle along this path is to decipher the mechanisms by which protein aggregates are targeted to specific intracellular locations. Here, using an integrated approach based on individual-based modeling, time-lapse fluorescence microscopy and automated image analysis, we show that the movement of aging-related protein aggregates in E. coli is purely diffusive (Brownian). Using single-particle tracking of protein aggregates in live E. coli cells, we estimated the average size and diffusion constant of the aggregates. Our results provide evidence that the aggregates passively diffuse within the cell, with diffusion constants that depend on their size in agreement with the Stokes-Einstein law. However, the aggregate displacements along the cell long axis are confined to a region that roughly corresponds to the nucleoid-free space in the cell pole, thus confirming the importance of increased macromolecular crowding in the nucleoids. We thus used 3D individual-based modeling to show that these three ingredients (diffusion, aggregation and diffusion hindrance in the nucleoids) are sufficient and necessary to reproduce the available experimental data on aggregate localization in the cells. Taken together, our results strongly support the hypothesis that the localization of aging-related protein aggregates in the poles of E. coli results from the coupling of passive diffusion-aggregation with spatially non-homogeneous macromolecular crowding. They further support the importance of “soft” intracellular structuring (based on macromolecular crowding) in diffusion-based protein localization in E. coli.