PLoS Computational Biology

  • The RAVEN Toolbox and Its Use for Generating a Genome-scale Metabolic Model for Penicillium chrysogenum
    [Mar 2013]

    by Rasmus Agren, Liming Liu, Saeed Shoaie, Wanwipa Vongsangnak, Intawat Nookaew, Jens Nielsen

    We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied in order to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study of 440 references in total, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production.
    Categories: Journal Articles
  • Analysis of Physicochemical and Structural Properties Determining HIV-1 Coreceptor Usage
    [Mar 2013]

    by Katarzyna Bozek, Thomas Lengauer, Saleta Sierra, Rolf Kaiser, Francisco S. Domingues

    The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the sequence and structure of the V3 loop of the virus gp120 protein. Here we present a numerical descriptor of the V3 loop encoding its physicochemical and structural properties. The descriptor allows for structure-based prediction of HIV tropism and identification of properties of the V3 loop that are crucial for coreceptor usage. Use of the proposed descriptor for prediction results in a statistically significant improvement over the prediction based solely on V3 sequence with 3 percentage points improvement in AUC and 7 percentage points in sensitivity at the specificity of the 11/25 rule (95%). We additionally assessed the predictive power of the new method on clinically derived ‘bulk’ sequence data and obtained a statistically significant improvement in AUC of 3 percentage points over sequence-based prediction. Furthermore, we demonstrated the capacity of our method to predict therapy outcome by applying it to 53 samples from patients undergoing Maraviroc therapy. The analysis of structural features of the loop informative of tropism indicates the importance of two loop regions and their physicochemical properties. The regions are located on opposite strands of the loop stem and the respective features are predominantly charge-, hydrophobicity- and structure-related. These regions are in close proximity in the bound conformation of the loop, potentially forming a site determinant for the coreceptor binding. The method is available via a web server at http://structure.bioinf.mpi-inf.mpg.de/.
    Categories: Journal Articles
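
The abstract above uses the 11/25 rule as its specificity baseline. As a point of reference, that rule can be sketched in a few lines: a V3 loop is called CXCR4-using (X4) when position 11 or position 25 carries a positively charged residue. This is a minimal sketch, not the authors' descriptor; the function name and the assumption of an ungapped V3 sequence (so that positions map directly to string indices) are illustrative.

```python
# Minimal sketch of the classic 11/25 rule for HIV-1 coreceptor prediction.
# Assumption: `v3` is an ungapped V3 loop amino acid sequence, so positions
# 11 and 25 (1-based) map directly to string indices 10 and 24.

BASIC_RESIDUES = {"R", "K"}  # arginine and lysine carry a positive charge


def predict_x4_by_11_25_rule(v3: str) -> bool:
    """Return True if the sequence is predicted to be CXCR4-using (X4)."""
    if len(v3) < 25:
        raise ValueError("V3 sequence too short to contain position 25")
    seq = v3.upper()
    return seq[10] in BASIC_RESIDUES or seq[24] in BASIC_RESIDUES


if __name__ == "__main__":
    # Illustrative sequences: the second differs only at position 11 (S -> R).
    print(predict_x4_by_11_25_rule("CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"))
    print(predict_x4_by_11_25_rule("CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"))
```

Real sequences often need alignment to a reference before positions 11 and 25 can be read off, which this sketch deliberately leaves out.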
  • A Discriminative Approach for Unsupervised Clustering of DNA Sequence Motifs
    [Mar 2013]

    by Philip Stegmaier, Alexander Kel, Edgar Wingender, Jürgen Borlak

    Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.
    Categories: Journal Articles
  • Neuronal Avalanches Differ from Wakefulness to Deep Sleep – Evidence from Intracranial Depth Recordings in Humans
    [Mar 2013]

    by Viola Priesemann, Mario Valderrama, Michael Wibral, Michel Le Van Quyen

    Neuronal activity differs between wakefulness and sleep states. In contrast, an attractor state, called self-organized critical (SOC), was proposed to govern brain dynamics because it allows for optimal information coding. But is the human brain SOC for each vigilance state despite the variations in neuronal dynamics? We characterized neuronal avalanches – spatiotemporal waves of enhanced activity – from dense intracranial depth recordings in humans. We showed that avalanche distributions closely follow a power law – the hallmark feature of SOC – for each vigilance state. However, avalanches clearly differ with vigilance states: slow wave sleep (SWS) shows large avalanches, wakefulness intermediate, and rapid eye movement (REM) sleep small ones. Our SOC model, together with the data, suggested first that the differences are mediated by global but tiny changes in synaptic strength, and second, that the changes with vigilance states reflect small deviations from criticality to the subcritical regime, implying that the human brain does not operate at criticality proper but close to SOC. Independent of criticality, the analysis confirms that SWS shows increased correlations between cortical areas, and reveals that REM sleep shows more fragmented cortical dynamics.
    Categories: Journal Articles
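
For readers unfamiliar with how avalanche size distributions like those in the abstract above are obtained, the following is a minimal sketch of one common definition from the neuronal avalanche literature: an avalanche is a run of consecutive non-empty time bins bounded by empty bins, and its size is the total event count in that run. The abstract does not state the authors' binning or event detection, so the input format and function name here are assumptions.

```python
from typing import List


def avalanche_sizes(event_counts: List[int]) -> List[int]:
    """Return avalanche sizes from a sequence of per-bin event counts.

    An avalanche is a maximal run of consecutive bins with at least one
    event; its size is the sum of events over the run. A run that is still
    open at the end of the recording is included as well (a simplification).
    """
    sizes: List[int] = []
    current = 0
    for count in event_counts:
        if count > 0:
            current += count        # avalanche continues
        elif current > 0:
            sizes.append(current)   # empty bin closes the avalanche
            current = 0
    if current > 0:
        sizes.append(current)       # trailing avalanche without closing bin
    return sizes


if __name__ == "__main__":
    # Hypothetical binned activity summed across recording channels.
    print(avalanche_sizes([0, 2, 1, 0, 0, 5, 0, 1, 1]))  # [3, 5, 2]
```

A histogram of these sizes on log-log axes is the usual starting point for checking power-law behaviour.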
  • Inferring Metabolic States in Uncharacterized Environments Using Gene-Expression Measurements
    [Mar 2013]

    by Sergio Rossell, Martijn A. Huynen, Richard A. Notebaart

    The large size of metabolic networks entails an overwhelming multiplicity in the possible steady-state flux distributions that are compatible with stoichiometric constraints. This space of possibilities is largest in the frequent situation where the nutrients available to the cells are unknown. These two factors, network size and lack of knowledge of nutrient availability, challenge the identification of the actual metabolic state of living cells among the myriad possibilities. Here we address this challenge by developing a method that integrates gene-expression measurements with genome-scale models of metabolism as a means of inferring metabolic states. Our method explores the space of alternative flux distributions that maximize the agreement between gene expression and metabolic fluxes, and thereby identifies reactions that are likely to be active in the culture from which the gene-expression measurements were taken. These active reactions are used to build environment-specific metabolic models and to predict actual metabolic states. We applied our method to model the metabolic states of Saccharomyces cerevisiae growing in rich media supplemented with either glucose or ethanol as the main energy source. The resulting models comprise about 50% of the reactions in the original model, and predict environment-specific essential genes with high sensitivity. By minimizing the sum of fluxes while forcing our predicted active reactions to carry flux, we predicted the metabolic states of these yeast cultures that largely agree with what is known about yeast physiology. Most notably, our method predicts the Crabtree effect in yeast cells growing in excess glucose, a long-known phenomenon that could not have been predicted by traditional constraint-based modeling approaches.
    Our method is of immediate practical relevance for medical and industrial applications, such as the identification of novel drug targets, and the development of biotechnological processes that use complex, largely uncharacterized media, such as biofuel production.
    Categories: Journal Articles
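
The final prediction step described in the abstract above (minimizing the sum of fluxes while forcing the predicted active reactions to carry flux) can be written as a small optimization problem. The formulation below is a sketch under stated assumptions, not the authors' exact method: S is the stoichiometric matrix, v the flux vector, l and u the flux bounds, A the set of predicted active reactions, and ε a small positive threshold. None of these symbols appear in the abstract, and the last constraint assumes each active reaction is written in the direction in which it carries flux.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{i} |v_i| \\
\text{s.t.}\quad & S\,v = 0, \\
& l_i \le v_i \le u_i \quad \text{for all reactions } i, \\
& v_j \ge \varepsilon \quad \text{for all } j \in A .
\end{aligned}
```

The absolute values can be removed by splitting each flux into non-negative forward and backward parts, which turns the problem into a standard linear program.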
  • Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences
    [Mar 2013]

    by Dazhi Jiao, Yuzhen Ye, Haixu Tang

    Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online at http://omics.informatics.indiana.edu/mg/MetaNetSam/.
    Categories: Journal Articles
  • Bayesian Estimation of Mixture Models with Prespecified Elements to Compare Drug Resistance in Treatment-Naïve and Experienced Tuberculosis Cases
    [Mar 2013]

    by Alane Izu, Ted Cohen, Victor DeGruttola

    We propose a Bayesian approach for estimating branching tree mixture models to compare drug-resistance pathways (i.e. patterns of sequential acquisition of resistance to individual antibiotics) that are observed among Mycobacterium tuberculosis isolates collected from treatment-naïve and treatment-experienced patients. Resistant pathogens collected from treatment-naïve patients are strains for which fitness costs of resistance were not sufficient to prevent transmission, whereas those collected from treatment-experienced patients reflect both transmitted and acquired resistance, the latter of which may or may not be associated with lower transmissibility. The comparison of the resistance pathways constructed from these two groups of drug-resistant strains provides insight into which pathways preferentially lead to the development of multiple drug resistant strains that are transmissible. We apply the proposed statistical methods to data from worldwide surveillance of drug-resistant tuberculosis collected by the World Health Organization over 13 years.
    Categories: Journal Articles
  • Angiogenesis: An Adaptive Dynamic Biological Patterning Problem
    [Mar 2013]

    by Timothy W. Secomb, Jonathan P. Alberding, Richard Hsu, Mark W. Dewhirst, Axel R. Pries

    Formation of functionally adequate vascular networks by angiogenesis presents a problem in biological patterning. Generated without predetermined spatial patterns, networks must develop hierarchical tree-like structures for efficient convective transport over large distances, combined with dense space-filling meshes for short diffusion distances to every point in the tissue. Moreover, networks must be capable of restructuring in response to changing functional demands without interruption of blood flow. Here, theoretical simulations based on experimental data are used to demonstrate that this patterning problem can be solved through over-abundant stochastic generation of vessels in response to a growth factor generated in hypoxic tissue regions, in parallel with refinement by structural adaptation and pruning. Essential biological mechanisms for generation of adequate and efficient vascular patterns are identified and impairments in vascular properties resulting from defects in these mechanisms are predicted. The results provide a framework for understanding vascular network formation in normal or pathological conditions and for predicting effects of therapies targeting angiogenesis.
    Categories: Journal Articles
  • Systematic Prediction of Pharmacodynamic Drug-Drug Interactions through Protein-Protein-Interaction Network
    [Mar 2013]

    by Jialiang Huang, Chaoqun Niu, Christopher D. Green, Lun Yang, Hongkang Mei, Jing-Dong J. Han

    Identifying drug-drug interactions (DDIs) is a major challenge in drug development. Previous attempts have established formal approaches for pharmacokinetic (PK) DDIs, but there is no feasible solution for pharmacodynamic (PD) DDIs because the endpoint is often a serious adverse event rather than a measurable change in drug concentration. Here, we developed a metric “S-score” that measures the strength of network connection between drug targets to predict PD DDIs. Utilizing known PD DDIs as gold standard positives (GSPs), we observed a significant correlation between S-score and the likelihood that a PD DDI occurs. Our prediction was robust and surpassed existing methods as validated by two independent GSPs. Analysis of clinical side effect data suggested that the drugs having predicted DDIs have similar side effects. We further incorporated this clinical side effects evidence with S-score to increase the prediction specificity and sensitivity through a Bayesian probabilistic model. We have predicted 9,626 potential PD DDIs at an accuracy of 82% and a recall of 62%. Importantly, our algorithm provided opportunities for better understanding the potential molecular mechanisms or physiological effects underlying DDIs, as illustrated by the case studies.
    Categories: Journal Articles
  • A Multi-scale Analysis of Influenza A Virus Fitness Trade-offs due to Temperature-dependent Virus Persistence
    [Mar 2013]

    by Andreas Handel, Justin Brown, David Stallknecht, Pejman Rohani

    Successful replication within an infected host and successful transmission between hosts are key to the continued spread of most pathogens. Competing selection pressures exerted at these different scales can lead to evolutionary trade-offs between the determinants of fitness within and between hosts. Here, we examine such a trade-off in the context of influenza A viruses and the differential pressures exerted by temperature-dependent virus persistence. For a panel of avian influenza A virus strains, we find evidence for a trade-off between the persistence at high versus low temperatures. Combining a within-host model of influenza infection dynamics with a between-host transmission model, we study how such a trade-off affects virus fitness on the host population level. We show that conclusions regarding overall fitness are affected by the type of link assumed between the within- and between-host levels and the main route of transmission (direct or environmental). The relative importance of virulence and immune response mediated virus clearance are also found to influence the fitness impacts of virus persistence at low versus high temperatures. Based on our results, we predict that if transmission occurs mainly directly and scales linearly with virus load, and virulence or immune responses are negligible, the evolutionary pressure for influenza viruses to evolve toward good persistence at high within-host temperatures dominates. For all other scenarios, influenza viruses with good environmental persistence at low temperatures seem to be favored.
    Categories: Journal Articles
  • In-silico Assessment of Protein-Protein Electron Transfer. A Case Study: Cytochrome c Peroxidase – Cytochrome c
    [Mar 2013]

    by Frank H. Wallrapp, Alexander A. Voityuk, Victor Guallar

    The fast development of software and hardware is notably helping in closing the gap between macroscopic and microscopic data. Using a novel theoretical strategy combining molecular dynamics simulations, conformational clustering, ab-initio quantum mechanics and electronic coupling calculations, we show how computational methodologies are mature enough to provide accurate atomistic details into the mechanism of electron transfer (ET) processes in complex protein systems, known to be a significant challenge. We performed a quantitative study of the ET between Cytochrome c Peroxidase and its redox partner Cytochrome c. Our results confirm the ET mechanism as hole transfer (HT) through residues Ala194, Ala193, Gly192 and Trp191 of CcP. Furthermore, our findings indicate the fine evolution of the enzyme to approach an elevated turnover rate of 5.47×10^6 s^−1 for the ET between Cytc and CcP through establishment of a localized bridge state in Trp191.
    Categories: Journal Articles
  • Folding Pathways of a Knotted Protein with a Realistic Atomistic Force Field
    [Mar 2013]

    by Silvio a Beccara, Tatjana Škrbić, Roberto Covino, Cristian Micheletti, Pietro Faccioli

    We report on atomistic simulation of the folding of a natively-knotted protein, MJ0366, based on a realistic force field. To the best of our knowledge this is the first reported effort where a realistic force field is used to investigate the folding pathways of a protein with complex native topology. By using the dominant-reaction pathway scheme we collected about 30 successful folding trajectories for the 82-amino acid long trefoil-knotted protein. Despite the dissimilarity of their initial unfolded configuration, these trajectories reach the natively-knotted state through a remarkably similar succession of steps. In particular it is found that knotting occurs essentially through a threading mechanism, involving the passage of the C-terminal through an open region created by the formation of the native β-sheet at an earlier stage. The dominance of the knotting by threading mechanism is not observed in MJ0366 folding simulations using simplified, native-centric models. This points to a previously underappreciated role of concerted amino acid interactions, including non-native ones, in aiding the appropriate order of contact formation to achieve knotting.
    Categories: Journal Articles
  • Rampant Exchange of the Structure and Function of Extramembrane Domains between Membrane and Water Soluble Proteins
    [Mar 2013]

    by Hyun-Jun Nam, Seong Kyu Han, James U. Bowie, Sanguk Kim

    Of the membrane proteins of known structure, we found that a remarkable 67% of the water soluble domains are structurally similar to water soluble proteins of known structure. Moreover, 41% of known water soluble protein structures share a domain with an already known membrane protein structure. We also found that functional residues are frequently conserved between extramembrane domains of membrane and soluble proteins that share structural similarity. These results suggest membrane and soluble proteins readily exchange domains and their attendant functionalities. The exchanges between membrane and soluble proteins are particularly frequent in eukaryotes, indicating that this is an important mechanism for increasing functional complexity. The high level of structural overlap between the two classes of proteins provides an opportunity to employ the extensive information on soluble proteins to illuminate membrane protein structure and function, for which much less is known. To this end, we employed structure guided sequence alignment to elucidate the functions of membrane proteins in the human genome. Our results bridge the gap of fold space between membrane and water soluble proteins and provide a resource for the prediction of membrane protein function. A database of predicted structural and functional relationships for proteins in the human genome is provided at sbi.postech.ac.kr/emdmp.
    Categories: Journal Articles
  • Under-Dominance Constrains the Evolution of Negative Autoregulation in Diploids
    [Mar 2013]

    by Alexander J. Stewart, Robert M. Seymour, Andrew Pomiankowski, Max Reuter

    Regulatory networks have evolved to allow gene expression to rapidly track changes in the environment as well as to buffer perturbations and maintain cellular homeostasis in the absence of change. Theoretical work and empirical investigation in Escherichia coli have shown that negative autoregulation confers both rapid response times and reduced intrinsic noise, which is reflected in the fact that almost half of Escherichia coli transcription factors are negatively autoregulated. However, negative autoregulation is rare amongst the transcription factors of Saccharomyces cerevisiae. This difference is surprising because E. coli and S. cerevisiae otherwise have similar profiles of network motifs. In this study we investigate regulatory interactions amongst the transcription factors of Drosophila melanogaster and humans, and show that they have a similar dearth of negative autoregulation to that seen in S. cerevisiae. We then present a model demonstrating that this striking difference in the noise reduction strategies used amongst species can be explained by constraints on the evolution of negative autoregulation in diploids. We show that regulatory interactions between pairs of homologous genes within the same cell can lead to under-dominance: mutations which result in stronger autoregulation, and decrease noise in homozygotes, paradoxically can cause increased noise in heterozygotes. This severely limits a diploid's ability to evolve negative autoregulation as a noise reduction mechanism. Our work offers a simple and general explanation for a previously unexplained difference between the regulatory architectures of E. coli and yeast, Drosophila and humans. It also demonstrates that the effects of diploidy in gene networks can have counter-intuitive consequences that may profoundly influence the course of evolution.
    Categories: Journal Articles
  • Multi-scale Inference of Interaction Rules in Animal Groups Using Bayesian Model Selection
    [Mar 2013]

    by Richard P. Mann, Andrea Perna, Daniel Strömbom, Roman Garnett, James E. Herbert-Read, David J. T. Sumpter, Ashley J. W. Ward

    Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical ‘phase transition’, whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have ‘memory’ of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture the observed locality of interactions. Traditional self-propelled particle models fail to capture the fine scale dynamics of the system. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics, while maintaining a biologically plausible perceptual range. We conclude that prawns’ movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
    Categories: Journal Articles
  • Viral Phylodynamics
    [Mar 2013]

    by Erik M. Volz, Katia Koelle, Trevor Bedford

    Viral phylodynamics is defined as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies. Since the coining of the term in 2004, research on viral phylodynamics has focused on transmission dynamics in an effort to shed light on how these dynamics impact viral genetic variation. Transmission dynamics can be considered at the level of cells within an infected host, individual hosts within a population, or entire populations of hosts. Many viruses, especially RNA viruses, rapidly accumulate genetic variation because of short generation times and high mutation rates. Patterns of viral genetic variation are therefore heavily influenced by how quickly transmission occurs and by which entities transmit to one another. Patterns of viral genetic variation will also be affected by selection acting on viral phenotypes. Although viruses can differ with respect to many phenotypes, phylodynamic studies have to date tended to focus on a limited number of viral phenotypes. These include virulence phenotypes, phenotypes associated with viral transmissibility, cell or tissue tropism phenotypes, and antigenic phenotypes that can facilitate escape from host immunity. Due to the impact that transmission dynamics and selection can have on viral genetic variation, viral phylogenies can therefore be used to investigate important epidemiological, immunological, and evolutionary processes, such as epidemic spread [2], spatio-temporal dynamics including metapopulation dynamics [3], zoonotic transmission, tissue tropism [4], and antigenic drift [5]. The quantitative investigation of these processes through the consideration of viral phylogenies is the central aim of viral phylodynamics.
    Categories: Journal Articles
  • Computational Predictions Provide Insights into the Biology of TAL Effector Target Sites
    [Mar 2013]

    by Jan Grau, Annett Wolf, Maik Reschke, Ulla Bonas, Stefan Posch, Jens Boch

    Transcription activator-like (TAL) effectors are injected into host plant cells by Xanthomonas bacteria to function as transcriptional activators for the benefit of the pathogen. The DNA binding domain of TAL effectors is composed of conserved amino acid repeat structures containing repeat-variable diresidues (RVDs) that determine DNA binding specificity. In this paper, we present TALgetter, a new approach for predicting TAL effector target sites based on a statistical model. In contrast to previous approaches, the parameters of TALgetter are estimated from training data computationally. We demonstrate that TALgetter successfully predicts known TAL effector target sites and often yields a greater number of predictions that are consistent with up-regulation in gene expression microarrays than an existing approach, Target Finder of the TALE-NT suite. We study the binding specificities estimated by TALgetter and confirm that different RVDs differ in their importance for transcriptional activation. In subsequent studies, the predictions of TALgetter indicate a previously unreported positional preference of TAL effector target sites relative to the transcription start site. In addition, several TAL effectors are predicted to bind to the TATA-box, which might constitute one general mode of transcriptional activation by TAL effectors. Scrutinizing the predicted target sites of TALgetter, we propose several novel TAL effector virulence targets in rice and sweet orange. TAL-mediated induction of the candidates is supported by gene expression microarrays. Validity of these targets is also supported by functional analogy to known TAL effector targets, by an over-representation of TAL effector targets with similar function, or by a biological function related to pathogen infection. Hence, these predicted TAL effector virulence targets are promising candidates for studying the virulence function of TAL effectors.
    TALgetter is implemented as part of the open-source Java library Jstacs, and is freely available as a web-application and a command line program.
    Categories: Journal Articles
  • Revealing a Two-Loop Transcriptional Feedback Mechanism in the Cyanobacterial Circadian Clock
    [Mar 2013]

    by Stefanie Hertel, Christian Brettschneider, Ilka M. Axmann

    Molecular genetic studies in the circadian model organism Synechococcus have revealed that the KaiC protein, the central component of the circadian clock in cyanobacteria, is involved in activation and repression of its own gene transcription. During 24 hours, KaiC hexamers run through different phospho-states during daytime. So far, it has remained unclear which phospho-state of KaiC promotes kaiBC expression and which opposes transcriptional activation. We systematically analyzed various combinations of positive and negative transcriptional feedback regulation by introducing a combined TTFL/PTO model consisting of our previous post-translational oscillator that considers all four phospho-states of KaiC and a transcriptional/translational feedback loop. Only one particular two-loop feedback mechanism out of the 32 we have extensively tested is able to reproduce existing experimental observations, including the effects of knockout or overexpression of kai genes. Here, threonine-phosphorylated and doubly phosphorylated KaiC hexamers activate and unphosphorylated KaiC hexamers suppress kaiBC transcription. Our model simulations suggest that the peak expression ratio of the positive and the negative component of kaiBC expression is the main factor for how the different two-loop feedback models respond to removal or to overexpression of kai genes. We discuss parallels between our proposed TTFL/PTO model and two-loop feedback structures found in the mammalian clock.
    Categories: Journal Articles
  • Differential Expression Analysis for Pathways
    [Mar 2013]

    by Winston A. Haynes, Roger Higdon, Larissa Stanberry, Dwayne Collins, Eugene Kolker

    Life science technologies generate a deluge of data that hold the keys to unlocking the secrets of important biological functions and disease mechanisms. We present DEAP, Differential Expression Analysis for Pathways, which capitalizes on information about biological pathways to identify important regulatory patterns from differential expression data. DEAP makes significant improvements over existing approaches by including information about pathway structure and discovering the most differentially expressed portion of the pathway. On simulated data, DEAP significantly outperformed traditional methods: with high differential expression, DEAP increased power by two orders of magnitude; with very low differential expression, DEAP doubled the power. DEAP performance was illustrated on two different gene and protein expression studies. DEAP discovered fourteen important pathways related to chronic obstructive pulmonary disease and interferon treatment that existing approaches omitted. On the interferon study, DEAP guided focus towards a four protein path within the 26 protein Notch signalling pathway.
    Categories: Journal Articles
  • RFECS: A Random-Forest Based Algorithm for Enhancer Identification from Chromatin State
    [Mar 2013]

    by Nisha Rajagopal, Wei Xie, Yan Li, Uli Wagner, Wei Wang, John Stamatoyannopoulos, Jason Ernst, Manolis Kellis, Bing Ren

    Transcriptional enhancers play critical roles in regulation of gene expression, but their identification in the eukaryotic genome has been challenging. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of cell types or chromatin marks have previously been investigated for this purpose, leaving the question unanswered whether there exists an optimal set of histone modifications for enhancer prediction in different cell types. Here, we address this issue by exploring genome-wide profiles of 24 histone modifications in two distinct human cell types, embryonic stem cells and lung fibroblasts. We developed a Random-Forest based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin States) to integrate histone modification profiles for identification of enhancers, and used it to identify enhancers in a number of cell-types. We show that RFECS not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.
    Categories: Journal Articles