Journal Articles

  • A Nickel Phosphine Complex as a Fast and Efficient Hydrogen Production Catalyst
    [Jan 2015]

    Journal of the American Chemical SocietyDOI: 10.1021/ja509779q
    Categories: Journal Articles
  • SubPatCNV: approximate subspace pattern mining for mapping copy-number variations
    [Jan 2015]

    Background: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often only common in a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that only exhibit in subsets of samples with advanced data mining techniques to reliably answer questions such as ?Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals??. Results: We introduce a tool for mining CNV subspace patterns, namely SubPatCNV, which is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV is the implementation of a variation of approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In benchmark test, SubPatCNV was applied to identify population specific germline CNVs from four populations of HapMap samples. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups, and reported regions enriched with cancer relevant genes. In both HapMap data and TCGA data, it was observed that SubPatCNV employs approximate pattern mining to more effectively identify CNV subspace patterns that are consistent within a subgroup from high-density array data. Conclusions: SubPatCNV available through http://sourceforge.net/projects/subpatcnv/is a unique scalable open-source software tool that provides the flexibility of identifying CNV regions specific to sample subgroups of different sizes from high-density CNV array data.
    Categories: Journal Articles
  • Semi-supervised adaptive-height snipping of the hierarchical clustering tree
    [Jan 2015]

    Background: In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set. Usually, a fixed height on the HC tree is used, and each contiguous branch of samples below that height is considered a separate cluster. Due to the fixed-height cutting, those clusters may not unravel significant functional coherence hidden deeper in the tree. Besides that, most existing approaches do not make use of available clinical information to guide cluster extraction from the HC. Thus, the identified subgroups may be difficult to interpret in relation to that information. Results: We develop a novel framework for decomposing the HC tree into clusters by semi-supervised piecewise snipping. The framework, called guided piecewise snipping, utilizes both molecular data and clinical information to decompose the HC tree into clusters. It cuts the given HC tree at variable heights to find a partition (a set of non-overlapping clusters) which does not only represent a structure deemed to underlie the data from which HC tree is derived, but is also maximally consistent with the supplied clinical data. Moreover, the approach does not require the user to specify the number of clusters prior to the analysis. Extensive results on simulated and multiple medical data sets show that our approach consistently produces more meaningful clusters than the standard fixed-height cut and/or non-guided approaches. Conclusions: The guided piecewise snipping approach features several novelties and advantages over existing approaches. The proposed algorithm is generic, and can be combined with other algorithms that operate on detected clusters. This approach represents an advancement in several regards: (1) a piecewise tree snipping framework that efficiently extracts clusters by snipping the HC tree possibly at variable heights while preserving the HC tree structure; (2) a flexible implementation allowing a variety of data types for both building and snipping the HC tree, including patient follow-up data like survival as auxiliary information.The data sets and R code are provided as supplementary files. The proposed method is available from Bioconductor as the R-package HCsnip.
    Categories: Journal Articles
  • Decomposing the space of protein quaternary structures with the interface fragment pair library
    [Jan 2015]

    Background: The physical interactions between proteins constitute the basis of protein quaternary structures. They dominate many biological processes in living cells. Deciphering the structural features of interacting proteins is essential to understand their cellular functions. Similar to the space of protein tertiary structures in which discrete patterns are clearly observed on fold or sub-fold motif levels, it has been found that the space of protein quaternary structures is highly degenerate due to the packing of compact secondary structure elements at interfaces. Therefore, it is necessary to further decompose the protein quaternary structural space into a more local representation. Results: Here we constructed an interface fragment pair library from the current structure database of protein complexes. After structural-based clustering, we found that more than 90% of these interface fragment pairs can be represented by a limited number of highly abundant motifs. These motifs were further used to guide complex assembly. A large-scale benchmark test shows that the native-like binding is highly likely in the structural ensemble of modeled protein complexes that were built through the library. Conclusions: Our study therefore presents supportive evidences that the space of protein quaternary structures can be represented by the combination of a small set of secondary-structure-based packing at binding interfaces. Finally, after future improvements such as adding sequence profiles, we expect this new library will be useful to predict structures of unknown protein-protein interactions.
    Categories: Journal Articles
  • Rational selection of experimental readout and intervention sites for reducing uncertainties in computational model predictions
    [Jan 2015]

    Background: Understanding the dynamics of biological processes can substantially be supported by computational models in the form of nonlinear ordinary differential equations (ODE). Typically, this model class contains many unknown parameters, which are estimated from inadequate and noisy data. Depending on the ODE structure, predictions based on unmeasured states and associated parameters are highly uncertain, even undetermined. For given data, profile likelihood analysis has been proven to be one of the most practically relevant approaches for analyzing the identifiability of an ODE structure, and thus model predictions. In case of highly uncertain or non-identifiable parameters, rational experimental design based on various approaches has shown to significantly reduce parameter uncertainties with minimal amount of effort. Results: In this work we illustrate how to use profile likelihood samples for quantifying the individual contribution of parameter uncertainty to prediction uncertainty. For the uncertainty quantification we introduce the profile likelihood sensitivity (PLS) index. Additionally, for the case of several uncertain parameters, we introduce the PLS entropy to quantify individual contributions to the overall prediction uncertainty. We show how to use these two criteria as an experimental design objective for selecting new, informative readouts in combination with intervention site identification. The characteristics of the proposed multi-criterion objective are illustrated with an in silico example. We further illustrate how an existing practically non-identifiable model for the chlorophyll fluorescence induction in a photosynthetic organism, D. salina, can be rendered identifiable by additional experiments with new readouts. Conclusions: Having data and profile likelihood samples at hand, the here proposed uncertainty quantification based on prediction samples from the profile likelihood provides a simple way for determining individual contributions of parameter uncertainties to uncertainties in model predictions. The uncertainty quantification of specific model predictions allows identifying regions, where model predictions have to be considered with care. Such uncertain regions can be used for a rational experimental design to render initially highly uncertain model predictions into certainty. Finally, our uncertainty quantification directly accounts for parameter interdependencies and parameter sensitivities of the specific prediction.
    Categories: Journal Articles
  • Amyloid precursor protein interaction network in human testis: sentinel proteins for male reproduction
    [Jan 2015]

    Background: Amyloid precursor protein (APP) is widely recognized for playing a central role in Alzheimer's disease pathogenesis. Although APP is expressed in several tissues outside the human central nervous system, the functions of APP and its family members in other tissues are still poorly understood. APP is involved in several biological functions which might be potentially important for male fertility, such as cell adhesion, cell motility, signaling, and apoptosis. Furthermore, APP superfamily members are known to be associated with fertility. Knowledge on the protein networks of APP in human testis and spermatozoa will shed light on the function of APP in the male reproductive system. Results: We performed a Yeast Two-Hybrid screen and a database search to study the interaction network of APP in human testis and sperm. To gain insights into the role of APP superfamily members in fertility, the study was extended to APP-like protein 2 (APLP2). We analyzed several topological properties of the APP interaction network and the biological and physiological properties of the proteins in the APP interaction network were also specified by gene ontologyand pathways analyses. We classified significant features related to the human male reproduction for the APP interacting proteins and identified modules of proteins with similar functional roles which may show cooperative behavior for male fertility. Conclusions: The present work provides the first report on the APP interactome in human testis. Our approach allowed the identification of novel interactions and recognition of key APP interacting proteins for male reproduction, particularly in sperm-oocyte interaction.
    Categories: Journal Articles
  • Nonparametric Bayesian clustering to detect bipolar methylated genomic loci
    [Jan 2015]

    Background: With recent development in sequencing technology, a large number of genome-wide DNA methylation studies have generated massive amounts of bisulfite sequencing data. The analysis of DNA methylation patterns helps researchers understand epigenetic regulatory mechanisms. Highly variable methylation patterns reflect stochastic fluctuations in DNA methylation, whereas well-structured methylation patterns imply deterministic methylation events. Among these methylation patterns, bipolar patterns are important as they may originate from allele-specific methylation (ASM) or cell-specific methylation (CSM). Results: Utilizing nonparametric Bayesian clustering followed by hypothesis testing, we have developed a novel statistical approach to identify bipolar methylated genomic regions in bisulfite sequencing data. Simulation studies demonstrate that the proposed method achieves good performance in terms of specificity and sensitivity. We used the method to analyze data from mouse brain and human blood methylomes. The bipolar methylated segments detected are found highly consistent with the differentially methylated regions identified by using purified cell subsets. Conclusions: Bipolar DNA methylation often indicates epigenetic heterogeneity caused by ASM or CSM. With allele-specific events filtered out or appropriately taken into account, our proposed approach sheds light on the identification of cell-specific genes/pathways under strong epigenetic control in a heterogeneous cell population.
    Categories: Journal Articles
  • YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia
    [Jan 2015]

    Background: Yersinia is a Gram-negative bacteria that includes serious pathogens such as the Yersinia pestis, which causing plague, Yersinia pseudotuberculosis, Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity.DescriptionTo facilitate the ongoing and future research of Yersinia, especially those generally considered non-pathogenic species, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed the YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include JavaScript-based genome browser (JBrowse) and Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic tree of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the preliminary results showed differences in virulence genes found in Yersinia pestis and Yersinia pseudotuberculosis compared to other Yersinia species, and differences between Yersinia enterocolitica subsp. enterocolitica and Yersinia enterocolitica subsp. palearctica. Conclusions: YersiniaBase offers free access to wide range of genomic data and analysis tools for the analysis of Yersinia. YersiniaBase can be access at http://yersinia.um.edu.my.
    Categories: Journal Articles
  • Single-molecule dataset (SMD): a generalized storage format for raw and processed single-molecule data
    [Jan 2015]

    Background: Single-molecule techniques have emerged as incisive approaches for addressing a wide range of questions arising in contemporary biological research [ 1-4]. The analysis and interpretation of raw single-molecule data benefits greatly from the ongoing development of sophisticated statistical analysis tools that enable accurate inference at the low signal-to-noise ratios frequently associated with these measurements. While a number of groups have released analysis toolkits as open source software [5-14], it remains difficult to compare analysis for experiments performed in different labs due to a lack of standardization. Results: Here we propose a standardized single-molecule dataset (SMD) file format. SMD is designed to accommodate a wide variety of computer programming languages, single-molecule techniques, and analysis strategies. To facilitate adoption of this format we have made two existing data analysis packages that are used for single-molecule analysis compatible with this format. Conclusion: Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit both sophisticated users and non-specialists by allowing standardized, transparent, and reproducible analysis practices.
    Categories: Journal Articles
  • Fast and robust group-wise eQTL mapping using sparse graphical models
    [Jan 2015]

    Background: Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways. Results: We introduce a new approach to identify novel group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. Our model is a linear-Gaussian model and uses two types of hidden variables. One captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure which makes this approach suitable for large scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods can effectively capture both individual and group-wise signals that cannot be identified by the state-of-the-art eQTL mapping methods. Conclusions: Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the successful multi-layer regression model opens a new approach to understand how multiple SNPs interact with each other to jointly affect the expression level of a group of genes.
    Categories: Journal Articles
  • Predicting protein functions using incomplete hierarchical labels
    [Jan 2015]

    Background: Protein function prediction is to assign biological or biochemical functions to proteins, and it is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete. Current predictive models often assume that the labels of the labeled proteins are complete, i.e. no label is missing. But in real scenarios, we may be aware of only some hierarchical labels of a protein, and we may not know whether additional ones are actually present. The scenario of incomplete hierarchical labels, a challenging and practical problem, is seldom studied in protein function prediction. Results: In this paper, we propose an algorithm to Predict protein functions using Incomplete hierarchical LabeLs (PILL in short). PILL takes into account the hierarchical and the flat taxonomy similarity between function labels, and defines a Combined Similarity (ComSim) to measure the correlation between labels. PILL estimates the missing labels for a protein based on ComSim and the known labels of the protein, and uses a regularization to exploit the interactions between proteins for function prediction. PILL is shown to outperform other related techniques in replenishing the missing labels and in predicting the functions of completely unlabeled proteins on publicly available PPI datasets annotated with MIPS Functional Catalogue and Gene Ontology labels. Conclusion: The empirical study shows that it is important to consider the incomplete annotation for protein function prediction. The proposed method (PILL) can serve as a valuable tool for protein function prediction using incomplete labels. The Matlab code of PILL is available upon request.
    Categories: Journal Articles
  • Hierarchical domain-motion analysis of conformational changes in sarcoplasmic reticulum Ca2+-ATPase
    [Jan 2015]

    Abstract

    Sarco(endo)plasmic reticulum Ca2+-ATPase transports two Ca2+ per ATP-hydrolyzed across biological membranes against a large concentration gradient by undergoing large conformational changes. Structural studies with X-ray crystallography revealed functional roles of coupled motions between the cytoplasmic domains and the transmembrane helices in individual reaction steps. Here, we employed ‘Motion Tree (MT)', a tree diagram that describes a conformational change between two structures, and applied it to representative Ca2+-ATPase structures. MT provides information of coupled rigid-body motions of the ATPase in individual reaction steps. Fourteen rigid structural units, ‘common rigid domains (CRDs)' are identified from seven MTs throughout the whole enzymatic reaction cycle. CRDs likely act as not only the structural units, but also the functional units. Some of the functional importance has been newly revealed by the analysis. Stability of each CRD is examined on the morphing trajectories that cover seven conformational transitions. We confirmed that the large conformational changes are realized by the motions only in the flexible regions that connect CRDs. The Ca2+-ATPase efficiently utilizes its intrinsic flexibility and rigidity to response different switches like ligand binding/dissociation or ATP hydrolysis. The analysis detects functional motions without extensive biological knowledge of experts, suggesting its general applicability to domain movements in other membrane proteins to deepen the understanding of protein structure and function. This article is protected by copyright. All rights reserved.

    Categories: Journal Articles
  • Reduced alphabet for protein folding prediction
    [Jan 2015]

    Abstract

    What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures consist of a 28-letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by full 28-letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least ten folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design. This article is protected by copyright. All rights reserved.

    Categories: Journal Articles
  • [Editorial] Toward designing safer chemicals
    [Jan 2015]

    One year ago, an industrial coal-processing liquid contaminated the Elk River in West Virginia and affected the tap water of 15% of the state's population. The spill was declared a federal disaster, and ongoing investigations remain. Last month, a report assessing the water and health impacts of the Elk River spill pointed to the lack of a sound scientific approach for responding to and recovering from such incidents.* This year also marks 5 years since the Deepwater Horizon oil spill in the Gulf of Mexico, and last month brought the 30-year anniversary of the Bhopal gas tragedy that killed thousands, considered the world's worst industrial disaster. Despite our best efforts and intentions, human-made chemicals continue to be released into the environment, often with unquantified and potentially unquantifiable deleterious consequences. The questions posed to science are how to better understand the nature of synthetic substances in order to predict their potential adverse impacts on humans and the biosphere, and how do we design future substances to eliminate the need for engineered control systems. Authors: Julie B. Zimmerman, Paul T. Anastas
    Categories: Journal Articles
  • [In Brief] This week's section
    [Jan 2015]

    In science news around the world, the U.S. Fish and Wildlife Service expands the range of the rare Mexican gray wolf, genetic analysis company 23andMe announces new deals to share its data with drug developers, two leading cancer groups call for e-cigarette regulation, the Federation of American Societies for Experimental Biology suggests ways for the National Institutes of Health to stretch its flat budget, and a Chinese developer's scheme to use HIV-infected people to harass residents resisting housing demolition reveals lingering anti-HIV prejudice. Also, Brazil's new science minister is a reputed climate skeptic, causing some scientists to worry about the country's environmental future. And Science talks with Microsoft Research Managing Director Eric Horvitz about his One Hundred Year Study on Artificial Intelligence, launched late last year.
    Categories: Journal Articles
  • [In Depth] High hopes for Guinean vaccine trial
    [Jan 2015]

    The push to test Ebola vaccines in the field is accelerating. Two candidates may go into phase III trials in a matter of weeks; a third one has just entered a phase I trial. Researchers have designed very different phase III studies for Liberia, Sierra Leone, and Guinea, the three countries with ongoing virus transmission. One problem they're facing is that the number of new cases has dropped sharply in Liberia and is beginning to ebb in Sierra Leone. That's why many scientists say Guinea—where researchers plan to try a highly unusual ring vaccination design—is the most promising testing ground. Author: Martin Enserink
    Categories: Journal Articles
  • Syndicate content