Journal Articles
SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data
by Zhongyang Zhang, Ke Hao
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
Untangling Brain-Wide Dynamics in Consciousness by Cross-Embedding
by Satohiro Tajima, Toru Yanagawa, Naotaka Fujii, Taro Toyoizumi
Brain-wide interactions generating complex neural dynamics are considered crucial for emergent cognitive functions. However, the irreducible nature of nonlinear and high-dimensional dynamical interactions challenges conventional reductionist approaches. We introduce a model-free method, based on embedding theorems in nonlinear state-space reconstruction, that permits a simultaneous characterization of complexity in local dynamics, directed interactions between brain areas, and how the complexity is produced by the interactions. We demonstrate this method in large-scale electrophysiological recordings from awake and anesthetized monkeys. The cross-embedding method captures structured interaction underlying cortex-wide dynamics that may be missed by conventional correlation-based analysis, demonstrating a critical role of time-series analysis in characterizing brain state. The method reveals a consciousness-related hierarchy of cortical areas, where dynamical complexity increases along with cross-area information flow. These findings demonstrate the advantages of the cross-embedding method in deciphering large-scale and heterogeneous neuronal systems, suggesting a crucial contribution by sensory-frontoparietal interactions to the emergence of complex brain dynamics during consciousness.
“Broadband” Bioinformatics Skills Transfer with the Knowledge Transfer Programme (KTP): Educational Model for Upliftment and Sustainable Development
by Emile R. Chimusa, Mamana Mbiyavanga, Velaphi Masilela, Judit Kumuthini
A shortage of practical skills and relevant expertise is possibly the primary obstacle to social upliftment and sustainable development in Africa. The “omics” fields, especially genomics, are increasingly dependent on the effective interpretation of large and complex sets of data. Despite abundant natural resources and population sizes comparable with many first-world countries from which talent could be drawn, countries in Africa still lag far behind the rest of the world in terms of specialized skills development. Moreover, there are serious concerns about disparities between countries within the continent. The multidisciplinary nature of the bioinformatics field, coupled with rare and depleting expertise, is a critical problem for the advancement of bioinformatics in Africa. We propose a formalized matchmaking system, which is aimed at reversing this trend, by introducing the Knowledge Transfer Programme (KTP). Instead of individual researchers travelling to other labs to learn, researchers with desirable skills are invited to join African research groups for six weeks to six months. Visiting researchers or trainers will pass on their expertise to multiple people simultaneously in their local environments, thus increasing the efficiency of knowledge transference. In return, visiting researchers have the opportunity to develop professional contacts, gain industry work experience, work with novel datasets, and strengthen and support their ongoing research. The KTP develops a network with a centralized hub through which groups and individuals are put into contact with one another and exchanges are facilitated by connecting both parties with potential funding sources. This is part of the PLOS Computational Biology Education collection.
A Bio-inspired Collision Avoidance Model Based on Spatial Information Derived from Motion Detectors Leads to Common Routes
by Olivier J. N. Bertrand, Jens P. Lindemann, Martin Egelhaaf
Avoiding collisions is one of the most basic needs of any mobile agent, both biological and technical, when searching around or aiming toward a goal. We propose a model of collision avoidance inspired by behavioral experiments on insects and by properties of optic flow on a spherical eye experienced during translation, and test the interaction of this model with goal-driven behavior. Insects, such as flies and bees, actively separate the rotational and translational optic flow components via behavior, i.e. by employing a saccadic strategy of flight and gaze control. Optic flow experienced during translation, i.e. during intersaccadic phases, contains information on the depth-structure of the environment, but this information is entangled with that on self-motion. Here, we propose a simple model to extract the depth structure from translational optic flow by using local properties of a spherical eye. On this basis, a motion direction of the agent is computed that ensures collision avoidance. Flying insects are thought to measure optic flow by correlation-type elementary motion detectors. Their responses depend, in addition to velocity, on the texture and contrast of objects and, thus, do not measure the velocity of objects veridically. Therefore, we initially used geometrically determined optic flow as input to a collision avoidance algorithm to show that depth information inferred from optic flow is sufficient to account for collision avoidance under closed-loop conditions. Then, the collision avoidance algorithm was tested with bio-inspired correlation-type elementary motion detectors in its input. Even then, the algorithm led successfully to collision avoidance and, in addition, replicated the characteristics of collision avoidance behavior of insects. Finally, the collision avoidance algorithm was combined with a goal direction and tested in cluttered environments. The simulated agent then showed goal-directed behavior reminiscent of components of the navigation behavior of insects.
Electrochemical Imaging and Redox Interrogation of Surface Defects on Operating SrTiO3 Photoelectrodes
Complex Surface Diffusion Mechanisms of Cobalt Phthalocyanine Molecules on Ag(100)
Electronically Stabilized Nonplanar Phenalenyl Radical and Its Planar Isomer
Substituent Effects in CH Hydrogen Bond Interactions: Linear Free Energy Relationships and Influence of Anions
Identical Location Transmission Electron Microscopy Imaging of Site-Selective Pt Nanocatalysts: Electrochemical Activation and Surface Disordering
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long-range (contact) interactions among amino acids in natively-folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of fewer than 10 letters) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches (including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs) fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. Proteins 2015; 83:2198–2216. © 2015 Wiley Periodicals, Inc.
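The idea of measuring how much contact information survives an alphabet reduction can be illustrated with a small sketch. This is not the Information Maximization Device described in the abstract; it only assumes a hypothetical table of contact counts between residue letters, computes the mutual information between the two contacting letters, and recomputes it after merging letters into groups. The function names, the four-letter toy table, and the grouping are all illustrative assumptions.

```python
from collections import defaultdict
from math import log2


def mutual_information(counts):
    """Mutual information (in bits) between the two letters of a contact pair.

    counts maps (letter_a, letter_b) -> number of observed contacts.
    """
    total = sum(counts.values())
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b), c in counts.items():
        p_a[a] += c / total
        p_b[b] += c / total
    mi = 0.0
    for (a, b), c in counts.items():
        if c > 0:
            p_ab = c / total
            mi += p_ab * log2(p_ab / (p_a[a] * p_b[b]))
    return mi


def reduce_counts(counts, groups):
    """Merge contact counts according to a letter -> group mapping."""
    merged = defaultdict(float)
    for (a, b), c in counts.items():
        merged[(groups[a], groups[b])] += c
    return dict(merged)


# Toy contact table (made-up numbers): hydrophobic letters contact each other,
# and oppositely charged letters contact each other.
toy_counts = {
    ("L", "L"): 30, ("L", "V"): 25, ("V", "L"): 25, ("V", "V"): 20,
    ("K", "E"): 15, ("E", "K"): 15, ("L", "K"): 5, ("K", "L"): 5,
}
toy_groups = {"L": "h", "V": "h", "K": "p", "E": "p"}

mi_full = mutual_information(toy_counts)
mi_reduced = mutual_information(reduce_counts(toy_counts, toy_groups))
print(f"full alphabet: {mi_full:.3f} bits, reduced alphabet: {mi_reduced:.3f} bits")
```

Merging letters can never increase this quantity, so the gap between the two printed values is one simple way to express the "loss of mutual information" caused by a given clustering.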
Exploring interaction mechanisms of the inhibitor binding to the VP35 IID region of Ebola virus by all atom molecular dynamics simulation method
Ebola viruses (EBOVs) cause an acute and serious illness that is often fatal if untreated, and no effective vaccine is available to date. Multifunctional VP35 is critical for viral replication, RNA silencing suppression and nucleocapsid formation, and it is considered a potential target for molecular biology techniques. In the present work, the binding of the pyrrole-based inhibitor compound GA017 to wild-type (WT), single mutant (K248A, K251A, and I295A), and double mutant (K248A/I295A) VP35 was investigated by all-atom molecular dynamics (MD) simulations and Molecular Mechanics Generalized Born surface area (MM/GBSA) energy calculations. The calculated results indicate that binding with GA017 makes the binding pocket more stable and reduces the space of the binding pocket. Moreover, the electrostatic interactions (ΔEele) and VDW energy (ΔEvdw) provide the major driving forces for binding affinity, and the single mutation I295A and the double mutation K248A/I295A have a great influence on the conformation of the VP35 binding pocket. Interestingly, the residues R300-G301-D302 of I295A form a new helix, and the sheet formed by the residues V294-I295-H296-I297 disappears in the double mutant K248A/I295A as compared with WT. Moreover, the binding free energy calculations show that the I295A and K248A/I295A mutations decrease the absolute binding free energy, whereas the K248A and K251A mutations increase it. Our calculated results are in good agreement with the experimental finding that the K248A/I295A double mutant results in near-complete loss of compound binding. The obtained information will be useful for designing effective inhibitors for treating Ebola virus. Proteins 2015; 83:2263–2278. © 2015 Wiley Periodicals, Inc.
Exact Algorithms for Minimum Weighted Dominating Induced Matching
Say that an edge of a graph G dominates itself and every other edge sharing a vertex with it. An edge dominating set of a graph \(G=(V,E)\) is a subset of edges \(E' \subseteq E\) which dominates all edges of G. In particular, if every edge of G is dominated by exactly one edge of \(E'\) then \(E'\) is a dominating induced matching. It is known that not every graph admits a dominating induced matching, and that deciding whether a graph admits one is NP-complete. In this paper we consider the problems of counting the number of dominating induced matchings and finding a minimum weighted dominating induced matching, if any, of a graph with weighted edges. We describe three exact algorithms for general graphs. The first runs in linear time for a given vertex dominating set of fixed size of the graph. The second runs in polynomial time if the graph admits a polynomial number of maximal independent sets. The third runs in \(O^*(1.1939^n)\) time and polynomial (linear) space, which improves over the existing algorithms for exactly solving this problem in general graphs.
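The definition above can be made concrete with a brute-force sketch. This is not one of the three algorithms described in the abstract: it simply enumerates every edge subset, so it runs in time exponential in the number of edges and is only meant for tiny graphs. The function names and the example graphs are illustrative assumptions.

```python
from itertools import combinations


def is_dominating_induced_matching(edges, candidate):
    """True if every edge is dominated by exactly one edge of the candidate set.

    An edge dominates itself and every edge that shares a vertex with it.
    """
    all_edges = {frozenset(e) for e in edges}
    chosen = {frozenset(e) for e in candidate}
    if not chosen <= all_edges:
        return False
    for e in all_edges:
        dominators = sum(1 for c in chosen if e & c)
        if dominators != 1:
            return False
    return True


def min_weight_dim(weights):
    """Exhaustive search over edge subsets.

    weights maps (u, v) -> edge weight.
    Returns (best_weight, best_edge_set, number_of_dims);
    best_weight and best_edge_set are None if no such matching exists.
    """
    edges = list(weights)
    best_weight, best_set, count = None, None, 0
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            if is_dominating_induced_matching(edges, subset):
                count += 1
                w = sum(weights[e] for e in subset)
                if best_weight is None or w < best_weight:
                    best_weight, best_set = w, set(subset)
    return best_weight, best_set, count


# Path a-b-c-d: only the middle edge dominates every edge exactly once.
path = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 1}
print(min_weight_dim(path))  # (2, {('b', 'c')}, 1)

# A 4-cycle admits no dominating induced matching at all.
cycle4 = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "a"): 1}
print(min_weight_dim(cycle4))  # (None, None, 0)
```

The 4-cycle example illustrates the remark that not every graph admits a dominating induced matching, and the returned count corresponds to the counting version of the problem.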
Evaluation of Monotone DNF Formulas
Stochastic boolean function evaluation (SBFE) is the problem of determining the value of a given boolean function f on an unknown input x, when each bit \(x_i\) of x can only be determined by paying a given associated cost \(c_i\). Further, x is drawn from a given product distribution: for each \(x_i\), \(\mathbf{Pr}[x_i=1] = p_i\) and the bits are independent. The goal is to minimize the expected cost of evaluation. In this paper, we study the complexity of the SBFE problem for classes of DNF formulas. We consider both exact and approximate versions of the problem for subclasses of DNF, for arbitrary costs and product distributions, and for unit costs and/or the uniform distribution.
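To make the objective concrete, the following sketch computes the expected cost of one simple kind of strategy for a monotone DNF formula: test the bits in a fixed order and stop as soon as the value of f is determined. It is not an algorithm from the paper, and optimal strategies may be adaptive rather than fixed-order; the representation of terms as tuples of variable indices and the function names are assumptions made for illustration. The expectation is computed by enumerating all inputs, so it is only practical for a handful of variables.

```python
from itertools import product


def resolved(terms, known):
    """Value of a monotone DNF given the bits known so far, or None if undetermined.

    terms is a list of tuples of variable indices; each tuple is a conjunction.
    known maps variable index -> observed bit (0 or 1).
    """
    if any(all(known.get(i) == 1 for i in t) for t in terms):
        return 1  # some term is fully satisfied
    if all(any(known.get(i) == 0 for i in t) for t in terms):
        return 0  # every term contains a bit known to be 0
    return None


def expected_cost(terms, order, costs, probs):
    """Expected cost of testing bits in the given fixed order until f is determined.

    order must list every variable index; costs[i] is c_i and probs[i] is p_i.
    """
    n = len(costs)
    total = 0.0
    for x in product((0, 1), repeat=n):
        # Probability of this input under the product distribution.
        weight = 1.0
        for i, bit in enumerate(x):
            weight *= probs[i] if bit else 1.0 - probs[i]
        known, spent = {}, 0.0
        for i in order:
            if resolved(terms, known) is not None:
                break
            known[i] = x[i]
            spent += costs[i]
        total += weight * spent
    return total


# f = x0 AND x1 with unit costs and the uniform distribution:
# x1 is only tested when x0 = 1, so the expected cost is 1 + 0.5 = 1.5.
print(expected_cost([(0, 1)], order=[0, 1], costs=[1, 1], probs=[0.5, 0.5]))  # 1.5
```

Comparing the value returned for different orders on the same formula gives a feel for why the choice of which bit to pay for first matters when costs or probabilities differ.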
An heuristic filtering tool to identify phenotype-associated genetic variants applied to human intellectual disability and canine coat colors
Distribution of single nucleotide variants on protein-protein interaction sites and its relation to minor allele frequency
Recent advances in DNA sequencing techniques have identified rare single nucleotide variants with less than 1% minor allele frequency. Despite the growing interest in, and physiological importance of, rare variants in genome sciences, less attention has been paid to the allele frequency of variants in protein sciences. To elucidate the characteristics of genetic variants on protein interaction sites, from the viewpoints of the allele frequency and the structural position of variants, we mapped about 20,000 human SNVs onto protein complexes. We found that variants are less abundant in protein interfaces, and specifically in the core regions of interfaces. The tendency to “avoid” the interfacial core is stronger among common variants than rare variants. In terms of amino acid substitutions, the pattern of amino acid changes among rare variants is consistent across different interfacial regions, reflecting the fact that rare variants result from random mutations in DNA sequences, whereas the amino acid changes of common variants vary between the interfacial core and rim regions, possibly due to functional constraints on proteins. This study illustrates how the allele frequency of variants relates to protein structural regions and functional sites in general, and will lead to a deeper understanding of the potential deleteriousness of rare variants at the structural level. Exceptions to the observed trends will shed light on the limitations of structural approaches to evaluating the functional impacts of variants.