Journal Articles

BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected.

Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays.
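One building block of the enumeration described above, matching degenerate IUPAC words against promoter sequence, can be illustrated with a short sketch (hypothetical Python for illustration; BLSSpeller itself is written in Java and additionally scores conservation across the species tree):

```python
# Each IUPAC symbol denotes a set of allowed nucleotides.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

def matches(word, window):
    """True if every position of the DNA window is allowed by the IUPAC word."""
    return len(word) == len(window) and all(
        base in IUPAC[sym] for sym, base in zip(word, window))

def occurrences(word, sequence):
    """All start positions where the IUPAC word matches the sequence."""
    k = len(word)
    return [i for i in range(len(sequence) - k + 1)
            if matches(word, sequence[i:i + k])]
```

In a comparative setting, occurrences found in orthologous promoters would then feed the branch length score, which measures how much of the phylogeny supports conservation of the word.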

Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller

Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

LoopIng: a template-based tool for predicting the structure of protein loops

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function.

Results: We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4–10 residues) and significant enhancements for long loops (11–20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop).

Availability and implementation: www.biocomputing.it/looping

Contact: anna.tramontano@uniroma1.it

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations.

Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins.

Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/

Contact: zhng@umich.edu or hbshen@sjtu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Improving protein fold recognition with hybrid profiles combining sequence and structure evolution

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since protein structure is more conserved than sequence, the inclusion of structural information can improve the detection of remote homology.

Results: Here, we present ORION, a new fold recognition method based on the pairwise comparison of hybrid profiles that contain evolutionary information from both protein sequence and structure. Our method uses the 16-state structural alphabet Protein Blocks, which provides an accurate 1D description of protein structure local conformations. ORION systematically outperforms PSI-BLAST and HHsearch on several benchmarks, including target sequences from the modeling competitions CASP8, 9 and 10, and detects ~10% more templates at fold and superfamily SCOP levels.

Availability: Software freely available for download at http://www.dsimb.inserm.fr/orion/.

Contact: jean-christophe.gelly@univ-paris-diderot.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

PBAP: a pipeline for file processing and quality control of pedigree data with dense genetic markers

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Huge genetic datasets with dense marker panels are now common. With the availability of sequence data and the recognition of the importance of rare variants, smaller pedigree-based studies are again common. Pedigree-based samples often start with a dense marker panel, a subset of which may be used for linkage analysis to reduce computational burden and to limit linkage disequilibrium between single-nucleotide polymorphisms (SNPs). Programs attempting to select markers for linkage panels exist but lack flexibility.

Results: We developed a pedigree-based analysis pipeline (PBAP) suite of programs geared towards SNPs and sequence data. PBAP performs quality control, marker selection and file preparation. PBAP sets up files for MORGAN, which can handle analyses for small and large pedigrees, typically human, and results can be used with other programs and for downstream analyses. We evaluate and illustrate its features with two real datasets.

Availability and implementation: PBAP scripts may be downloaded from http://faculty.washington.edu/wijsman/software.shtml.

Contact: wijsman@uw.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Identifying kinase dependency in cancer cells by integrating high-throughput drug screening and kinase inhibition data

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Targeted kinase inhibitors have dramatically improved cancer treatment, but kinase dependency for an individual patient or cancer cell can be challenging to predict. Kinase dependency does not always correspond with gene expression and mutation status. High-throughput drug screens are powerful tools for determining kinase dependency, but drug polypharmacology can make results difficult to interpret.

Results: We developed Kinase Addiction Ranker (KAR), an algorithm that integrates high-throughput drug screening data, comprehensive kinase inhibition data and gene expression profiles to identify kinase dependency in cancer cells. We applied KAR to predict kinase dependency of 21 lung cancer cell lines and 151 leukemia patient samples using published datasets. We experimentally validated KAR predictions of FGFR and MTOR dependence in lung cancer cell line H1581, showing synergistic reduction in proliferation after combining ponatinib and AZD8055.
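The kind of integration KAR performs can be sketched as follows (a hypothetical Python simplification; the function name, weighting scheme and data below are illustrative, not KAR's published algorithm): each expressed kinase is scored by summing drug sensitivities weighted by how strongly each drug inhibits that kinase, so that polypharmacology is spread across a drug's full target profile.

```python
# Illustrative kinase-dependency ranking (not KAR's actual scoring).
def rank_kinases(sensitivity, inhibition, expressed):
    """sensitivity: drug -> efficacy in the screen (higher = more sensitive).
    inhibition: drug -> {kinase: fraction of activity inhibited}.
    expressed: set of kinases expressed in the sample."""
    scores = {}
    for drug, eff in sensitivity.items():
        for kinase, frac in inhibition.get(drug, {}).items():
            if kinase in expressed:   # gene expression filters the targets
                scores[kinase] = scores.get(kinase, 0.0) + eff * frac
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A kinase that is hit by many effective drugs, and is expressed in the sample, rises to the top of the ranking; a potent target of an ineffective drug does not.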

Availability and implementation: KAR can be downloaded as a Python function or a MATLAB script along with example inputs and outputs at: http://tanlab.ucdenver.edu/KAR/.

Contact: aikchoon.tan@ucdenver.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

A statistical approach to virtual cellular experiments: improved causal discovery using accumulation IDA (aIDA)

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: We address the following question: Does inhibition of the expression of a gene X in a cellular assay affect the expression of another gene Y? Rather than inhibiting gene X experimentally, we aim at answering this question computationally using as the only input observational gene expression data. Recently, a new statistical algorithm called Intervention calculus when the Directed acyclic graph is Absent (IDA), has been proposed for this problem. For several biological systems, IDA has been shown to outcompete regression-based methods with respect to the number of true positives versus the number of false positives for the top 5000 predicted effects. Further improvements in the performance of IDA have been realized by stability selection, a resampling method wrapped around IDA that enhances the discovery of true causal effects. Nevertheless, the rate of false positive and false negative predictions is still unsatisfactorily high.

Results: We introduce a new resampling approach for causal discovery called accumulation IDA (aIDA). We show that aIDA improves the performance of causal discoveries compared to existing variants of IDA on both simulated and real yeast data. The higher reliability of top causal effect predictions achieved by aIDA promises to increase the rate of success of wet lab intervention experiments for functional studies.

Availability and implementation: R code for aIDA is available in the Supplementary material.

Contact: franziska.taruttis@ur.de, julia.engelmann@ur.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Impact of normalization methods on high-throughput screening data with high hit rates and drug testing with dose-response data

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Most data analysis tools for high-throughput screening (HTS) seek to uncover interesting hits for further analysis. They typically assume a low hit rate per plate. Hit rates can be dramatically higher in secondary screening, RNAi screening and in drug sensitivity testing using biologically active drugs. In particular, drug sensitivity testing on primary cells is often based on dose–response experiments, which pose a more stringent requirement for data quality and for intra- and inter-plate variation. Here, we compared common plate normalization and noise-reduction methods, including the B-score and the Loess (local polynomial fit) method, under high hit-rate scenarios of drug sensitivity testing. We generated simulated 384-well plate HTS datasets, each with 71 plates having a range of 20 (5%) to 160 (42%) hits per plate, with controls placed either at the edge of the plates or in a scattered configuration.

Results: We identified 20% (77/384) as the critical hit-rate after which the normalizations started to perform poorly. Results from real drug testing experiments supported this estimation. In particular, the B-score resulted in incorrect normalization of high hit-rate plates, leading to poor data quality, which could be attributed to its dependency on the median polish algorithm. We conclude that a combination of a scattered layout of controls per plate and normalization using a polynomial least squares fit method, such as Loess, helps to reduce column, row and edge effects in HTS experiments with high hit-rates and is optimal for generating accurate dose–response curves.
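The median-polish step that underlies the B-score can be sketched in a few lines (a simplified illustration, not the exact normalization pipeline compared in the paper): row and column medians are repeatedly subtracted so that the residuals are free of additive row/column (plate-position) effects.

```python
# Minimal Tukey-style median polish for a plate of raw well values.
def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def median_polish(plate, n_iter=10):
    """plate: list of rows of well values; returns the residual matrix."""
    r = [row[:] for row in plate]
    for _ in range(n_iter):
        for row in r:                              # remove row effects
            m = median(row)
            for j in range(len(row)):
                row[j] -= m
        for j in range(len(r[0])):                 # remove column effects
            m = median([row[j] for row in r])
            for row in r:
                row[j] -= m
    return r
```

The B-score then divides these residuals by their median absolute deviation. The sketch also makes the failure mode visible: when a large fraction of wells in a row or column are genuine hits, the hits themselves shift the medians, so the polish subtracts real signal, which is the dependency on median polish identified above.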

Contact: john.mpindi@helsinki.fi

Availability and implementation, Supplementary information: R code and Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Motivation: Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task.

Results: Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype.

We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool.

Availability and implementation: OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: umaan@leeds.ac.uk

Categories: Journal Articles

cgmisc: enhanced genome-wide association analyses and visualization

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Summary: High-throughput genotyping and sequencing technologies facilitate studies of complex genetic traits and provide new research opportunities. The increasing popularity of genome-wide association studies (GWAS) leads to the discovery of new associated loci and a better understanding of the genetic architecture underlying not only diseases, but also other monogenic and complex phenotypes. Several software packages are available for performing GWAS analyses, the R environment being one of them.

Results: We present cgmisc, an R package that enables enhanced data analysis and visualization of results from GWAS. The package contains several utilities and modules that complement and enhance the functionality of the existing software. It also provides several tools for advanced visualization of genomic data and utilizes the power of the R language to aid in preparation of publication-quality figures. Some of the package functions are specific for the domestic dog (Canis familiaris) data.

Availability and implementation: The package is operating system-independent and is available from: https://github.com/cgmisc-team/cgmisc

Contact: marcin.kierczak@imbim.uu.se

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

MICC: an R package for identifying chromatin interactions from ChIA-PET data

Bioinformatics Journal - Fri, 11/20/2015 - 02:05

Summary: ChIA-PET is rapidly emerging as an important experimental approach to detect chromatin long-range interactions at high resolution. Here, we present Model based Interaction Calling from ChIA-PET data (MICC), an easy-to-use R package to detect chromatin interactions from ChIA-PET sequencing data. By applying a Bayesian mixture model to systematically remove random ligation and random collision noise, MICC could identify chromatin interactions with a significantly higher sensitivity than existing methods at the same false discovery rate.
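The idea of using a mixture model to separate low-count ligation noise from true interactions can be illustrated with a toy two-component Poisson mixture fitted by EM (a hypothetical simplification in Python; MICC's actual Bayesian model and its noise components are more elaborate):

```python
import math

# Toy EM fit of a two-component Poisson mixture over PET counts:
# one low-rate "noise" component, one high-rate "interaction" component.
def em_poisson_mixture(counts, n_iter=200):
    lam = [min(counts) + 1.0, max(counts) + 1.0]   # noise rate, signal rate
    pi = [0.5, 0.5]                                # mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each count
        resp = []
        for c in counts:
            w = [pi[k] * math.exp(-lam[k]) * lam[k] ** c / math.factorial(c)
                 for k in range(2)]
            tot = w[0] + w[1]
            resp.append([w[0] / tot, w[1] / tot])
        # M-step: update mixing weights and Poisson rates
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(counts)
            lam[k] = sum(r[k] * c for r, c in zip(resp, counts)) / nk
    return lam, pi
```

An interaction would then be called when the high-rate component's posterior responsibility for a pair exceeds a threshold chosen to control the false discovery rate.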

Availability and implementation: http://bioinfo.au.tsinghua.edu.cn/member/xwwang/MICCusage

Contact: michael.zhang@utdallas.edu or xwwang@tsinghua.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Two Types of Water at the Water–Surfactant Interface Revealed by Time-Resolved Vibrational Spectroscopy

Journal of American Chemical Society - Fri, 11/20/2015 - 00:59

Journal of the American Chemical Society, DOI: 10.1021/jacs.5b07845
Categories: Journal Articles

Text-mining block prompts online response

Nature - Fri, 11/20/2015 - 00:00

Nature 527, 7579 (2015). doi:10.1038/527413f

Author: Mollie Bloudoff-Indelicato

A statistician says a major scientific publisher is hindering his research.

Categories: Journal Articles

Green Climate Fund faces slew of criticism

Nature - Fri, 11/20/2015 - 00:00

Nature 527, 7579 (2015). http://www.nature.com/doifinder/10.1038/nature.2015.18815

Author: Sanjay Kumar

First tranche of aid projects prompts concern over operations of fund for developing nations.

Categories: Journal Articles

Leap-second decision delayed by eight years

Nature - Fri, 11/20/2015 - 00:00

Nature 527, 7579 (2015). http://www.nature.com/doifinder/10.1038/nature.2015.18855

Author: Elizabeth Gibney

Some want to scrap adjustment that keeps atomic time in sync with Earth's rotation.

Categories: Journal Articles

Line-Distortion, Bandwidth and Path-Length of a Graph

Algorithmica - Fri, 11/20/2015 - 00:00
Abstract

For a graph \(G=(V,E)\), the minimum line-distortion problem asks for the minimum k such that there is a mapping f of the vertices into points of the line satisfying \(d_G(x, y)\le |f(x)-f(y)|\le k \, d_G(x, y)\) for each pair of vertices x, y, where \(d_G(x, y)\) is the distance in the graph. The minimum bandwidth problem minimizes the term \(\max _{uv\in E}|f(u)-f(v)|\), where f is a mapping of the vertices of G into the integers \(\{1, \ldots , n\}\). We investigate the minimum line-distortion and the minimum bandwidth problems on unweighted graphs and their relations with the minimum length of a Robertson–Seymour path-decomposition. The length of a path-decomposition of a graph is the largest diameter of a bag in the decomposition. The path-length of a graph is the minimum length over all its path-decompositions. In particular, we show:

  • there is a simple polynomial time algorithm that embeds an arbitrary unweighted input graph G into the line with distortion \(\mathcal{O}(k^2)\) , where k is the minimum line-distortion of G;

  • if a graph G can be embedded into the line with distortion k, then G admits a Robertson–Seymour path-decomposition with bags of diameter at most k in G;

  • for every class of graphs with path-length bounded by a constant, there exist an efficient constant-factor approximation algorithm for the minimum line-distortion problem and an efficient constant-factor approximation algorithm for the minimum bandwidth problem;

  • there is an efficient 2-approximation algorithm for computing the path-length of an arbitrary graph;

  • AT-free graphs and some intersection families of graphs have path-length at most 2;

  • for AT-free graphs, there exist a linear time 8-approximation algorithm for the minimum line-distortion problem and a linear time 4-approximation algorithm for the minimum bandwidth problem.
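The distortion definition above can be made concrete with a small check (illustrative Python, not one of the paper's algorithms): given the graph distances and a non-contractive line embedding f, report the smallest k with \(d_G(x, y)\le |f(x)-f(y)|\le k \, d_G(x, y)\) for every pair.

```python
# Achieved line-distortion of a given non-contractive embedding.
def line_distortion(dist, f):
    """dist: dict (u, v) -> graph distance; f: vertex -> position on the line."""
    k = 1.0
    for (u, v), d in dist.items():
        stretch = abs(f[u] - f[v])
        assert stretch >= d, "embedding contracts a pair"  # lower bound
        k = max(k, stretch / d)                            # upper bound factor
    return k
```

For the path on three vertices, the identity embedding achieves distortion 1, while spreading the middle vertex unevenly raises the worst-case stretch factor.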

Categories: Journal Articles

Regulation of Early Steps of GPVI Signal Transduction by Phosphatases: A Systems Biology Approach

PLoS Computational Biology - Thu, 11/19/2015 - 17:00

by Joanne L. Dunster, Francoise Mazet, Michael J. Fry, Jonathan M. Gibbins, Marcus J. Tindall

We present a data-driven mathematical model of a key initiating step in platelet activation, a central process in the prevention of bleeding following injury. In vascular disease, this process is activated inappropriately and causes thrombosis, heart attacks and stroke. The collagen receptor GPVI is the primary trigger for platelet activation at sites of injury. Understanding the complex molecular mechanisms initiated by this receptor is important for the development of more effective antithrombotic medicines. In this work we developed a series of nonlinear ordinary differential equation models that are direct representations of biological hypotheses surrounding the initial steps in GPVI-stimulated signal transduction. At each stage, model simulations were compared to our own quantitative, high-temporal-resolution experimental data that guide further experimental design, data collection and model refinement. Much is known about the linear forward reactions within platelet signalling pathways, but the roles of putative reverse reactions are poorly understood. An initial model, which includes a simple constitutively active phosphatase, was unable to explain the experimental data. Model revisions incorporating a complex pathway of interactions (and specifically the phosphatase TULA-2) provided a good description of the experimental data, both for observations of phosphorylation in samples from one donor and in those of a wider population. Our model was used to investigate the levels of proteins involved in regulating the pathway and the effect of the low GPVI levels that have been associated with disease. Results indicate a clear separation between healthy and GPVI-deficient states with respect to the signalling cascade dynamics associated with Syk tyrosine phosphorylation and activation. Our approach reveals the central importance of this negative feedback pathway, which results in the temporal regulation of a specific class of protein tyrosine phosphatases in controlling the rate, and therefore extent, of GPVI-stimulated platelet activation.
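The qualitative effect of such a feedback loop can be seen in a toy ordinary differential equation model (all species, rate constants and initial conditions below are illustrative choices, not the paper's fitted GPVI/TULA-2 model): a substrate is phosphorylated by a constitutively active kinase and dephosphorylated by a phosphatase that is itself induced by the phosphorylated product.

```python
# Toy negative-feedback phosphorylation model, forward-Euler integration.
def simulate(k_phos=1.0, k_dephos=0.5, k_fb=2.0, dt=0.001, t_end=20.0):
    s_p, ptase = 0.0, 0.1     # phospho-substrate fraction, active phosphatase
    trace = []
    for _ in range(int(t_end / dt)):
        ds = k_phos * (1.0 - s_p) - k_dephos * ptase * s_p
        dp = k_fb * s_p - ptase          # feedback: product induces phosphatase
        s_p += dt * ds
        ptase += dt * dp
        trace.append(s_p)
    return trace
```

In this toy model the phospho-signal rises quickly, overshoots and then relaxes toward a lower steady state as the phosphatase accumulates, a transient shape that a fixed level of constitutive phosphatase activity cannot produce.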
Categories: Journal Articles

Learning of Chunking Sequences in Cognition and Behavior

PLoS Computational Biology - Thu, 11/19/2015 - 17:00

by Jordi Fonollosa, Emre Neftci, Mikhail Rabinovich

We often learn and recall long sequences in smaller segments, such as a phone number 858 534 22 30 memorized as four segments. Behavioral experiments suggest that humans and some animals employ this strategy of breaking down cognitive or behavioral sequences into chunks in a wide variety of tasks, but the dynamical principles of how this is achieved remain unknown. Here, we study the temporal dynamics of chunking for learning cognitive sequences in a chunking representation using a dynamical model of competing modes arranged to evoke hierarchical Winnerless Competition (WLC) dynamics. Sequential memory is represented as trajectories along a chain of metastable fixed points at each level of the hierarchy, and bistable Hebbian dynamics enables the learning of such trajectories in an unsupervised fashion. Using computer simulations, we demonstrate the learning of a chunking representation of sequences and their robust recall. During learning, the dynamics associates a set of modes to each information-carrying item in the sequence and encodes their relative order. During recall, hierarchical WLC guarantees the robustness of the sequence order when the sequence is not too long. The resulting patterns of activities share several features observed in behavioral experiments, such as the pauses between boundaries of chunks, their size and their duration. Failures in learning chunking sequences provide new insights into the dynamical causes of neurological disorders such as Parkinson’s disease and schizophrenia.
Categories: Journal Articles

Differences in Visual-Spatial Input May Underlie Different Compression Properties of Firing Fields for Grid Cell Modules in Medial Entorhinal Cortex

PLoS Computational Biology - Thu, 11/19/2015 - 17:00

by Florian Raudies, Michael E. Hasselmo

Firing fields of grid cells in medial entorhinal cortex show compression or expansion after manipulations of the location of environmental barriers. This compression or expansion could be selective for individual grid cell modules with particular properties of spatial scaling. We present a model for differences in the response of modules to barrier location that arise from different mechanisms for the influence of visual features on the computation of location that drives grid cell firing patterns. These differences could arise from differences in the position of visual features within the visual field. When location was computed from the movement of visual features on the ground plane (optic flow) in the ventral visual field, this resulted in grid cell spatial firing that was not sensitive to barrier location in modules modeled with small spacing between grid cell firing fields. In contrast, when location was computed from static visual features on walls of barriers, i.e. in the more dorsal visual field, this resulted in grid cell spatial firing that compressed or expanded based on the barrier locations in modules modeled with large spacing between grid cell firing fields. This indicates that different grid cell modules might have differential properties for computing location based on visual cues, or the spatial radius of sensitivity to visual cues might differ between modules.
Categories: Journal Articles