Journal Articles

UN approves global to-do list for next 15 years

Nature - Mon, 09/21/2015 - 23:00

Nature 525, 7570 (2015). http://www.nature.com/doifinder/10.1038/525434a

Author: Jeff Tollefson

Sustainable development goals aim to wipe out poverty without wrecking the environment.

Categories: Journal Articles

Indian ASTROSAT telescope set for global stardom

Nature - Mon, 09/21/2015 - 23:00

Nature 525, 7570 (2015). http://www.nature.com/doifinder/10.1038/525438a

Author: T. V. Padma

Observatory will extend the capabilities of existing US and European facilities, and boost Indian research.

Researchers wrestle with a privacy problem

Nature - Mon, 09/21/2015 - 23:00

Nature 525, 7570 (2015). http://www.nature.com/doifinder/10.1038/525440a

Author: Erika Check Hayden

The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people's identities.

Climate policy: Democracy is not an inconvenience

Nature - Mon, 09/21/2015 - 23:00

Nature 525, 7570 (2015). doi:10.1038/525449a

Author: Nico Stehr

Climate scientists are tiring of governance that does not lead to action. But democracy must not be weakened in the fight against global warming, warns Nico Stehr.

Corrections

Nature - Mon, 09/21/2015 - 23:00

Nature 525, 7570 (2015). http://www.nature.com/doifinder/10.1038/525439a

The Editorial ‘Too close for comfort?’ (Nature 525, 289; 2015) incorrectly stated: “In his defence, Folta argued that the money supported only travel and outreach, not research, and he was therefore under no obligation to disclose it”. Folta did not say

Multiscale Mechanical Model of the Pacinian Corpuscle Shows Depth and Anisotropy Contribute to the Receptor’s Characteristic Response to Indentation

PLoS Computational Biology - Mon, 09/21/2015 - 16:00

by Julia C. Quindlen, Victor K. Lai, Victor H. Barocas

Cutaneous mechanoreceptors transduce different tactile stimuli into neural signals that produce distinct sensations of touch. The Pacinian corpuscle (PC), a cutaneous mechanoreceptor located deep within the dermis of the skin, detects high frequency vibrations that occur within its large receptive field. The PC is comprised of lamellae that surround the nerve fiber at its core. We hypothesized that a layered, anisotropic structure, embedded deep within the skin, would produce the nonlinear strain transmission and low spatial sensitivity characteristic of the PC. A multiscale finite-element model was used to model the equilibrium response of the PC to indentation. The first simulation considered an isolated PC with fiber networks aligned with the PC’s surface. The PC was subjected to a 10 μm indentation by a 250 μm diameter indenter. The multiscale model captured the nonlinear strain transmission through the PC, predicting decreased compressive strain with proximity to the receptor’s core, as seen experimentally by others. The second set of simulations considered a single PC embedded epidermally (shallow) or dermally (deep) to model the PC’s location within the skin. The embedded models were subjected to 10 μm indentations at a series of locations on the surface of the skin. Strain along the long axis of the PC was calculated after indentation to simulate stretch along the nerve fiber at the center of the PC. Receptive fields for the epidermis and dermis models were constructed by mapping the long-axis strain after indentation at each point on the surface of the skin mesh. The dermis model resulted in a larger receptive field, as the calculated strain showed less indenter location dependence than in the epidermis model.

GC3-biased gene domains in mammalian genomes

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Synonymous codon usage bias has been shown to be correlated with many genomic features among different organisms. However, the biological significance of codon bias with respect to gene function and genome organization remains unclear.

Results: Guanine and cytosine content at the third codon position (GC3) could be used as a good indicator of codon bias. Here, we used relative GC3 bias values to compare the strength of GC3 bias of genes in human and mouse. We reported, for the first time, that GC3-rich and GC3-poor gene products might have distinct sub-cellular spatial distributions. Moreover, we extended the view of genomic gene domains and identified conserved GC3 biased gene domains along chromosomes. Our results indicated that similar GC3 biased genes might be co-translated in specific spatial regions to share local translational machineries, and that GC3 could be involved in the organization of genome architecture.
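
As a minimal sketch of the quantity discussed above, the snippet below computes raw GC3, the fraction of G or C bases at the third position of each complete codon. The function name `gc3_content` and the example sequence are hypothetical, and the "relative GC3 bias" normalization used by the authors is not described in the abstract, so it is not reproduced here.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    third_bases = cds[2:usable:3]         # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical coding sequence: codons ATG, GCA, AAG, TTT
print(gc3_content("ATGGCAAAGTTT"))
```

The input is assumed to be an in-frame coding sequence; a real analysis would also need to handle ambiguous bases and stop codons.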

Availability and implementation: Source code is available upon request from the authors.

Contact: zhaozh@nic.bmi.ac.cn or zany1983@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

FourCSeq: analysis of 4C sequencing data

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the ‘viewpoint’ with the rest of the genome, either in a single condition or when comparing different experimental conditions or cell types. Observed ligation frequencies typically show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task of finding these specific peaks and detecting changes between different biological conditions.

Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, and high z-scores are interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples and call differential contact frequencies using the statistical method DESeq2, adapted from RNA-Seq analysis.
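
The idea of fitting a decreasing trend and scoring residuals can be sketched as follows. This is not the FourCSeq implementation: it swaps in a pool-adjacent-violators fit (a step function rather than a smooth curve), skips the count transformation, and scales residuals by their population standard deviation, all of which are assumptions made for illustration. The function names and example counts are hypothetical.

```python
import statistics


def fit_decreasing(values):
    """Least-squares non-increasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while an earlier block's mean is below the following block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def residual_z_scores(values):
    """Residuals from the decreasing fit, divided by their standard deviation."""
    fitted = fit_decreasing(values)
    residuals = [v - f for v, f in zip(values, fitted)]
    spread = statistics.pstdev(residuals)
    return [r / spread if spread else 0.0 for r in residuals]


# Hypothetical fragment counts ordered by distance from the viewpoint;
# the value 90 sits above the decreasing trend.
counts = [100, 80, 60, 50, 90, 30, 20, 10]
z = residual_z_scores(counts)
print(z.index(max(z)))
```

Because the step-function fit pools a peak with its neighbours, it understates the peak's residual; a smooth fit such as the one the abstract describes would avoid that.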

Availability and implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org.

Contact: felix.klein@embl.de or whuber@embl.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Mango: a bias-correcting ChIA-PET analysis pipeline

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) is an established method for detecting genome-wide looping interactions at high resolution. Current ChIA-PET analysis software packages either fail to correct for non-specific interactions due to genomic proximity or only address a fraction of the steps required for data processing. We present Mango, a complete ChIA-PET data analysis pipeline that provides statistical confidence estimates for interactions and corrects for major sources of bias including differential peak enrichment and genomic proximity.

Results: Comparison with the existing software packages ChIA-PET Tool and ChiaSig revealed that Mango interactions exhibit much better agreement with high-resolution Hi-C data. Importantly, Mango executes all steps required for processing ChIA-PET datasets, whereas ChiaSig only completes 20% of the required steps. Application of Mango to multiple available ChIA-PET datasets permitted the independent rediscovery of known trends in chromatin loops, including enrichment of CTCF, RAD21, SMC3 and ZNF143 at the anchor regions of interactions and a strong bias for convergent CTCF motifs.

Availability and implementation: Mango is open source and distributed through github at https://github.com/dphansti/mango.

Contact: mpsnyder@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, is not publicly available. While there are far less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts.

Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources.
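
A rough sketch of direct imputation of a summary statistic is shown below. It assumes that an unmeasured SNP's z-score is imputed as the conditional mean of a multivariate normal given the z-scores at two measured SNPs, and that the cohort's correlation is approximated by a proportion-weighted average of per-population reference correlations. Both choices, along with every function name and number, are assumptions for illustration; the abstract does not specify the exact estimator.

```python
def mix_correlations(per_population, proportions):
    """Proportion-weighted average of per-population correlation values."""
    return sum(w * r for w, r in zip(proportions, per_population))


def impute_z(z_measured, r_um, r_mm):
    """Conditional-mean z-score for one unmeasured SNP given two measured SNPs.

    r_um: correlations of the unmeasured SNP with measured SNPs 1 and 2.
    r_mm: correlation between the two measured SNPs.
    """
    det = 1.0 - r_mm * r_mm
    if det <= 0.0:
        raise ValueError("measured SNPs are perfectly correlated")
    # Weights are r_um multiplied by the inverse of [[1, r_mm], [r_mm, 1]].
    w1 = (r_um[0] - r_mm * r_um[1]) / det
    w2 = (r_um[1] - r_mm * r_um[0]) / det
    return w1 * z_measured[0] + w2 * z_measured[1]


# Hypothetical cohort: 70% population A, 30% population B.
proportions = [0.7, 0.3]
r_u1 = mix_correlations([0.9, 0.5], proportions)   # unmeasured vs measured SNP 1
r_u2 = mix_correlations([0.3, 0.1], proportions)   # unmeasured vs measured SNP 2
r_12 = mix_correlations([0.2, 0.0], proportions)   # measured SNP 1 vs SNP 2
print(impute_z((4.0, 1.0), (r_u1, r_u2), r_12))
```

Only summary-level z-scores and reference correlations enter the calculation, which is why this style of imputation needs no subject-level genotypes.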

Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix.

Contact: dlee4@vcu.edu

Supplementary information: Supplementary Data are available at Bioinformatics online.

SoloDel: a probabilistic model for detecting low-frequent somatic deletions from unmatched sequencing data

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Finding somatic mutations from massively parallel sequencing data is becoming a standard process in genome-based biomedical studies. There are a number of robust methods developed for detecting somatic single nucleotide variations. However, detection of somatic copy number alteration has been substantially less explored and remains vulnerable to frequently raised sampling issues: low frequency in the cell population and absence of matched control samples.

Results: We developed a novel computational method, SoloDel, that accurately distinguishes low-frequent somatic deletions from germline ones with or without matched control samples. We first constructed a probabilistic somatic mutation progression model that describes the occurrence and propagation of the event in the cellular lineage of the sample. We then built a Gaussian mixture model to represent the mixed population of somatic and germline deletions. Parameters of the mixture model could be estimated using the expectation-maximization algorithm with the observed distribution of read-depth ratios at the points of discordant-read based initial deletion calls. Combined with a conventional structural variation caller, SoloDel greatly increased the accuracy in classifying somatic mutations. Even without control, SoloDel maintained a comparable performance across a wide range of mutated subpopulation sizes (10–70%). SoloDel could also successfully recall experimentally validated somatic deletions from previously reported neuropsychiatric whole-genome sequencing data.
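
The mixture-model step can be illustrated with a generic two-component, one-dimensional Gaussian mixture fitted by expectation-maximization. This is a textbook sketch, not SoloDel's model: the initialization, the fixed iteration count, the function names and the example read-depth ratios are all assumptions, and the mutation progression model is not included.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, iterations=50, min_sd=1e-3):
    """EM for a two-component 1D Gaussian mixture; returns weights, means, sds."""
    means = [min(data), max(data)]                 # start at the extremes
    spread = (max(data) - min(data)) / 4 or min_sd
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), min_sd)
    return weights, means, sds


# Hypothetical read-depth ratios drawn from two groups of deletion calls.
ratios = [0.48, 0.50, 0.52, 0.49, 0.51, 0.88, 0.90, 0.92, 0.89, 0.91]
weights, means, sds = fit_two_gaussians(ratios)
print(weights, means, sds)
```

Each deletion call could then be assigned to the component with the higher responsibility; which component corresponds to somatic events depends on the underlying model and is not decided by this sketch.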

Availability and implementation: Java-based implementation of the method is available at http://sourceforge.net/projects/solodel/

Contact: swkim@yuhs.ac or dhlee@biosoft.kaist.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.

antaRNA: ant colony-based RNA sequence design

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: RNA sequence design has been studied for at least as long as the classical folding problem. Whereas the latter seeks the functional fold of an RNA molecule, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components.

Results: In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that additionally comply with an adjustable full-range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics, and its superior performance is shown on biological datasets.

Availability and implementation: http://www.bioinf.uni-freiburg.de/Software/antaRNA

Contact: backofen@informatik.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

QVZ: lossy compression of quality values

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Recent advancements in sequencing technology have led to a drastic reduction in the cost of sequencing a genome. This has generated an unprecedented amount of genomic data that must be stored, processed and transmitted. To facilitate this effort, we propose a new lossy compressor for the quality values presented in genomic data files (e.g. FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression allows for compression of data beyond its lossless limit.

Results: The proposed algorithm QVZ exhibits better rate-distortion performance than the previously proposed algorithms, for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by the previous algorithms. Finally, we show that QVZ-compressed data exhibit better genotyping performance than data compressed with previously proposed algorithms, in the sense that, for a similar rate, the genotyping is closer to that achieved with the original quality values.
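
To make the rate-distortion trade-off concrete, the sketch below applies a plain uniform quantizer to a list of quality values and measures the mean squared error it introduces. This is deliberately not the QVZ algorithm; the fixed bin width, the function names and the example Phred scores are assumptions chosen only to show how coarser quantization lowers the number of distinct symbols at the cost of distortion.

```python
def quantize(qualities, step):
    """Replace each integer quality value by the midpoint of its bin of width `step`."""
    return [(q // step) * step + step // 2 for q in qualities]


def mean_squared_error(original, reconstructed):
    """Average squared difference between original and reconstructed values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


# Hypothetical Phred quality values from one read.
phred = [38, 40, 12, 2, 33, 37]
for step in (1, 4, 8):
    lossy = quantize(phred, step)
    print(step, len(set(lossy)), mean_squared_error(phred, lossy))
```

With `step=1` the values are unchanged and the distortion is zero, which corresponds to the lossless end of the trade-off; larger steps leave fewer distinct symbols for a downstream entropy coder to encode.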

Availability and implementation: QVZ is written in C and can be downloaded from https://github.com/mikelhernaez/qvz.

Contact: mhernaez@stanford.edu or gmalysa@stanford.edu or iochoa@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

A parallel and sensitive software tool for methylation analysis on multicore platforms

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: DNA methylation analysis suffers from very long processing time, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed.

Results: We present a new software tool, called HPG-Methyl, which efficiently maps bisulphite sequencing reads on DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows–Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith–Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms state-of-the-art software such as Bismark, BS-Seeker or BSMAP in both execution time and sensitivity, particularly for long bisulphite reads.
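
For readers unfamiliar with the Smith–Waterman algorithm mentioned above, here is a minimal sketch that computes the best local alignment score between two sequences. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions for illustration; they are not HPG-Methyl's parameters, and bisulphite-specific handling of C-to-T conversion is not modelled.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)      # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("GGACGTGG", "ACGT"))
```

The table has one cell per pair of positions, so the cost grows with the product of the two lengths, which is consistent with reserving this step for the reads that a faster index-based mapper cannot place confidently.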

Availability and implementation: The software, in the form of C libraries and functions together with instructions to compile and execute it, is available by sftp to anonymous@clariano.uv.es (password ‘anonymous’).

Contact: juan.orduna@uv.es or jdopazo@cipf.es

UniAlign: protein structure alignment meets evolution

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: During evolution, functional sites on the surface of the protein as well as the hydrophobic core maintaining the structural integrity are well-conserved. However, available protein structure alignment methods align protein structures based solely on the 3D geometric similarity, limiting their ability to detect functionally relevant correspondences between the residues of the proteins, especially for distantly related homologous proteins.

Results: In this article, we propose a new protein pairwise structure alignment algorithm (UniAlign) that incorporates additional evolutionary information captured in the form of sequence similarity, sequence profiles and residue conservation. We define a per-residue score (UniScore) as a weighted sum of these and other features and develop an iterative optimization procedure to search for an alignment with the best overall UniScore. Our extensive experiments on CDD, HOMSTRAD and BAliBASE benchmark datasets show that UniAlign outperforms commonly used structure alignment methods. We further demonstrate UniAlign's ability to develop family-specific models to drastically improve the quality of the alignments.

Availability and implementation: UniAlign is available as a web service at: http://sacan.biomed.drexel.edu/unialign

Contact: ahmet.sacan@drexel.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

xHeinz: an algorithm for mining cross-species network modules under a flexible conservation model

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Integrative network analysis methods provide robust interpretations of differential high-throughput molecular profile measurements. They are often used in a biomedical context, either to generate novel hypotheses about the underlying cellular processes or to derive biomarkers for classification and subtyping. The underlying molecular profiles are frequently measured and validated on animal or cellular models. Therefore, the results are not immediately transferable to humans. In particular, this is also the case in a study of the recently discovered interleukin-17 producing helper T cells (Th17), which are fundamental for anti-microbial immunity but also known to contribute to autoimmune diseases.

Results: We propose a mathematical model for finding active subnetwork modules that are conserved between two species. These are sets of genes, one for each species, which (i) induce a connected subnetwork in a species-specific interaction network, (ii) show overall differential behavior and (iii) contain a large number of orthologous genes. We propose a flexible notion of conservation, which turns out to be crucial for the quality of the resulting modules in terms of biological interpretability. We propose an algorithm that finds provably optimal or near-optimal conserved active modules in our model. We apply our algorithm to understand the mechanisms underlying Th17 T cell differentiation in both mouse and human. As a main biological result, we find that the key regulation of Th17 differentiation is conserved between human and mouse.

Availability and implementation: xHeinz, an implementation of our algorithm, as well as all input data and results, are available at http://software.cwi.nl/xheinz and as a Galaxy service at http://services.cbib.u-bordeaux2.fr/galaxy in CBiB Tools.

Contact: gunnar.klau@cwi.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Proteomic mass spectrometry analysis is becoming routine in clinical diagnostics, for example to monitor cancer biomarkers using blood samples. However, differential proteomics and identification of peaks relevant for class separation remain challenging.

Results: Here, we introduce a simple yet effective approach for identifying differentially expressed proteins using binary discriminant analysis. This approach works by data-adaptive thresholding of protein expression values and subsequent ranking of the dichotomized features using a relative entropy measure. Our framework may be viewed as a generalization of the ‘peak probability contrast’ approach of Tibshirani et al. (2004) and can be applied both in the two-group and the multi-group setting. Our approach is computationally inexpensive and, in the analysis of a large-scale drug discovery test dataset, achieves prediction accuracy equivalent to that of a random forest. Furthermore, in the analysis of mass spectrometry data from a pancreas cancer study, we were able to identify biologically relevant and statistically predictive marker peaks that were unrecognized in the original study.

Availability and implementation: The methodology for binary discriminant analysis is implemented in the R package binda, which is freely available under the GNU General Public License (version 3 or later) from CRAN at URL http://cran.r-project.org/web/packages/binda/. R scripts reproducing all described analyses are available from the web page http://strimmerlab.org/software/binda/.

Contact: k.strimmer@imperial.ac.uk

Sparse multi-view matrix factorization: a multivariate approach to multiple tissue comparisons

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Within any given tissue, gene expression levels can vary extensively among individuals. Such heterogeneity can be caused by genetic and epigenetic variability and may contribute to disease. The abundance of experimental data now enables the identification of features of gene expression profiles that are shared across tissues and those that are tissue-specific. While most current research is concerned with characterizing differential expression by comparing mean expression profiles across tissues, it is believed that a significant difference in the variance of a gene's expression across tissues may also be associated with molecular mechanisms that are important for tissue development and function.

Results: We propose a sparse multi-view matrix factorization (sMVMF) algorithm to jointly analyse gene expression measurements in multiple tissues, where each tissue provides a different ‘view’ of the underlying organism. The proposed methodology can be interpreted as an extension of principal component analysis in that it provides the means to decompose the total sample variance in each tissue into the sum of two components: one capturing the variance that is shared across tissues and one isolating the tissue-specific variances. sMVMF has been used to jointly model mRNA expression profiles in three tissues obtained from a large and well-phenotyped twins cohort, TwinsUK. Using sMVMF, we are able to prioritize genes based on whether their variation patterns are specific to each tissue. Furthermore, using DNA methylation profiles available, we provide supporting evidence that adipose-specific gene expression patterns may be driven by epigenetic effects.

Availability and implementation: Python code is available at http://wwwf.imperial.ac.uk/~gmontana/.

Contact: giovanni.montana@kcl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

CCLasso: correlation inference for compositional data through Lasso

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Direct analysis of microbial communities in the environment and human body has become more convenient and reliable owing to the advancements of high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlation relationship among members of microbial communities is of fundamental importance for genomic survey study. Traditional Pearson correlation analysis treating the observed data as absolute abundances of the microbes may lead to spurious results because the data only represent relative abundances. Special care and appropriate methods are required prior to correlation analysis for these compositional data.
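
The spurious-correlation problem described above can be seen in a toy case. With only two taxa, relative abundances always sum to one, so their Pearson correlation is forced to -1 regardless of how the absolute abundances behave. The counts below are hypothetical and the helper names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def to_relative(sample):
    """Convert absolute abundances in one sample to proportions."""
    total = sum(sample)
    return [v / total for v in sample]


# Hypothetical absolute abundances of two taxa across five samples.
absolute = [(10, 5), (20, 5), (30, 7), (40, 6), (50, 8)]
relative = [to_relative(s) for s in absolute]

abs_r = pearson([s[0] for s in absolute], [s[1] for s in absolute])
rel_r = pearson([s[0] for s in relative], [s[1] for s in relative])
print(abs_r, rel_r)
```

The two taxa are positively correlated in absolute terms, yet the proportions are perfectly negatively correlated. With more taxa the distortion is less extreme but still present, which is why methods designed for compositional data are needed.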

Results: In this article, we first discuss the correlation definition of latent variables for compositional data. We then propose a novel method called CCLasso based on least squares with an ℓ1 penalty to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm from augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating correlation network of microbe species from the Human Microbiome Project.

Availability and implementation: CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.

Contact: dengmh@pku.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

The pervasiveness and plasticity of circadian oscillations: the coupled circadian-oscillators framework

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Circadian oscillations have been observed in animals, plants, fungi and cyanobacteria and play a fundamental role in coordinating the homeostasis and behavior of biological systems. Genetically encoded molecular clocks found in nearly every cell, based on negative transcription/translation feedback loops and involving only a dozen genes, play a central role in maintaining these oscillations. However, high-throughput gene expression experiments reveal that in a typical tissue, a much larger fraction (~10%) of all transcripts oscillate with the day–night cycle and that the oscillating species vary with tissue type, suggesting that perhaps a much larger fraction of all transcripts, and perhaps also other molecular species, may bear the potential for circadian oscillations.

Results: To better quantify the pervasiveness and plasticity of circadian oscillations, we conduct the first large-scale analysis aggregating the results of 18 circadian transcriptomic studies and 10 circadian metabolomic studies conducted in mice using different tissues and under different conditions. We find that over half of protein coding genes in the cell can produce transcripts that are circadian in at least one set of conditions and similarly for measured metabolites. Genetic or environmental perturbations can disrupt existing oscillations by changing their amplitudes and phases, suppressing them or giving rise to novel circadian oscillations. The oscillating species and their oscillations provide a characteristic signature of the physiological state of the corresponding cell/tissue. Molecular networks comprise many oscillator loops that have been sculpted by evolution over two trillion day–night cycles to have intrinsic circadian frequency. These oscillating loops are coupled by shared nodes in a large network of coupled circadian oscillators where the clock genes form a major hub. Cells can program and re-program their circadian repertoire through epigenetic and other mechanisms.

Availability and implementation: High-resolution and tissue/condition specific circadian data and networks available at http://circadiomics.igb.uci.edu.

Contact: pfbaldi@ics.uci.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
