Journal Articles

Integrating full spectrum of sequence features into predicting functional microRNA-mRNA interactions

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: MicroRNAs (miRNAs) play important roles in general biological processes and disease pathogenesis. Identifying miRNA target genes is an essential step toward fully understanding the regulatory effects of miRNAs. Many computational methods based on sequence complementarity rules and on miRNA and mRNA expression profiles have been developed for this purpose. Many sequence features of miRNA targets are now available, including the context features of the target sites, the thermodynamic stability and the accessibility energy of the miRNA–mRNA interaction. However, most current computational methods that combine sequence and expression information do not effectively integrate the full spectrum of these features; instead, they treat all putative miRNA–mRNA interactions from sequence-based prediction as equally meaningful. As a result, these sequence features have not been fully exploited to improve miRNA target prediction.

Results: We propose a novel regularized regression approach, based on the adaptive Lasso procedure, for detecting functional miRNA–mRNA interactions. Our method takes into account both the gene sequence features and the miRNA and mRNA expression profiles. Given a set of sequence features for each putative miRNA–mRNA interaction and their expression values, our model quantifies the down-regulation effect of each miRNA on its targets while simultaneously estimating the contribution of each sequence feature to predicting functional miRNA–mRNA interactions. By applying our model to the expression datasets from two cancer studies, we demonstrate that our predictions achieve better sensitivity and specificity and are more biologically meaningful than those of other methods.

Availability and implementation: The source code is available at: http://nba.uth.tmc.edu/homepage/liu/miRNALasso.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: Yin.Liu@uth.tmc.edu

Categories: Journal Articles

C-It-Loci: a knowledge database for tissue-enriched loci

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Increasing evidence suggests that most of the genome is transcribed into RNAs, but many of them are not translated into proteins. RNAs that do not become proteins are called ‘non-coding RNAs (ncRNAs)’, and they outnumber protein-coding genes. Interestingly, these ncRNAs have been shown to be expressed more tissue-specifically than protein-coding genes. Given that tissue-specific expression of a transcript suggests its importance in the tissue where it is expressed, researchers are conducting biological experiments to elucidate the function of such ncRNAs. Owing largely to advances in next-generation techniques, especially RNA-seq, the amount of high-throughput data is increasing rapidly. However, because of the complexity and high volume of the data, it is not easy to re-analyze published datasets to extract tissue-specific expression of ncRNAs.

Results: Here, we introduce a new knowledge database called ‘C-It-Loci’, which allows a user to screen for tissue-specific transcripts across three organisms: human, mouse and zebrafish. C-It-Loci is intuitive and easy to use to identify not only protein-coding genes but also ncRNAs from various tissues. C-It-Loci defines homology through sequence and positional conservation to allow for the extraction of species-conserved loci. C-It-Loci can be used as a starting point for further biological experiments.

Availability and implementation: C-It-Loci is freely available online without registration at http://c-it-loci.uni-frankfurt.de.

Contact: uchida@med.uni-frankfurt.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Nsite, NsiteH and NsiteM computer tools for studying transcription regulatory elements

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Summary: Gene transcription is mostly conducted through interactions of various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs perform searches for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively.

Availability and implementation: Pre-compiled executables built under commonly used operating systems are available for download by visiting http://www.molquest.kaust.edu.sa and http://www.softberry.com.

Contact: solovictor@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

nextflu: real-time tracking of seasonal influenza virus evolution in humans

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Summary: Seasonal influenza viruses evolve rapidly, allowing them to evade immunity in their human hosts and reinfect previously infected individuals. Similarly, vaccines against seasonal influenza need to be updated frequently to protect against an evolving virus population. We have thus developed a processing pipeline and browser-based visualization that allows convenient exploration and analysis of the most recent influenza virus sequence data. This web-application displays a phylogenetic tree that can be decorated with additional information such as the viral genotype at specific sites, sampling location and derived statistics that have been shown to be predictive of future virus dynamics. In addition, mutation, genotype and clade frequency trajectories are calculated and displayed.

Availability and implementation: Python and Javascript source code is freely available from https://github.com/blab/nextflu, while the web-application is live at http://nextflu.org.

Contact: tbedford@fredhutch.org

al3c: high-performance software for parameter inference using Approximate Bayesian Computation

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: The development of Approximate Bayesian Computation (ABC) algorithms for parameter inference that are both computationally efficient and scalable in parallel computing environments is an important area of research. Monte Carlo rejection sampling, a fundamental component of ABC algorithms, is trivial to distribute over multiple processors but is inherently inefficient. While the development of algorithms such as ABC Sequential Monte Carlo (ABC-SMC) helps address the inherent inefficiencies of rejection sampling, such approaches are not as easily scaled across multiple processors. As a result, current Bayesian inference software offerings that use ABC-SMC lack the ability to scale in parallel computing environments.

Results: We present al3c, a C++ framework for implementing ABC-SMC in parallel. By requiring only that users define essential functions such as the simulation model and prior distribution function, al3c abstracts the user from both the complexities of parallel programming and the details of the ABC-SMC algorithm. By using the al3c framework, the user is able to scale the ABC-SMC algorithm in parallel computing environments for his or her specific application, with minimal programming overhead.

Availability and implementation: al3c is offered as a static binary for Linux and OS-X computing environments. The user completes an XML configuration file and C++ plug-in template for the specific application, which are used by al3c to obtain the desired results. Users can download the static binaries, source code, reference documentation and examples (including those in this article) by visiting https://github.com/ahstram/al3c.

Contact: astram@usc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
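
The Monte Carlo rejection sampling step that the al3c abstract describes as a fundamental component of ABC can be illustrated with a minimal Python sketch. This is not al3c's code or its ABC-SMC algorithm; the toy model (inferring the mean of a unit-variance normal), the function names, the uniform prior and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, sample_prior, distance, epsilon, n_draws, rng):
    """Plain ABC rejection sampling: keep prior draws whose simulated
    summary lies within epsilon of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # draw a candidate parameter from the prior
        simulated = simulate(theta, rng)   # simulate a summary statistic under theta
        if distance(simulated, observed) <= epsilon:
            accepted.append(theta)         # candidate is compatible with the data
    return accepted


# Hypothetical toy problem: infer the mean of a unit-variance normal.
def sample_prior(rng):
    return rng.uniform(-5.0, 5.0)          # assumed flat prior on the mean


def simulate(theta, rng, n=50):
    # Summary statistic: mean of n simulated observations.
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def distance(a, b):
    return abs(a - b)


rng = random.Random(0)
observed_mean = 2.0                        # made-up observed summary
posterior_sample = abc_rejection(
    observed_mean, simulate, sample_prior, distance,
    epsilon=0.1, n_draws=20000, rng=rng,
)
acceptance_rate = len(posterior_sample) / 20000
```

Each draw is independent, which is why this step distributes trivially over processors. Only a small fraction of draws is accepted, which is the inefficiency that sequential schemes such as ABC-SMC aim to reduce.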

PSIKO2: a fast and versatile tool to infer population stratification on various levels in GWAS

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Summary: Genome-wide association studies are an invaluable tool for identifying genotypic loci linked with agriculturally important traits or certain diseases. The signal on which such studies rely can, however, be obscured by population stratification, making it necessary to account for it in some way. Population stratification depends on when admixture happened and thus can occur at various levels. To aid in its inference at the genome level, we recently introduced psiko, and comparison with leading methods indicates that it has attractive properties. However, until now, it could not be used for local ancestry inference, which is preferable in cases of recent admixture because the genome level tends to be too coarse to properly account for processes acting on small segments of a genome. To bring the powerful ideas underpinning psiko to bear in such studies as well, we extended it to psiko2, which we introduce here.

Availability and implementation: Source code, binaries and user manual are freely available at https://www.uea.ac.uk/computing/psiko.

Contact: Andrei-Alin.Popescu@uea.ac.uk or Katharina.Huber@cmp.uea.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Summary: Assessing linkage disequilibrium (LD) across ancestral populations is a powerful approach for investigating population-specific genetic structure as well as functionally mapping regions of disease susceptibility. Here, we present LDlink, a web-based collection of bioinformatic modules that query single nucleotide polymorphisms (SNPs) in population groups of interest to generate haplotype tables and interactive plots. Modules are designed with an emphasis on ease of use, query flexibility, and interactive visualization of results. Phase 3 haplotype data from the 1000 Genomes Project are referenced for calculating pairwise metrics of LD, searching for proxies in high LD, and enumerating all observed haplotypes. LDlink is tailored for investigators interested in mapping common and uncommon disease susceptibility loci by focusing on output linking correlated alleles and highlighting putative functional variants.

Availability and implementation: LDlink is a free and publicly available web tool that can be accessed at http://analysistools.nci.nih.gov/LDlink/.

Contact: mitchell.machiela@nih.gov
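
As a rough illustration of the pairwise LD metrics mentioned in the LDlink summary, the sketch below computes D, |D'| and r² for two biallelic SNPs from phased haplotype counts using the standard textbook definitions. It is not LDlink's implementation; the function name, argument layout and example counts are assumptions.

```python
def ld_metrics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic loci, given counts of the
    four phased haplotypes AB, Ab, aB and ab."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total            # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total            # frequency of allele B at locus 2

    d = p_AB - p_A * p_B                   # raw disequilibrium coefficient
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom                     # squared allelic correlation

    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    return d, d_prime, r2


# Made-up counts: haplotype aB is never observed, so |D'| is maximal
# while r^2 stays below 1 because the allele frequencies differ.
d, d_prime, r2 = ld_metrics(40, 10, 0, 50)
```

A proxy search of the kind the summary describes would rank candidate SNPs by metrics like these, although the thresholds and ranking LDlink uses are not stated here.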

Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Summary: Modeling of dynamical systems using ordinary differential equations is a popular approach in the field of systems biology. Two of the most critical steps in this approach are to construct dynamical models of biochemical reaction networks for large datasets and complex experimental conditions and to perform efficient and reliable parameter estimation for model fitting. We present a modeling environment for MATLAB that addresses these challenges. The numerically expensive parts of the calculations, such as solving the differential equations and the associated sensitivity system, are parallelized and automatically compiled into efficient C code. A variety of parameter estimation algorithms as well as frequentist and Bayesian methods for uncertainty analysis have been implemented and used in a range of applications that led to publications.

Availability and implementation: The Data2Dynamics modeling environment is MATLAB based, open source and freely available at http://www.data2dynamics.org.

Contact: andreas.raue@fdm.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Predicting tumor purity from methylation microarray data

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: In cancer genomics research, one important problem is that the solid tissue sample obtained from clinical settings is always a mixture of cancer and normal cells. The sample mixture complicates data analysis and results in biased findings if not correctly accounted for. Estimating tumor purity is therefore of great interest, and a number of methods have been developed using gene expression, copy number variation or point mutation data.

Results: We discover that in cancer samples, the distributions of data from the Illumina Infinium 450k methylation microarray are highly correlated with tumor purities. We develop a simple but effective method to estimate purities from the microarray data. Analyses of The Cancer Genome Atlas lung cancer data demonstrate favorable performance of the proposed method.

Availability and implementation: The method is implemented in InfiniumPurify, which is freely available at https://bitbucket.org/zhengxiaoqi/infiniumpurify.

Contact: xqzheng@shnu.edu.cn or hao.wu@emory.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

ISQuest: finding insertion sequences in prokaryotic sequence fragment data

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Insertion sequences (ISs) are transposable elements present in most bacterial and archaeal genomes that play an important role in genomic evolution. The increasing availability of sequenced prokaryotic genomes offers the opportunity to study ISs comprehensively, but development of efficient and accurate tools is required for discovery and annotation. Additionally, prokaryotic genomes are frequently deposited at an incomplete or draft stage because of the substantial cost and effort required to finish genome assembly projects. Development of methods to identify ISs directly from raw sequence reads or draft genomes is therefore desirable. Software tools such as Optimized Annotation System for Insertion Sequences and IScan currently identify IS elements in completely assembled and annotated genomes; however, to our knowledge no methods have been developed to identify ISs from raw fragment data or partially assembled genomes. We have developed novel methods to solve this computationally challenging problem, and implemented these methods in the software package ISQuest. This software identifies bacterial ISs and their sequence elements (inverted and direct repeats) in raw read data or contigs using flexible search parameters. ISQuest is capable of finding ISs in hundreds of partially assembled genomes within hours, making it a valuable high-throughput tool for a global search of IS elements. We tested ISQuest on simulated read libraries of 3810 complete bacterial genomes and plasmids in GenBank and were able to detect 82% of the ISs and transposases annotated in GenBank with 80% sequence identity.

Contact: abiswas@cs.odu.edu

DINGO: differential network analysis in genomics

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Cancer progression and development are initiated by aberrations in various molecular networks through coordinated changes across multiple genes and pathways. It is important to understand how these networks change under different stress conditions and/or patient-specific groups to infer differential patterns of activation and inhibition. Existing methods are limited to correlation networks that are independently estimated from separate group-specific data and without due consideration of relationships that are conserved across multiple groups.

Method: We propose a pathway-based differential network analysis in genomics (DINGO) model for estimating group-specific networks and making inference on the differential networks. DINGO jointly estimates the group-specific conditional dependencies by decomposing them into global and group-specific components. The delineation of these components allows for a more refined picture of the major driver and passenger events in the elucidation of cancer progression and development.

Results: Simulation studies demonstrate that DINGO provides more accurate group-specific conditional dependencies than achieved by using separate estimation approaches. We apply DINGO to key signaling pathways in glioblastoma to build differential networks for long-term survivors and short-term survivors in The Cancer Genome Atlas. The hub genes found by mRNA expression, DNA copy number, methylation and microRNA expression reveal several important roles in glioblastoma progression.

Availability and implementation: R Package at: odin.mdacc.tmc.edu/~vbaladan.

Contact: veera@mdanderson.org

Supplementary information: Supplementary data are available at Bioinformatics online.

Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low.

Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.

Availability and implementation: Karect is available at: http://aminallam.github.io/karect.

Contact: amin.allam@kaust.edu.sa

Supplementary information: Supplementary data are available at Bioinformatics online.

ProFET: Feature engineering captures high-level protein functions

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: The number of sequenced genomes and proteins is growing at an unprecedented pace. Unfortunately, manual curation and functional knowledge lag behind. Homology-based inference often fails at labeling proteins with diverse functions and broad classes. Thus, identifying high-level protein functionality remains challenging. We hypothesize that a universal feature engineering approach can yield classification of high-level functions and unified properties when combined with machine learning approaches, without requiring external databases or alignment.

Results: In this study, we present a novel bioinformatics toolkit called ProFET (Protein Feature Engineering Toolkit). ProFET extracts hundreds of features covering the elementary biophysical and sequence-derived attributes. Most features capture statistically informative patterns. In addition, different representations of sequences and the amino acid alphabet provide a compact, compressed set of features. The results from ProFET were incorporated in data analysis pipelines, implemented in Python and adapted for multi-genome scale analysis. ProFET was applied to 17 established and novel protein benchmark datasets involving classification for a variety of binary and multi-class tasks. The results show state-of-the-art performance. The extracted features show excellent biological interpretability. The success of ProFET applies to a wide range of high-level functions such as subcellular localization, structural classes and proteins with unique functional properties (e.g. neuropeptide precursors, thermophilic and nucleic acid binding). ProFET allows easy, universal discovery of new target proteins, as well as understanding the features underlying different high-level protein functions.

Availability and implementation: ProFET source code and the datasets used are freely available at https://github.com/ddofer/ProFET.

Contact: michall@cc.huji.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

Direct DNA Analysis with Paper-Based Ion Concentration Polarization

Journal of American Chemical Society - Tue, 10/20/2015 - 07:38

Journal of the American Chemical Society, DOI: 10.1021/jacs.5b08523

Production of soluble and active microbial transglutaminase in Escherichia coli for site-specific antibody drug conjugation

Protein Science - Tue, 10/20/2015 - 03:31
Abstract

Applications of microbial transglutaminase (mTGase) produced from Streptomyces mobarensis (S. mobarensis) were recently extended from the food to the pharmaceutical industry. To use mTGase for clinical applications, such as generation of site-specific antibody drug conjugates, it would be beneficial to manufacture mTGase in Escherichia coli (E. coli). To date, attempts to express recombinant soluble and active S. mobarensis mTGase have been largely unsuccessful. mTGase from S. mobarensis is naturally expressed as a proenzyme and is proteolytically processed stepwise into its active mature form outside of the bacterial cell. The pro-domain is essential for correct folding of mTGase as well as for inhibiting activity of mTGase inside the cell. Here we report a genetically modified mTGase that has full activity and can be expressed at high yields in the cytoplasm of E. coli. To achieve this, we performed an alanine scan of the mTGase pro-domain and identified mutants that maintain its chaperone function but destabilize the cleaved pro-domain/mTGase interaction in a temperature-dependent fashion. This allows proper folding of mTGase and keeps the enzyme inactive during expression at 20°C, but results in full activity when shifted to 37°C due to loosened domain interactions. Inserting the 3C protease cleavage site together with the pro-domain alanine mutants Tyr14, Ile24 or Asn25 facilitated high yields (30-75 mg/L) and produced an enzyme with activity identical to wild type mTGase from S. mobarensis. Site-specific antibody drug conjugates made with the E. coli-produced mTGase demonstrated identical potency in an in vitro cell assay to those made with mTGase from S. mobarensis.

1,3-Propanediol binds inside the water-conducting pore of aquaporin 4: Does this efficacious inhibitor have sufficient potency?

Protein Science - Tue, 10/20/2015 - 03:31
Abstract

Among the thirteen types of water channel proteins, aquaporins (AQPs), which play various essential roles in human physiology, AQP4 is richly expressed in cells of the central nervous system and implicated in pathological conditions such as brain edema. Therefore, researchers have been looking for ways to inhibit AQP4's water-conducting function. Many small molecules have been investigated for their interactions with the residues that form the AQP4 channel entry vestibule on the extracellular side and their interruption of waters entering into the conducting pore. Conducting all-atom simulations on the basis of CHARMM 36 force field, we study one such inhibitor, 5-acetamido-1,3,4-thiadiazole-2-sulfonamide (AZM), to achieve quantitative agreement between the computed and the experimentally measured values of AZM-AQP4 binding affinity. Using the same method, we examine the possibility of plugging up the AQP4 channel around the Asn-Pro-Ala motifs located near the channel center because a small molecule bound there would totally occlude water conduction through AQP4. We compute the binding affinities of 1,2-ethanediol (EDO) and 1,3-propanediol (PDO) inside the AQP4 conducting pore and identify the specificities of the interactions. The EDO-AQP4 interaction is weak with a dissociation constant of 80 mM. The PDO-AQP4 interaction is rather strong with a dissociation constant of 328 µM, which indicates that PDO is an efficacious AQP4 inhibitor with sufficiently high potency. Considering the fact that PDO is classified by the US Food and Drug Administration as generally safe, we predict that 1,3-propanediol could be an effective drug for brain edema and other AQP4-correlated neurological conditions.
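
To put the two dissociation constants quoted above on a common energy scale, they can be converted to standard binding free energies with the textbook relation ΔG° = RT ln(Kd / 1 M). The sketch below is only an illustration: the abstract does not state a temperature, so 298.15 K is an assumption, and the function name is hypothetical rather than part of the authors' workflow.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation
    constant given in mol/L, relative to a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("dissociation constant must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# Kd values quoted in the abstract; the temperature is an assumed default.
dg_edo = binding_free_energy(80e-3)    # 1,2-ethanediol, Kd = 80 mM
dg_pdo = binding_free_energy(328e-6)   # 1,3-propanediol, Kd = 328 µM
ddg = dg_pdo - dg_edo                  # negative: PDO binds more tightly
```

A more negative value means tighter binding, so the gap between the two numbers reflects the roughly 240-fold difference between the quoted dissociation constants.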

Crystal structure of quinone-dependent alcohol dehydrogenase from Pseudogluconobacter saccharoketogenes. A versatile dehydrogenase oxidizing alcohols and carbohydrates

Protein Science - Tue, 10/20/2015 - 02:18
Abstract

The quinone-dependent alcohol dehydrogenase (PQQ-ADH, E.C. 1.1.5.2) from the Gram-negative bacterium Pseudogluconobacter saccharoketogenes IFO 14464 oxidizes primary alcohols (e.g. ethanol, butanol), secondary alcohols (monosaccharides), as well as aldehydes, polysaccharides, and cyclodextrins. The recombinant protein, expressed in Pichia pastoris, was crystallized, and three-dimensional (3D) structures of the native form, with PQQ and a Ca2+ ion, and of the enzyme in complex with a Zn2+ ion and a bound substrate mimic were determined at 1.72 Å and 1.84 Å resolution, respectively. PQQ-ADH displays an eight-bladed β-propeller fold, characteristic of Type I quinone-dependent methanol dehydrogenases. However, three of the four ligands of the Ca2+ ion differ from those of related dehydrogenases and they come from different parts of the polypeptide chain. These differences result in a more open, easily accessible active site, which explains why PQQ-ADH can oxidize a broad range of substrates. The bound substrate mimic suggests Asp333 as the catalytic base. Remarkably, no vicinal disulfide bridge is present near the PQQ, which in other PQQ-dependent alcohol dehydrogenases has been proposed to be necessary for electron transfer. Instead an associated cytochrome c can approach the PQQ for direct electron transfer.

Russian roulette

Nature - Mon, 10/19/2015 - 23:00

Nature 526, 7574 (2015). doi:10.1038/526475a

Attempts to keep foreign interests out of Russian research will only suppress the exchange of information, and risk damaging East–West relations.

Indigenous peoples must benefit from science

Nature - Mon, 10/19/2015 - 23:00

Nature 526, 7574 (2015). http://www.nature.com/doifinder/10.1038/526477a

Author: Dyna Rochmyaningsih

To drive sustainable development, Dyna Rochmyaningsih argues, science must empower rural communities — not just serve industry and governments.

Neutrino study made key priority for US nuclear physics

Nature - Mon, 10/19/2015 - 23:00

Nature 526, 7574 (2015). http://www.nature.com/doifinder/10.1038/526485a

Author: Davide Castelvecchi

Wish list also includes new particle collider.
