Nucleic Acids Research

Nucleic Acids Research - RSS feed of current issue

Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing

Mon, 11/16/2015 - 01:17

Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.
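
As a reading aid for the accuracy figure above, the following minimal sketch converts a quality value to a per-base error probability. It assumes that QV follows the usual Phred convention, QV = -10 * log10(p); the function names and the 9,000-base genome length are illustrative and are not taken from the article.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)


def expected_errors(qv: float, length: int) -> float:
    """Expected number of erroneous bases in a sequence of the given length."""
    return qv_to_error_probability(qv) * length


if __name__ == "__main__":
    # Illustrative values: QV50 over a hypothetical 9,000-base genome.
    print(qv_to_error_probability(50))
    print(expected_errors(50, 9000))
```

Under that assumption, QV50 corresponds to roughly one error per 100,000 bases, so a reconstructed genome of about 9 kb would be expected to contain fewer than one error on average.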

Categories: Journal Articles

MRPrimer: a MapReduce-based method for the thorough design of valid and ranked primers for PCR

Mon, 11/16/2015 - 01:17

Primer design is a fundamental technique that is widely used for polymerase chain reaction (PCR). Although many methods have been proposed for primer design, they require a great deal of manual effort to generate feasible and valid primers, including homology tests on off-target sequences using BLAST-like tools. That approach is inconvenient when many quantitative PCR (qPCR) target sequences must all satisfy the same stringent and allele-invariant constraints. To address this issue, we propose an entirely new method called MRPrimer that can design all feasible and valid primer pairs existing in a DNA database at once, while simultaneously checking a multitude of filtering constraints and validating primer specificity. Furthermore, MRPrimer suggests the best primer pair for each target sequence, based on a ranking method. Through qPCR analysis using 343 primer pairs and the corresponding sequencing and comparative analyses, we showed that the primer pairs designed by MRPrimer are very stable and effective for qPCR. In addition, MRPrimer is computationally efficient and scalable and therefore useful for quickly constructing an entire collection of feasible and valid primers for frequently updated databases like RefSeq. Furthermore, we suggest that MRPrimer can be utilized conveniently for experiments requiring primer design, especially real-time qPCR.
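
To give a sense of what a single-primer filtering constraint can look like, here is a minimal sketch that checks length, GC content and a rough melting temperature. It is not MRPrimer's implementation: the thresholds are hypothetical defaults, the function names are invented for illustration, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a crude approximation.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in °C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_filters(
    primer: str,
    min_len: int = 19,      # hypothetical threshold
    max_len: int = 23,      # hypothetical threshold
    min_gc: float = 0.40,   # hypothetical threshold
    max_gc: float = 0.60,   # hypothetical threshold
    min_tm: int = 56,       # hypothetical threshold
    max_tm: int = 64,       # hypothetical threshold
) -> bool:
    """Return True if the primer satisfies all single-primer constraints."""
    if not (min_len <= len(primer) <= max_len):
        return False
    if not (min_gc <= gc_fraction(primer) <= max_gc):
        return False
    return min_tm <= wallace_tm(primer) <= max_tm


if __name__ == "__main__":
    print(passes_filters("ATGC" * 5))  # balanced 20-mer
    print(passes_filters("A" * 20))    # no G or C at all
```

A real pipeline would add pair-level constraints and a specificity check against off-target sequences, which this sketch leaves out.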

Categories: Journal Articles

A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays

Mon, 11/16/2015 - 01:17

Understanding the differences between microarray and RNA-seq technologies for measuring gene expression is necessary for informed design of experiments and choice of data analysis methods. Previous comparisons have come to sometimes contradictory conclusions, which we suggest result from a lack of attention to the intensity-dependent nature of variation generated by the technologies. To examine this trend, we carried out a parallel nested experiment performed simultaneously on the two technologies that systematically split variation into four stages (treatment, biological variation, library preparation and chip/lane noise), allowing a separation and comparison of the sources of variation in a well-controlled cellular system, Saccharomyces cerevisiae. With this novel dataset, we demonstrate that power and accuracy are more dependent on per-gene read depth in RNA-seq than they are on fluorescence intensity in microarrays. However, we carried out quantitative PCR validations, which indicate that microarrays may show greater systematic bias for low-intensity genes than RNA-seq does.

Categories: Journal Articles

Biological chromodynamics: a general method for measuring protein occupancy across the genome by calibrating ChIP-seq

Mon, 11/16/2015 - 01:17

Sequencing DNA fragments associated with proteins following in vivo cross-linking with formaldehyde (known as ChIP-seq) has been used extensively to describe the distribution of proteins across genomes. It is not widely appreciated that this method merely estimates a protein's distribution and cannot reveal changes in occupancy between samples. To reveal such changes, we tagged orthologous proteins with the same epitope in Saccharomyces cerevisiae and Candida glabrata, whose sequences have diverged to a degree that most DNA fragments longer than 50 bp are unique to just one species. By mixing defined numbers of C. glabrata cells (the calibration genome) with S. cerevisiae samples (the experimental genomes) prior to chromatin fragmentation and immunoprecipitation, it is possible to derive a quantitative measure of occupancy (the occupancy ratio – OR) that enables a comparison of occupancies not only within but also between genomes. We demonstrate for the first time that this ‘internal standard’ calibration method satisfies the sine qua non for quantifying ChIP-seq profiles, namely linearity over a wide range. Crucially, by employing functional tagged proteins, our calibration process describes a method that distinguishes genuine association within ChIP-seq profiles from background noise. Our method is applicable to any protein, not merely highly conserved ones, and obviates the need for the time-consuming, expensive, and technically demanding quantification of ChIP using qPCR, which can only be performed on individual loci. As we demonstrate for the first time in this paper, calibrated ChIP-seq represents a major step towards documenting the quantitative distributions of proteins along chromosomes in different cell states, which we term biological chromodynamics.
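
The abstract does not spell out how the occupancy ratio is computed. The sketch below shows one plausible form of a spike-in calibration, in which the immunoprecipitated (IP) read counts are normalized by the input read counts for the calibration and experimental genomes. The formula, the function name and the read counts are assumptions made for illustration and should be checked against the full article.

```python
def occupancy_ratio(
    input_calibration: int,
    input_experimental: int,
    ip_calibration: int,
    ip_experimental: int,
) -> float:
    """Assumed occupancy ratio: (W_c * IP_x) / (W_x * IP_c).

    W_* are read counts mapped in the input sample and IP_* are read counts
    mapped in the immunoprecipitate; c is the calibration genome and x is
    the experimental genome. This form is an assumption, not a quotation.
    """
    counts = (input_calibration, input_experimental, ip_calibration, ip_experimental)
    if min(counts) <= 0:
        raise ValueError("all read counts must be positive")
    return (input_calibration * ip_experimental) / (input_experimental * ip_calibration)


if __name__ == "__main__":
    # Hypothetical read counts for two conditions sharing the same spike-in.
    condition_a = occupancy_ratio(1_000_000, 9_000_000, 2_000_000, 6_000_000)
    condition_b = occupancy_ratio(1_000_000, 9_000_000, 2_000_000, 12_000_000)
    print(condition_b / condition_a)
```

Because both conditions are scaled by the same calibration genome, the ratio of the two values can be compared across samples, which is the property the calibration is meant to provide.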

Categories: Journal Articles

Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments

Mon, 11/16/2015 - 01:17

The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often provide artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines we demonstrate a 10-fold reduction in signals in regions common to the blacklisted region and identify a comprehensive set of regions that exhibit mapping sensitivity with the presence of the repeat-rich targets.

Categories: Journal Articles

Enriching CRISPR-Cas9 targeted cells by co-targeting the HPRT gene

Mon, 11/16/2015 - 01:17

The CRISPR-Cas9 system uses guide RNAs to direct the Cas9 endonuclease to cleave target sequences. It can, in theory, target essentially any sequence in a genome, but the efficiency of the predicted guide RNAs varies dramatically. If no targeted cells are obtained, it is also difficult to know why the experiment failed. We have developed a transient transfection-based method to enrich successfully targeted cells by co-targeting the hypoxanthine phosphoribosyltransferase (HPRT) gene. Cells are transfected with two guide RNAs that target HPRT and the gene of interest, respectively. HPRT-targeted cells are selected by resistance to 6-thioguanine (6-TG) and then examined for potential alterations to the gene targeted by the co-transfected guide RNA. Alterations of many genes, such as AAVS1, Exo1 and Trex1, are highly enriched in the 6-TG resistant cells. This method works in both HCT116 cells and U2OS cells and can easily be scaled up to process multiple guide RNAs. When co-targeting fails, it is straightforward to determine whether the target gene is essential or the guide RNA is ineffective. HPRT co-targeting thus provides a simple, efficient and scalable way to enrich gene targeting events and to identify the cause of failure.

Categories: Journal Articles

A new method to prevent carry-over contaminations in two-step PCR NGS library preparations

Mon, 11/16/2015 - 01:17

Two-step PCR procedures are an efficient and well-established way to generate amplicon libraries for NGS. However, there is a high risk of cross-contamination by carry-over of amplicons from the first to the second amplification round, potentially leading to severe misinterpretation of results. Here we describe a new method able to prevent and/or identify carry-over contamination by introducing the K-box, a series of three synergistically acting short sequence elements. Our K-boxes are composed of (i) K1 sequences for suppression of contamination, (ii) K2 sequences for detection of possible residual contamination and (iii) S sequences acting as separators to avoid amplification bias. To demonstrate the effectiveness of our method, we analyzed two-step PCR NGS libraries derived from a multiplex PCR system for detection of T-cell receptor beta gene rearrangements. We used this system since it is of high clinical relevance and may be affected by very low levels of contamination. Spike-in contaminations are effectively blocked by the K-box even at high rates, as demonstrated by ultra-deep sequencing of the amplicons. Thus, we recommend implementation of the K-box in two-step PCR-based NGS systems for research and diagnostic applications demanding high sensitivity and accuracy.

Categories: Journal Articles

Count ratio model reveals bias affecting NGS fold changes

Mon, 11/16/2015 - 01:17

Various biases affect high-throughput sequencing read counts. Contrary to the general assumption, we show that bias does not always cancel out when fold changes are computed and that bias affects more than 20% of genes that are called differentially regulated in RNA-seq experiments with drastic effects on subsequent biological interpretation. Here, we propose a novel approach to estimate fold changes. Our method is based on a probabilistic model that directly incorporates count ratios instead of read counts. It provides a theoretical foundation for pseudo-counts and can be used to estimate fold change credible intervals as well as normalization factors that outperform currently used normalization methods. We show that fold change estimates are significantly improved by our method by comparing RNA-seq derived fold changes to qPCR data from the MAQC/SEQC project as a reference and analyzing random barcoded sequencing data. Our software implementation is freely available from the project website http://www.bio.ifi.lmu.de/software/lfc.
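
For context on the pseudo-counts mentioned above, the following sketch shows the common ad hoc practice of adding a small constant to both read counts before taking a log ratio. It is a generic illustration and not the probabilistic model proposed in the article; the function name and the default pseudo-count of 1 are assumptions.

```python
import math


def log2_fold_change(count_a: float, count_b: float, pseudocount: float = 1.0) -> float:
    """Log2 fold change of condition A over condition B with a pseudo-count.

    Adding the same pseudo-count to both counts keeps the ratio finite when a
    count is zero and shrinks fold changes of low-count genes toward zero.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


if __name__ == "__main__":
    # Same raw ratio (3:1), very different support from the data.
    print(log2_fold_change(3, 1))
    print(log2_fold_change(3000, 1000))
```

The two calls use the same raw ratio but different read depths; the low-count gene receives a smaller estimate, which is the shrinkage behaviour a pseudo-count provides.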

Categories: Journal Articles

STAR3D: a stack-based RNA 3D structural alignment tool

Mon, 11/16/2015 - 01:17

The various roles of versatile non-coding RNAs typically require the attainment of complex high-order structures. Therefore, comparing the 3D structures of RNA molecules can yield in-depth understanding of their functional conservation and evolutionary history. Recently, many powerful tools have been developed to align RNA 3D structures. Although some methods rely on both backbone conformations and base pairing interactions, none of them consider the entire hierarchical formation of the RNA secondary structure. One of the major issues is that directly applying the algorithms of matching 2D structures to the 3D coordinates is particularly time-consuming. In this article, we propose a novel RNA 3D structural alignment tool, STAR3D, to take into full account the 2D relations between stacks without the complicated comparison of secondary structures. First, the 3D conserved stacks in the inputs are identified and then combined into a tree-like consensus. Afterward, the loop regions are compared one-to-one in accordance with their relative positions in the consensus tree. The experimental results show that STAR3D is more accurate than other state-of-the-art tools for both non-homologous and homologous RNAs, and that it has a shorter running time.

Categories: Journal Articles

A framework for improving microRNA prediction in non-human genomes

Mon, 11/16/2015 - 01:17

The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species; however, they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Considering that the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, as the number of false positives overwhelms the true predictions. We here introduce a framework (SMIRP) for creating species-specific miRNA prediction systems, leveraging sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems and we expect substantial improvement in precision and specificity, while sustaining sensitivity, independent of the machine learning technique chosen.
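
The effect of a 1:1000 class ratio on precision can be worked through with a short calculation. In the sketch below, the sensitivity and specificity values are hypothetical and chosen only to illustrate the arithmetic; they are not results from the article.

```python
def precision(
    sensitivity: float,
    specificity: float,
    positives: float,
    negatives: float,
) -> float:
    """Precision (positive predictive value) from sensitivity, specificity and class sizes."""
    true_positives = sensitivity * positives
    false_positives = (1.0 - specificity) * negatives
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical classifier applied to 1 true miRNA per 1000 pseudo-miRNAs.
    for spec in (0.99, 0.999):
        print(spec, round(precision(0.9, spec, positives=1, negatives=1000), 3))
```

With these illustrative numbers, a classifier with 99% specificity yields about ten false positives for every true miRNA, so fewer than one prediction in ten is correct. This is why small gains in specificity translate into large gains in precision at this class ratio.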

Categories: Journal Articles

Selection of 2'-deoxy-2'-fluoroarabinonucleotide (FANA) aptamers that bind HIV-1 reverse transcriptase with picomolar affinity

Mon, 11/16/2015 - 01:17

Using a Systematic Evolution of Ligands by Exponential Enrichment (SELEX) protocol capable of selecting xeno-nucleic acid (XNA) aptamers, a 2'-deoxy-2'-fluoroarabinonucleotide (FANA) aptamer (referred to as FA1) to HIV-1 reverse transcriptase (HIV-1 RT) was selected. FA1 bound HIV-1 RT with KD,app values in the low pM range under different ionic conditions. Comparisons to published HIV-1 RT RNA and DNA aptamers indicated that FA1 bound at least as well as these aptamers. FA1 contained a 20 nucleotide 5' DNA sequence followed by a 57 nucleotide region of FANA nucleotides. Removal of the fourteen 5' DNA nucleotides did not affect binding. FA1's predicted structure was composed of four stems and four loops. All stem nucleotides could be modified to G-C base pairs (14 total changes) with a small effect on binding. Eliminating or altering most loop sequences reduced or abolished tight binding. Overall, results suggested that the structure and the sequence of FA1 were important for binding. FA1 showed strong inhibition of HIV-1 RT in extension assays while no specific binding to avian myeloblastosis or Moloney murine leukemia RTs was detected. A complete DNA version of FA1 showed low binding to HIV-1 RT, emphasizing the unique properties of FANA in HIV-1 RT binding.

Categories: Journal Articles

The players may change but the game remains: network analyses of ruminal microbiomes suggest taxonomic differences mask functional similarity

Mon, 11/16/2015 - 01:17

By mapping translated metagenomic reads to a microbial metabolic network, we show that ruminal ecosystems that are rather dissimilar in their taxonomy can be considerably more similar at the metabolic network level. Using a new network bi-partition approach for linking the microbial network to a bovine metabolic network, we observe that these ruminal metabolic networks exhibit properties consistent with distinct metabolic communities producing similar outputs from common inputs. For instance, the closer in network space a microbial reaction is to a reaction found in the host, the lower the variability of its enzyme copy number across hosts. Similarly, microbial enzymes that are near host nodes are also higher in copy number than more distant enzymes. Collectively, these results demonstrate a widely expected pattern that, to our knowledge, has not been explicitly demonstrated in microbial communities: namely that there can exist different community metabolic networks that have the same metabolic inputs and outputs but differ in their internal structure.

Categories: Journal Articles

Assembly and analysis of eukaryotic Argonaute-RNA complexes in microRNA-target recognition

Mon, 11/16/2015 - 01:17

Experimental studies have uncovered a variety of microRNA (miRNA)–target duplex structures that include perfect, imperfect and seedless duplexes. However, non-canonical binding modes from imperfect/seedless duplexes are not well predicted by computational approaches, which rely primarily on sequence and secondary structural features, nor have their tertiary structures been characterized because solved structures to date are limited to near perfect, straight duplexes in Argonautes (Agos). Here, we use structural modeling to examine the role of Ago dynamics in assembling viable eukaryotic miRNA-induced silencing complexes (miRISCs). We show that combinations of low-frequency, global modes of motion of Ago domains are required to accommodate RNA duplexes in model human and C. elegans Ago structures. Models of viable miRISCs imply that Ago adopts variable conformations at distinct target sites that generate distorted, imperfect miRNA-target duplexes. Ago's ability to accommodate a duplex is dependent on the region where structural distortions occur: distortions in solvent-exposed seed and 3'-end regions are less likely to produce steric clashes than those in the central duplex region. Energetic analyses of assembled miRISCs indicate that target recognition is also driven by favorable Ago-duplex interactions. Such structural insights into Ago loading and target recognition mechanisms may provide a more accurate assessment of miRNA function.

Categories: Journal Articles

Hairpins participating in folding of human telomeric sequence quadruplexes studied by standard and T-REMD simulations

Mon, 11/16/2015 - 01:17

DNA G-hairpins are potential key structures participating in folding of human telomeric guanine quadruplexes (GQ). We examined their properties by standard MD simulations starting from the folded state and long T-REMD starting from the unfolded state, accumulating ~130 μs of atomistic simulations. Antiparallel G-hairpins should spontaneously form in all stages of the folding to support lateral and diagonal loops, with sub-μs scale rearrangements between them. We found no clear predisposition for direct folding into specific GQ topologies with specific syn/anti patterns. Our key prediction stemming from the T-REMD is that an ideal unfolded ensemble of the full GQ sequence populates all 4096 syn/anti combinations of its four G-stretches. The simulations can propose idealized folding pathways but we explain that such few-state pathways may be misleading. In the context of the available experimental data, the simulations strongly suggest that the GQ folding could be best understood by the kinetic partitioning mechanism with a set of deep competing minima on the folding landscape, with only a small fraction of molecules directly folding to the native fold. The landscape should further include non-specific collapse processes where the molecules move via diffusion and consecutive random rare transitions, which could, e.g. structure the propeller loops.
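
The figure of 4096 combinations is consistent with 2^12, that is, two glycosidic orientations (syn or anti) for each of twelve guanines. The sketch below enumerates them under the assumption that each of the four G-stretches contains three guanines, which the abstract does not state explicitly; the function name and the "s"/"a" encoding are illustrative.

```python
from itertools import product


def syn_anti_patterns(n_stretches: int = 4, guanines_per_stretch: int = 3) -> list[str]:
    """Enumerate every syn ('s') / anti ('a') assignment over all guanines.

    Stretches are separated by '-' purely for readability.
    """
    n_guanines = n_stretches * guanines_per_stretch
    patterns = []
    for combo in product("sa", repeat=n_guanines):
        stretches = [
            "".join(combo[i:i + guanines_per_stretch])
            for i in range(0, n_guanines, guanines_per_stretch)
        ]
        patterns.append("-".join(stretches))
    return patterns


if __name__ == "__main__":
    patterns = syn_anti_patterns()
    print(len(patterns))
    print(patterns[0], patterns[-1])
```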

Categories: Journal Articles

Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

Mon, 11/16/2015 - 01:17

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/.

Categories: Journal Articles

Perturbations of PIP3 signalling trigger a global remodelling of mRNA landscape and reveal a transcriptional feedback loop

Mon, 11/16/2015 - 01:17

PIP3 is synthesized by the Class I PI3Ks and regulates complex cell responses, such as growth and migration. Signals that drive long-term reshaping of cell phenotypes are difficult to resolve because of complex feedback networks that operate over extended times. PIP3-dependent modulation of mRNA accumulation is clearly important in this process but is poorly understood. We have quantified the genome-wide mRNA landscape of non-transformed, breast epithelium-derived MCF10a cells and its response to acute regulation by EGF, in the presence or absence of a PI3Kα inhibitor, and compared it to chronic activation of PI3K signalling by cancer-relevant mutations (isogenic cells expressing an oncomutant PI3Kα allele or lacking the PIP3-phosphatase/tumour-suppressor, PTEN). Our results show that whilst many mRNAs are changed by long-term genetic perturbation of PIP3 signalling (‘butterfly effect’), a much smaller number do so in a coherent fashion with the different PIP3 perturbations. This suggests a subset of more directly regulated mRNAs. We show that mRNAs respond differently to given aspects of PIP3 regulation. Some PIP3-sensitive mRNAs encode PI3K pathway components, thus suggesting a transcriptional feedback loop. We identify the transcription factor binding motifs SRF and PRDM1 as important regulators of PIP3-sensitive mRNAs involved in cell movement.

Categories: Journal Articles

Epigenetic program and transcription factor circuitry of dendritic cell development

Mon, 11/16/2015 - 01:17

Dendritic cells (DC) are professional antigen-presenting cells that develop from hematopoietic stem cells through successive steps of lineage commitment and differentiation. Multipotent progenitors (MPP) are committed to DC-restricted common DC progenitors (CDP), which differentiate into specific DC subsets, classical DC (cDC) and plasmacytoid DC (pDC). To determine epigenetic states and regulatory circuitries during DC differentiation, we measured consecutive changes of genome-wide gene expression, histone modification and transcription factor occupancy during the sequence MPP-CDP-cDC/pDC. Specific histone marks in CDP reveal a DC-primed epigenetic signature, which is maintained and reinforced during DC differentiation. Epigenetic marks and transcription factor PU.1 occupancy increasingly coincide upon DC differentiation. By integrating PU.1 occupancy and gene expression we devised a transcription factor regulatory circuitry for DC commitment and subset specification. The circuitry provides the transcription factor hierarchy that drives the sequence MPP-CDP-cDC/pDC, including Irf4, Irf8, Tcf4, Spib and Stat factors. The circuitry also includes feedback loops inferred for individual or multiple factors, which stabilize distinct stages of DC development and DC subsets. In summary, here we describe the basic regulatory circuitry of transcription factors that drives DC development.

Categories: Journal Articles

H3K23me2 is a new heterochromatic mark in Caenorhabditis elegans

Mon, 11/16/2015 - 01:17

Genome-wide analyses in Caenorhabditis elegans show that post-translational modifications (PTMs) of histones are evolutionarily conserved and distributed along functionally distinct genomic domains. However, a global profile of PTMs and their co-occurrence on the same histone tail has not been described in this organism. We used mass spectrometry-based middle-down proteomics to analyze histone H3 N-terminal tails from C. elegans embryos for the presence, the relative abundance and the potential cross-talk of co-existing PTMs. This analysis highlighted that the lysine 23 of histone H3 (H3K23) is extensively modified by methylation and that tri-methylated H3K9 (H3K9me3) is exclusively detected on histone tails with di-methylated H3K23 (H3K23me2). Chromatin immunoprecipitation approaches revealed a positive correlation between H3K23me2 and repressive marks. By immunofluorescence analyses, H3K23me2 appears differentially regulated in germ and somatic cells, in part by the action of the histone demethylase JMJD-1.2. H3K23me2 is enriched in heterochromatic regions, localizing in H3K9me3 and heterochromatin protein like-1 (HPL-1)-positive foci. Biochemical analyses indicated that HPL-1 binds to H3K23me2 and interacts with a conserved CoREST repressive complex. Thus, our study suggests that H3K23me2 defines repressive domains and contributes to organizing the genome in distinct heterochromatic regions during embryogenesis.

Categories: Journal Articles

FUS/TLS contributes to replication-dependent histone gene expression by interaction with U7 snRNPs and histone-specific transcription factors

Mon, 11/16/2015 - 01:17

Replication-dependent histone genes are up-regulated during the G1/S phase transition to meet the requirement for histones to package the newly synthesized DNA. In mammalian cells, this increment is achieved by enhanced transcription and 3' end processing. The non-polyadenylated histone mRNA 3' ends are generated by a unique mechanism involving the U7 small ribonucleoprotein (U7 snRNP). By using affinity purification methods to enrich U7 snRNA, we identified FUS/TLS as a novel U7 snRNP interacting protein. Both U7 snRNA and histone transcripts can be precipitated by FUS antibodies predominantly in the S phase of the cell cycle. Moreover, FUS depletion leads to decreased levels of correctly processed histone mRNAs and increased levels of extended transcripts. Interestingly, FUS antibodies also co-immunoprecipitate histone transcriptional activator NPAT and transcriptional repressor hnRNP UL1 in different phases of the cell cycle. We further show that FUS binds to histone genes in S phase, promotes the recruitment of RNA polymerase II and is important for the activity of histone gene promoters. Thus, FUS may serve as a linking factor that positively regulates histone gene transcription and 3' end processing by interacting with the U7 snRNP and other factors involved in replication-dependent histone gene expression.

Categories: Journal Articles

GC skew is a conserved property of unmethylated CpG island promoters across vertebrates

Mon, 11/16/2015 - 01:17

GC skew is a measure of the strand asymmetry in the distribution of guanines and cytosines. GC skew favors R-loops, a type of three-stranded nucleic acid structure that forms upon annealing of an RNA strand to one strand of DNA, creating a persistent RNA:DNA hybrid. Previous studies show that GC skew is prevalent at thousands of human CpG island (CGI) promoters and transcription termination regions, which correspond to hotspots of R-loop formation. Here, we investigated the conservation of GC skew patterns in 60 sequenced chordate genomes. We report that GC skew is a conserved sequence characteristic of the CGI promoter class in vertebrates. Furthermore, we reveal that promoter GC skew peaks at the exon 1/intron 1 junction and that it is highly correlated with gene age and CGI promoter strength. Our data also show that GC skew is predictive of unmethylated CGI promoters in a range of vertebrate species and that it imparts significant DNA hypomethylation for promoters with intermediate CpG densities. Finally, we observed that terminal GC skew is conserved for a subset of vertebrate genes that tend to be located significantly closer to their downstream neighbors, consistent with a role for R-loop formation in transcription termination.
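
GC skew is commonly computed as (G - C) / (G + C) over a window of sequence. The sketch below assumes that definition; the window and step sizes, the function names and the toy sequence are illustrative and are not taken from the article.

```python
def gc_skew(seq: str) -> float:
    """GC skew of a sequence: (G - C) / (G + C), or 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int, step: int) -> list[float]:
    """GC skew in consecutive windows of fixed size along one strand."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half.
    print(windowed_gc_skew("GGGGCCCC", window=4, step=4))
```

Positive values indicate an excess of guanines over cytosines on the strand being read, and negative values indicate the reverse.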

Categories: Journal Articles