Nucleic Acids Research
The metagenomic paradigm allows for an understanding of the metabolic and functional potential of microbes in a community via a study of their proteins. The substrate for protein identification is either the set of individual nucleotide reads generated from metagenomic samples or the set of contig sequences produced by assembling these reads. However, a read-based strategy using reads generated by next-generation sequencing (NGS) technologies results in an overwhelming majority of partial-length protein predictions. A nucleotide assembly-based strategy does not fare much better, as metagenomic assemblies are typically fragmented and also leave a large fraction of reads unassembled. Here, we present a method for reconstructing complete protein sequences directly from NGS metagenomic data. Our framework is based on a novel short peptide assembler (SPA) that assembles protein sequences from their constituent peptide fragments identified on short reads. The SPA algorithm is based on informed traversals of a de Bruijn graph, defined on an amino acid alphabet, to identify probable paths that correspond to proteins. Using large simulated and real metagenomic data sets, we show that our method outperforms the alternative approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences that can be more effectively analysed.
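As an illustration of the de Bruijn graph idea only, and not of the SPA implementation itself, the following Python sketch builds a graph over amino acid k-mers and follows the most frequently observed edge from a chosen start node. The function names, the value k = 4, the greedy traversal rule and the toy peptide fragments are all assumptions made for this example; the abstract does not state how SPA scores or selects paths.

```python
from collections import defaultdict


def build_peptide_de_bruijn(peptides, k=4):
    """Map each (k-1)-mer node to its successor nodes with edge counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def greedy_path(graph, start, max_len=1000):
    """Follow the highest-count edge until a dead end or a revisited node.

    This greedy rule is a stand-in for the 'informed traversal' mentioned
    in the abstract; it is an assumption, not the published algorithm.
    """
    seq, node, seen = start, start, {start}
    while node in graph and len(seq) < max_len:
        # Ties are broken alphabetically so the result is deterministic
        nxt = max(graph[node].items(), key=lambda kv: (kv[1], kv[0]))[0]
        if nxt in seen:
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


# Hypothetical overlapping peptide fragments predicted on short reads
fragments = ["MKTAYIAK", "AYIAKQRQ", "AKQRQISF"]
graph = build_peptide_de_bruijn(fragments, k=4)
assembled = greedy_path(graph, "MKT")
print(assembled)
```

In this toy case the three overlapping fragments are joined into a single longer peptide. A real assembler would also need to handle branching caused by sequencing errors and by homologous proteins, which this sketch ignores.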
Studying complex biological processes such as cancer development, stem cell induction and transdifferentiation requires the modulation of multiple genes or pathways at one time in a single cell. Herein, we describe straightforward methods for rapid and efficient assembly of bacterial marker free multigene cassettes containing up to six complementary DNAs/short hairpin RNAs. We have termed this method RecWay assembly, as it makes use of both Cre recombinase and the commercially available Gateway cloning system. Further, because RecWay assembly uses truly modular components, it allows for the generation of randomly assembled multigene vector libraries. These multigene vectors are integratable, and later excisable, using the highly efficient piggyBac (PB) DNA transposon system. Moreover, we have dramatically improved the expression of stably integrated multigene vectors by incorporation of insulator elements to prevent promoter interference seen with multigene vectors. We demonstrate that insulated multigene PB transposons can stably integrate and faithfully express up to five fluorescent proteins and the puromycin-thymidine kinase resistance gene in vitro, with up to 70-fold higher gene expression compared with analogous uninsulated vectors. RecWay assembly of multigene transposon vectors allows for widely applicable modelling of highly complex biological processes and can be easily performed by other research laboratories.
The MASTER (methylation-assisted tailorable ends rational) ligation method for seamless DNA assembly
Techniques for assembly of designed DNA sequences are important for synthetic biology. So far, a few methods have been developed towards high-throughput seamless DNA assembly in vitro, including both the homologous sequences-based system and the type IIS-mediated system. Here, we describe a novel method designated ‘MASTER Ligation’, by which multiple DNA sequences can be seamlessly assembled through a simple and sequence-independent hierarchical procedure. The key restriction endonuclease used, MspJI, shares both type IIM and type IIS properties; thus, it only recognizes the methylation-specific 4-bp sites, mCNNR (R = A or G), and cuts DNA outside of the recognition sequences. This method was tested via successful assembly of either multiple polymerase chain reaction amplicons or restriction fragments of the actinorhodin biosynthetic cluster of Streptomyces coelicolor (~29 kb), which was further heterologously expressed in a fast-growing and moderately thermophilic strain, Streptomyces sp. 4F.
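The recognition pattern given above, mCNNR with R = A or G, can be sketched as a simple sequence scan. The Python snippet below lists candidate CNNR sites on one strand of a sequence. It is a minimal sketch under stated assumptions: the function name and the example sequence are hypothetical, methylation status cannot be read from the sequence and is therefore not checked, and cut positions are not computed because the abstract does not give them.

```python
import re

# C, any two bases, then a purine (A or G); the lookahead allows overlapping hits
CNNR = re.compile(r"(?=(C[ACGT]{2}[AG]))")


def candidate_cnnr_sites(seq):
    """Return (0-based start, matched 4-mer) for every CNNR motif on this strand.

    Whether the leading C is actually methylated must be known separately;
    this scan only finds sequence-level candidates.
    """
    return [(m.start(), m.group(1)) for m in CNNR.finditer(seq.upper())]


# Hypothetical example sequence
sites = candidate_cnnr_sites("ttCATGaaCCGAtt")
print(sites)
```

Scanning the reverse complement in the same way would give the candidates on the opposite strand.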
RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments
RIP-seq has recently been developed to discover genome-wide RNA transcripts that interact with a protein or protein complex. RIP-seq is similar to both RNA-seq and ChIP-seq, but presents unique properties and challenges. Currently, no statistical tool is dedicated to RIP-seq analysis. We developed RIPSeeker (http://www.bioconductor.org/packages/2.12/bioc/html/RIPSeeker.html), a free open-source Bioconductor/R package for de novo RIP peak predictions based on a hidden Markov model (HMM). To demonstrate the utility of the software package, we applied RIPSeeker and six other published programs to three independent RIP-seq datasets and two PAR-CLIP datasets corresponding to six distinct RNA-binding proteins. Based on receiver operating characteristic curves, RIPSeeker demonstrates superior sensitivity and specificity in discriminating high-confidence peaks that are consistently agreed on among a majority of the comparison methods, and dominated 9 of the 12 evaluations, averaging 80% area under the curve. The peaks from RIPSeeker are further confirmed based on their significant enrichment for biologically meaningful genomic elements, published sequence motifs and association with canonical transcripts known to interact with the proteins examined. While RIPSeeker is specifically tailored for RIP-seq data analysis, it also provides a suite of bioinformatics tools integrated within a self-contained software package comprehensively addressing issues ranging from post-alignment processing to visualization and annotation.
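The area-under-the-curve figure quoted above summarizes how well a peak score separates high-confidence peaks from the rest. As a generic illustration, and not a reproduction of the RIPSeeker evaluation, the Python sketch below computes the area under a receiver operating characteristic curve from the rank-based (Mann-Whitney) definition. The function name and the scores are invented for this example.

```python
def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up peak scores: 'positives' stand for high-confidence peaks
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])
print(round(auc, 3))
```

An area of 0.5 corresponds to a score with no discriminating power and 1.0 to perfect separation.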
Digital transcriptome analysis by next-generation sequencing reveals substantial numbers of mRNA variants. Variation in gene expression underlies many biological processes and holds a key to unravelling the mechanisms of common diseases. However, the current methods for construction of co-expression networks using overall gene expression were originally designed for microarray expression data, and they overlook a large number of variations in gene expression. To use information on exon-level, genomic position-level and allele-specific expression, we develop novel component-based methods, single and bivariate canonical correlation analysis, for construction of co-expression networks with RNA-seq data. To evaluate the performance of our methods for co-expression network inference with RNA-seq data, they are applied to lung squamous cell cancer expression data from the TCGA database and our bipolar disorder and schizophrenia RNA-seq study. The preliminary results demonstrate that the co-expression networks constructed by canonical correlation analysis and RNA-seq data provide rich genetic and molecular information to gain insight into biological processes and disease mechanisms. Our new methods substantially outperform the current statistical methods for co-expression network construction with microarray expression data or RNA-seq data based on overall gene expression levels.
A high-throughput and quantitative method to assess the mutagenic potential of translesion DNA synthesis
Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology for the assessment of the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The vast amount of sequencing data produced was aligned and quantified by using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations upstream, downstream and at a site-specifically placed cis–syn thymidine–thymidine dimer generated individually by three lesion-bypass human Y-family DNA polymerases.
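Quantifying the types and frequencies of mutations at each position can be pictured with a small tally over aligned reads. The Python sketch below is not the authors' software, which the abstract does not describe; it assumes gap-free reads already aligned to a reference of the same length, counts substitutions only, and uses a hypothetical function name and made-up sequences.

```python
from collections import Counter


def substitution_frequencies(reference, reads):
    """Per-position substitution frequencies among full-length, gap-free reads.

    Returns one dict per reference position, mapping 'ref>alt' to the
    fraction of usable reads that carry that substitution.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    counts = [Counter() for _ in reference]
    for read in usable:
        for i, (ref_base, base) in enumerate(zip(reference, read)):
            if base != ref_base:
                counts[i][f"{ref_base}>{base}"] += 1
    if not usable:
        return [{} for _ in reference]
    return [{mut: c / len(usable) for mut, c in pos.items()} for pos in counts]


# Made-up reference and reads; the short read is skipped by the length filter
profile = substitution_frequencies("ACGT", ["ACGT", "ACTT", "ACTT", "AGGT", "ACG"])
print(profile)
```

Insertions and deletions, which a real translesion synthesis profile would also need to report, are outside the scope of this sketch.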
Reconstructing regulatory networks from the dynamic plasticity of gene expression by mutual information
The capacity of an organism to respond to its environment is facilitated by the environmentally induced alteration of gene and protein expression, i.e. expression plasticity. The reconstruction of gene regulatory networks based on expression plasticity can provide new insights not only into the causality of transcriptional and cellular processes but also into the complex regulatory mechanisms that underlie biological function and adaptation. We describe an approach for network inference by integrating expression plasticity into Shannon’s mutual information. Beyond Pearson correlation, mutual information can capture non-linear dependencies and topology sparseness. The approach measures the network of dependencies of genes expressed in different environments, allowing the environment-induced plasticity of gene dependencies to be tested in unprecedented detail. The approach is also able to characterize the extent to which the same genes trigger different amounts of expression in response to environmental changes. We demonstrated the usefulness of this approach through analysing gene expression data from a rabbit vein graft study that includes two distinct blood flow environments. The proposed approach provides a powerful tool for the modelling and analysis of dynamic regulatory networks using gene expression data from distinct environments.
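To make the central quantity concrete, the Python sketch below computes Shannon's mutual information between two discretized expression profiles and compares the value across two environments. It is a minimal sketch, not the authors' method: the abstract does not say how expression is discretized or how plasticity enters the estimate, so the binary levels, the variable names and the simple difference between environments are assumptions, and the data are invented.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Shannon mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed pairs
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Made-up discretized expression levels (0 = low, 1 = high) for two genes
env_a = {"gene1": [0, 0, 1, 1], "gene2": [0, 0, 1, 1]}
env_b = {"gene1": [0, 0, 1, 1], "gene2": [0, 1, 0, 1]}

mi_a = mutual_information(env_a["gene1"], env_a["gene2"])
mi_b = mutual_information(env_b["gene1"], env_b["gene2"])
dependency_change = mi_a - mi_b  # illustrative measure only
print(mi_a, mi_b, dependency_change)
```

In the toy data the two genes are fully dependent in the first environment and independent in the second, so the difference equals one bit. With real data, estimates from few samples are biased upwards and would need a significance test before an edge is called.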
Degradation of initiator tRNAMet by Xrn1/2 via its accumulation in the nucleus of heat-treated HeLa cells
Stress response mechanisms that modulate the dynamics of tRNA degradation and accumulation from the cytoplasm to the nucleus have been studied in yeast, rat hepatoma and human cells. In the current study, we investigated tRNA degradation and accumulation in HeLa cells under various forms of stress. We found that initiator tRNAMet (tRNA(iMet)) was specifically degraded under heat stress. Two exonucleases, Xrn1 and Xrn2, are involved in the degradation of tRNA(iMet) in the cytoplasm and the nucleus, respectively. In addition to degradation, we observed accumulation of tRNA(iMet) in the nucleus. We also found that the mammalian target of rapamycin (mTOR), which regulates tRNA trafficking in yeast, is partially phosphorylated at Ser2448 in the presence of rapamycin and/or during heat stress. Our results suggest that phosphorylation of mTOR may correlate with accumulation of tRNA(iMet) in heat-treated HeLa cells.
RNA elements directing in vivo assembly of the 7SK/MePCE/Larp7 transcriptional regulatory snRNP
Through controlling the nuclear level of active positive transcription elongation factor b (P-TEFb), the 7SK small nuclear RNA (snRNA) functions as a key regulator of RNA polymerase II transcription. Together with hexamethylene bisacetamide-inducible proteins 1/2 (HEXIM1/2), the 7SK snRNA sequesters P-TEFb into a transcriptionally inactive ribonucleoprotein (RNP). In response to transcriptional stimulation, the 7SK/HEXIM/P-TEFb RNP releases P-TEFb to promote polymerase II-mediated messenger RNA synthesis. Besides transiently associating with HEXIM1/2 and P-TEFb, the 7SK snRNA stably interacts with the La-related protein 7 (Larp7) and the methylphosphate capping enzyme (MePCE). In this study, we used in vivo RNA–protein interaction assays to determine the sequence and structural elements of human 7SK snRNA directing assembly of the 7SK/MePCE/Larp7 core snRNP. MePCE interacts with the short 5'-terminal G1-U4/U106-G111 helix-tail motif and Larp7 binds to the 3'-terminal hairpin and the following U-rich tail of 7SK. The overall RNA structure and some particular nucleotides provide the information for specific binding of MePCE and Larp7. We also demonstrate that binding of Larp7 to 7SK is a prerequisite for in vivo recruitment of P-TEFb, indicating that besides providing stability for 7SK, Larp7 directly participates in P-TEFb regulation. Our results provide further explanation for the frequently observed link between Larp7 mutations and cancer development.
Cytoplasmic and nuclear quality control and turnover of single-stranded RNA modulate post-transcriptional gene silencing in plants
Eukaryotic RNA quality control (RQC) uses both endonucleolytic and exonucleolytic degradation to eliminate dysfunctional RNAs. In addition, endogenous and exogenous RNAs are degraded through post-transcriptional gene silencing (PTGS), which is triggered by the production of double-stranded (ds)RNAs and proceeds through short-interfering (si)RNA-directed ARGONAUTE-mediated endonucleolytic cleavage. Compromising cytoplasmic or nuclear 5'–3' exoribonuclease function enhances sense-transgene (S)-PTGS in Arabidopsis, suggesting that these pathways compete for similar RNA substrates. Here, we show that impairing nonsense-mediated decay, deadenylation or exosome activity enhanced S-PTGS, which requires host RNA-dependent RNA polymerase 6 (RDR6/SGS2/SDE1) and SUPPRESSOR OF GENE SILENCING 3 (SGS3) for the transformation of single-stranded RNA into dsRNA to trigger PTGS. However, these RQC mutations had no effect on inverted-repeat–PTGS, which directly produces hairpin dsRNA through transcription. Moreover, we show that these RQC factors are nuclear and cytoplasmic and are found in two RNA degradation foci in the cytoplasm: siRNA-bodies and processing-bodies. We propose a model of single-stranded RNA tug-of-war between RQC and S-PTGS that ensures the correct partitioning of RNA substrates among these RNA degradation pathways.
Gradual processing of the ITS1 from the nucleolus to the cytoplasm during synthesis of the human 18S rRNA
Defects in ribosome biogenesis trigger stress response pathways, which perturb cell proliferation and differentiation in several genetic diseases. In Diamond–Blackfan anemia (DBA), a congenital erythroblastopenia, mutations in ribosomal protein genes often interfere with the processing of the internal transcribed spacer 1 (ITS1), the mechanism of which remains elusive in human cells. Using loss-of-function experiments and extensive RNA analysis, we have defined the precise position of the endonucleolytic cleavage E in the ITS1, which generates the 18S-E intermediate, the last precursor to the 18S rRNA. Unexpectedly, this cleavage is followed by 3'–5' exonucleolytic trimming of the 18S-E precursor during nuclear export of the pre-40S particle, which sets a new mechanism for 18S rRNA formation clearly different from that established in yeast. In addition, cleavage at site E is also followed by 5'–3' exonucleolytic trimming of the ITS1 by exonuclease XRN2. Perturbation of this step upon knockdown of the large subunit ribosomal protein RPL26, which was recently associated with DBA, reveals the putative role of a highly conserved cis-acting sequence in ITS1 processing. These data cast new light on the original mechanism of ITS1 elimination in human cells and provide a mechanistic framework to further study the interplay of DBA-linked ribosomal proteins in this process.
Electron transfer in DNA has been intensively studied to elucidate its biological roles and for applications in bottom-up DNA nanotechnology. Recently, mechanisms of electron transfer to DNA have been investigated; however, most of the systems designed are intramolecular. Here, we synthesized pyrene-conjugated pyrrole-imidazole polyamides (PPIs) to achieve sequence-specific electron injection into DNA in an intermolecular fashion. Electron injection from PPIs into DNA was detected using 5-bromouracil as an electron acceptor. Twelve different 5-bromouracil-containing oligomers were synthesized to examine the electron-injection ability of PPI. Product analysis demonstrated that the electron transfer from PPIs was localized in a range of 8 bp from the binding site of the PPIs. These results demonstrate that PPIs can be a useful tool for sequence-specific electron injection.
The structural reorganization of nanoscale DNA architectures is a fundamental aspect in dynamic DNA nanotechnology. Commonly, DNA nanoarchitectures are reorganized by means of toehold-expanded DNA sequences in a strand exchange process. Here we describe an unprecedented, toehold-free switching process that relies on pseudo-complementary peptide nucleic acid (pcPNA) by using a mechanism that involves double-strand invasion. The usefulness of this approach is demonstrated by application of these peptide nucleic acids (PNAs) as switches in a DNA rotaxane architecture. The monomers required for generating the pcPNA were obtained by an improved synthesis strategy and were incorporated into a PNA actuator sequence as well as into a short DNA strand that subsequently was integrated into the rotaxane architecture. Alternate addition of a DNA and PNA actuator sequence allowed the multiple reversible switching between a mobile rotaxane macrocycle and a stationary pseudorotaxane state. The switching occurs in an isothermal process at room temperature and is nearly quantitative in each switching step. pcPNAs can potentially be combined with light- and toehold-based switches, thus broadening the toolbox of orthogonal switching approaches for DNA architectures that open up new avenues in dynamic DNA nanotechnology.
G-quadruplexes represent a versatile sensing platform for the construction of label-free molecular detection assays owing to their diverse structures that can be selectively recognized by G-quadruplex-specific luminescent probes. In this Survey and Summary, we highlight recent examples of the application of the label-free strategy for the development of G-quadruplex-based luminescent detection platforms with a view towards the potential application of tetraplex structures in the design of DNA logic gates.
Our knowledge of prokaryotic defense systems has vastly expanded as a result of comparative genomic analysis, followed by experimental validation. This expansion is both quantitative, including the discovery of diverse new examples of known types of defense systems, such as restriction-modification or toxin-antitoxin systems, and qualitative, including the discovery of fundamentally new defense mechanisms, such as the CRISPR-Cas immunity system. Large-scale statistical analysis reveals that the distribution of different defense systems in bacterial and archaeal taxa is non-uniform, with four groups of organisms distinguishable with respect to the overall abundance and the balance between specific types of defense systems. The genes encoding defense system components in bacteria and archaea typically cluster in defense islands. In addition to genes encoding known defense systems, these islands contain numerous uncharacterized genes, which are candidates for new types of defense systems. The tight association of the genes encoding immunity systems and dormancy- or cell death-inducing defense systems in prokaryotic genomes suggests that these two major types of defense are functionally coupled, providing for effective protection at the population level.