Nucleic Acids Research
Dissecting the target specificity of RNase H recruiting oligonucleotides using massively parallel reporter analysis of short RNA motifs
Processing and post-transcriptional regulation of RNA often depend on binding of regulatory molecules to short motifs in RNA. The effects of such interactions are difficult to study, because most regulatory molecules recognize partially degenerate RNA motifs, embedded in a sequence context specific for each RNA. Here, we describe Library Sequencing (LibSeq), an accurate massively parallel reporter method for completely characterizing the regulatory potential of thousands of short RNA sequences in a specific context. By sequencing cDNA derived from a plasmid library expressing identical reporter genes except for a degenerate 7mer subsequence in the 3'UTR, the regulatory effects of each 7mer can be determined. We show that LibSeq identifies regulatory motifs used by RNA-binding proteins and microRNAs. We furthermore apply the method to cells transfected with RNase H recruiting oligonucleotides to obtain quantitative information for >15000 potential target sequences in parallel. These comprehensive datasets provide insights into the specificity requirements of RNase H and allow a specificity measure to be calculated for each tested oligonucleotide. Moreover, we show that inclusion of chemical modifications in the central part of an RNase H recruiting oligonucleotide can increase its sequence-specificity.
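As a rough illustration of the kind of per-7mer quantification described above, the following minimal sketch enumerates all 4^7 = 16384 possible 7mers and computes a log2 fold change for each between a treated and a control read-count table. It is not the authors' pipeline: the function names, the pseudocount and the simple library-size normalization are assumptions made for the example.

```python
from itertools import product
from math import log2


def all_kmers(k=7, alphabet="ACGT"):
    """Return every k-mer over the alphabet (4**7 = 16384 for 7mers)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def log2_fold_changes(treated, control, k=7, pseudocount=1.0):
    """Log2 ratio of each k-mer's read fraction in treated vs control.

    `treated` and `control` map k-mer -> read count (hypothetical inputs).
    A pseudocount is added so k-mers with zero reads stay finite.
    """
    kmers = all_kmers(k)
    t_total = sum(treated.get(m, 0) + pseudocount for m in kmers)
    c_total = sum(control.get(m, 0) + pseudocount for m in kmers)
    return {
        m: log2(
            ((treated.get(m, 0) + pseudocount) / t_total)
            / ((control.get(m, 0) + pseudocount) / c_total)
        )
        for m in kmers
    }
```

A 7mer whose reads are depleted in the treated sample (for example, one targeted by an oligonucleotide) receives a negative value, and ranking all 7mers by this value gives one simple view of specificity.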
Alternative splicing is an important and ancient feature of eukaryotic gene structure, the existence of which has likely facilitated eukaryotic proteome expansions. Here, we have used intron lariat sequencing to generate a comprehensive profile of splicing events in Schizosaccharomyces pombe, amongst the simplest organisms that possess mammalian-like splice site degeneracy. We reveal an unprecedented level of alternative splicing, including alternative splice site selection for over half of all annotated introns, hundreds of novel exon-skipping events, and thousands of novel introns. Moreover, the frequency of these events is far higher than previous estimates, with alternative splice sites on average activated at ~3% the rate of canonical sites. Although a subset of alternative sites are conserved in related species, implying functional potential, the majority are not detectably conserved. Interestingly, the rate of aberrant splicing is inversely related to expression level, with lowly expressed genes more prone to erroneous splicing. Although we validate many events with RNAseq, the proportion of alternative splicing discovered with lariat sequencing is far greater, a difference we attribute to preferential decay of aberrantly spliced transcripts. Together, these data suggest the spliceosome possesses far lower fidelity than previously appreciated, highlighting the potential contributions of alternative splicing in generating novel gene structures.
Paradoxical suppression of small RNA activity at high Hfq concentrations due to random-order binding
Small RNAs (sRNAs) are important regulators of gene expression during bacterial stress and pathogenesis. sRNAs act by forming duplexes with mRNAs to alter their translation and degradation. In some bacteria, duplex formation is mediated by the Hfq protein, which can bind the sRNA and mRNA in each pair in a random order. Here we investigate the consequences of this random-order binding and experimentally demonstrate that it can counterintuitively cause high Hfq concentrations to suppress rather than promote sRNA activity in Escherichia coli. As a result, maximum sRNA activity occurs when the Hfq concentration is neither too low nor too high relative to the sRNA and mRNA concentrations (‘Hfq set-point’). We further show with models and experiments that random-order binding combined with the formation of a dead-end mRNA–Hfq complex causes high concentrations of an mRNA to inhibit its own duplex formation by sequestering Hfq. In such cases, maximum sRNA activity requires an optimal mRNA concentration (‘mRNA set-point’) as well as an optimal Hfq concentration. The Hfq and mRNA set-points generate novel regulatory properties that can be harnessed by native and synthetic gene circuits to provide greater control over sRNA activity, generate non-monotonic responses and enhance the robustness of expression.
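The non-monotonic dependence on Hfq can be illustrated with a textbook equilibrium simplification. The sketch below is not the model used in the study: it assumes that the sRNA and the mRNA bind independent sites on Hfq without cooperativity, uses arbitrary concentration units, and ignores the dead-end mRNA–Hfq complex. Under those assumptions, the amount of Hfq carrying both RNAs peaks at an intermediate Hfq concentration.

```python
from math import sqrt


def bound_complex(h_total, rna_total, kd):
    """Equilibrium concentration of a 1:1 Hfq-RNA complex (quadratic solution)."""
    s = h_total + rna_total + kd
    return (s - sqrt(s * s - 4.0 * h_total * rna_total)) / 2.0


def ternary_complex(h_total, srna_total, mrna_total, kd_srna, kd_mrna):
    """Hfq molecules carrying both an sRNA and an mRNA.

    Assumes independent, non-cooperative sites, so the probability that
    one Hfq holds both RNAs is the product of the two site occupancies.
    """
    if h_total <= 0:
        return 0.0
    frac_s = bound_complex(h_total, srna_total, kd_srna) / h_total
    frac_m = bound_complex(h_total, mrna_total, kd_mrna) / h_total
    return h_total * frac_s * frac_m


if __name__ == "__main__":
    # Scan Hfq around fixed sRNA and mRNA levels (illustrative values only).
    for h in (0.1, 1.0, 100.0):
        print(h, ternary_complex(h, 1.0, 1.0, 0.01, 0.01))
```

With these illustrative values the ternary complex is most abundant when Hfq is comparable to the RNA concentrations. At large excess, the sRNA and mRNA mostly sit on different Hfq molecules.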
Genome-wide analysis of YB-1-RNA interactions reveals a novel role of YB-1 in miRNA processing in glioblastoma multiforme
Altered miRNA expression is believed to play a crucial role in a variety of human cancers; however, the mechanisms leading to the dysregulation of miRNA expression remain elusive. In this study, we report that the human Y box-binding protein (YB-1), a major mRNA packaging protein, is a novel modulator of miRNA processing in glioblastoma multiforme (GBM). Using individual nucleotide-resolution crosslinking immunoprecipitation coupled to deep sequencing (iCLIP-seq), we performed the first genome-wide analysis of the in vivo YB-1-RNA interactions and found that YB-1 preferentially recognizes a UYAUC consensus motif and binds to the majority of coding gene transcripts including pre-mRNAs and mature mRNAs. Remarkably, our data show that YB-1 also binds extensively to the terminal loop region of pri-/pre-miR-29b-2 and regulates the biogenesis of miR-29b-2 by blocking the recruitment of microprocessor and Dicer to its precursors. Furthermore, we show that down-regulation of miR-29b by YB-1, which is up-regulated in GBM, is important for cell proliferation. Together, our findings reveal a novel function of YB-1 in regulating non-coding RNA expression, which has important implications in tumorigenesis.
The conserved 3'X terminal domain of hepatitis C virus genomic RNA forms a two-stem structure that promotes viral RNA dimerization
The 3'X domain of hepatitis C virus is a strongly conserved structure located at the 3' terminus of the viral genomic RNA. This domain modulates the replication and translation processes of the virus in conjunction with an upstream 5BSL3.2 stem–loop, and contains a palindromic sequence that facilitates RNA dimerization. Based on nuclear magnetic resonance spectroscopy and gel electrophoresis, we report here that domain 3'X adopts a structure composed of two stem–loops, and not three hairpins or a mixture of folds, as previously proposed. This structure exposes unpaired terminal nucleotides after a double-helical stem and palindromic bases in an apical loop, favoring genomic RNA replication and self-association. At higher ionic strength the domain forms homodimers comprising an intermolecular duplex of 110 nucleotides. The 3'X sequences can alternatively form heterodimers with 5BSL3.2. This contact, reported to favor translation, likely involves local melting of one of the 3'X stem–loops.
The FMRP/GRK4 mRNA interaction uncovers a new mode of binding of the Fragile X mental retardation protein in cerebellum
Fragile X syndrome (FXS), the most common form of inherited intellectual disability, is caused by the silencing of the FMR1 gene encoding an RNA-binding protein (FMRP) mainly involved in translational control. We characterized the interaction between FMRP and the mRNA of GRK4, a member of the guanine nucleotide-binding protein (G protein)-coupled receptor kinase super-family, both in vitro and in vivo. While the mRNA level of GRK4 is unchanged in the absence or presence of FMRP in different regions of the brain, the GRK4 protein level is increased in Fmr1-null cerebellum, suggesting that FMRP negatively modulates the expression of GRK4 at the translational level in this brain region. The C-terminal region of FMRP interacts with a domain of GRK4 mRNA, which we called G4RIF, that is folded into four stem loops. The SL1 stem loop of G4RIF is protected by FMRP and is part of the S1/S2 sub-domain that directs translation repression of a reporter mRNA by FMRP. These data confirm the role of the G4RIF/FMRP complex in translational regulation. Considering the role of GRK4 in GABAB receptor desensitization, our results suggest that increased GRK4 levels in FXS might contribute to cerebellum-dependent phenotypes through deregulated desensitization of GABAB receptors.
Hexameric helicases are processive DNA unwinding machines, but how they engage with a replication fork during unwinding is unknown. Using electron microscopy and single particle analysis, we determined structures of the intact hexameric helicase E1 from papillomavirus and two complexes of E1 bound to a DNA replication fork end-labelled with protein tags. By labelling a DNA replication fork with streptavidin (dsDNA end) and Fab (5' ssDNA), we located the positions of these labels on the helicase surface, showing that at least 10 bp of dsDNA enter the E1 helicase via a side tunnel. In the currently accepted ‘steric exclusion’ model for dsDNA unwinding, the active 3' ssDNA strand is pulled through a central tunnel of the helicase motor domain as the dsDNA strands are wedged apart outside the protein assembly. Our structural observations together with nuclease footprinting assays indicate otherwise: strand separation takes place inside E1 in a chamber above the helicase domain, and the 5' passive ssDNA strand exits the assembly through a separate tunnel opposite to the dsDNA entry point. Our data therefore suggest an alternative to the current general model for DNA unwinding by hexameric helicases.
Replicative helicases are essential ATPases that unwind DNA to initiate chromosomal replication. While bacterial replicative DnaB helicases are hexameric, Helicobacter pylori DnaB (HpDnaB) was found to form double hexamers, similar to some archaeal and eukaryotic replicative helicases. Here we present a structural and functional analysis of HpDnaB protein during primosome formation. The crystal structure of the HpDnaB at 6.7 Å resolution reveals a dodecameric organization consisting of two hexamers assembled via their N-terminal rings in a stack-twisted mode. Using fluorescence anisotropy we show that HpDnaB dodecamer interacts with single-stranded DNA in the presence of ATP but has a low DNA unwinding activity. Multi-angle light scattering and small angle X-ray scattering demonstrate that interaction with the DnaG primase helicase-binding domain dissociates the helicase dodecamer into single ringed primosomes. Functional assays on the proteins and associated complexes indicate that these single ringed primosomes are the most active form of the helicase for ATP hydrolysis, DNA binding and unwinding. These findings shed light onto an activation mechanism of HpDnaB by the primase that might be relevant in other bacteria and possibly other organisms exploiting dodecameric helicases for DNA replication.
Microcalorimetric studies of DNA duplexes and their component single strands showed that association enthalpies of unfolded complementary strands into completely folded duplexes increase linearly with temperature and do not depend on salt concentration, i.e. duplex formation results in a constant heat capacity decrement, identical for CG and AT pairs. Although duplex thermostability increases with CG content, the enthalpic and entropic contributions of an AT pair to duplex formation exceed those of a CG pair when compared at the same temperature. The reduced contribution of AT pairs to duplex stabilization comes not from their lower enthalpy, as previously supposed, but from their larger entropy contribution. This larger enthalpy, and particularly the greater entropy, result from water fixed by the AT pair in the minor groove. As the increased entropy of an AT pair exceeds that of melting ice, the water molecule fixed by this pair must affect those of its neighbors. Water in the minor groove is, thus, orchestrated by the arrangement of AT groups, i.e. is context dependent. In contrast, water hydrating exposed nonpolar surfaces of bases is responsible for the heat capacity increment on dissociation and, therefore, for the temperature dependence of all thermodynamic characteristics of the double helix.
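The statement that a constant heat capacity change implies a linear temperature dependence of the enthalpy can be written out with the standard thermodynamic relations below. The reference temperature T_0 is a symbol introduced here for illustration; \Delta C_p is taken as temperature independent, as the abstract reports.

```latex
\begin{align}
\Delta H(T) &= \Delta H(T_0) + \Delta C_p\,(T - T_0) \\
\Delta S(T) &= \Delta S(T_0) + \Delta C_p \ln\frac{T}{T_0} \\
\Delta G(T) &= \Delta H(T) - T\,\Delta S(T)
\end{align}
```

For association, \Delta C_p < 0 (a heat capacity decrement), so the association enthalpy grows in magnitude as the temperature rises. For dissociation, the sign of \Delta C_p is reversed.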
Solution structure of a DNA quadruplex containing ALS and FTD related GGGGCC repeat stabilized by 8-bromodeoxyguanosine substitution
A prolonged expansion of the GGGGCC repeat within the non-coding region of the C9orf72 gene has been identified as the most common cause of familial amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), which are devastating neurodegenerative disorders. Formation of unusual secondary structures within the expanded GGGGCC repeat, including DNA and RNA G-quadruplexes and R-loops, was proposed to drive ALS and FTD pathogenesis. Initial NMR investigation of DNA oligonucleotides with four repeat units, the shortest model able to form a unimolecular G-quadruplex, indicated that they fold into multiple G-quadruplex structures in the presence of K+ ions. A single dG to 8Br-dG substitution at position 21 in oligonucleotide d[(G4C2)3G4] and careful optimization of folding conditions led to formation of mostly a single G-quadruplex species, which enabled determination of a high-resolution structure by NMR. The G-quadruplex structure adopted by d[(G4C2)3GGBrGG] is composed of four G-quartets, which are connected by three edgewise C-C loops. All four strands adopt an antiparallel orientation to one another and have an alternating syn-anti progression of glycosidic conformation of guanine residues. One of the cytosines in every loop is stacked upon the G-quartet, contributing to a very compact and stable structure.
Structural basis for selective targeting of leishmanial ribosomes: aminoglycoside derivatives as promising therapeutics
Leishmaniasis comprises an array of diseases caused by pathogenic species of Leishmania, resulting in a spectrum of mild to life-threatening pathologies. Currently available therapies for leishmaniasis include a limited selection of drugs. This, coupled with the rather fast emergence of parasite resistance, presents a dire public health concern. Paromomycin (PAR), a broad-spectrum aminoglycoside antibiotic, has been shown in recent years to be highly efficient in treating visceral leishmaniasis (VL), the life-threatening form of the disease. While much focus has been given to exploration of PAR activities in bacteria, its mechanism of action in Leishmania has received relatively little scrutiny and has yet to be fully deciphered. In the present study we present an X-ray structure of PAR bound to an rRNA model mimicking its leishmanial binding target, the ribosomal A-site. We also evaluate PAR inhibitory actions on leishmanial growth and ribosome function, as well as effects on auditory sensory cells, by comparing several structurally related natural and synthetic aminoglycoside derivatives. The results provide insights into the structural elements important for aminoglycoside inhibitory activities and selectivity for leishmanial cytosolic ribosomes, highlighting a novel synthetic derivative, compound 3, as a prospective therapeutic candidate for the treatment of VL.
LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations
In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotation. Also, great mutation heterogeneity and potential correlations between neighboring sites give rise to substantial overdispersion in mutation count, resulting in problematic background rate estimation. Here, we address these issues with a new computational framework called LARVA. It integrates variants with a comprehensive set of noncoding functional elements, modeling the mutation counts of the elements with a β-binomial distribution to handle overdispersion. LARVA, moreover, uses regional genomic features such as replication timing to better estimate local mutation rates and mutational hotspots. We demonstrate LARVA's effectiveness on 760 whole-genome tumor sequences, showing that it identifies well-known noncoding drivers, such as mutations in the TERT promoter. Furthermore, LARVA highlights several novel highly mutated regulatory sites that could potentially be noncoding drivers. We make LARVA available as a software tool and release our highly mutated annotations as an online resource (larva.gersteinlab.org).
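To show why a β-binomial background matters for overdispersed mutation counts, the following minimal sketch compares upper-tail probabilities under a binomial and a β-binomial model with the same mean rate. It is not LARVA's implementation: the function names and the parameter values are illustrative assumptions, and only the Python standard library is used.

```python
from math import comb, exp, lgamma


def _log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """P(X = k) for a beta-binomial with n trials and shape (alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta))


def betabinom_upper_tail(k, n, alpha, beta):
    """P(X >= k): chance of seeing at least k mutated samples out of n."""
    return sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))


def binom_upper_tail(k, n, p):
    """P(X >= k) under a plain binomial with per-sample rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Same mean rate (alpha / (alpha + beta) = 0.01), hypothetical element
    # mutated in 10 of 100 samples.
    print(binom_upper_tail(10, 100, 0.01))
    print(betabinom_upper_tail(10, 100, 0.5, 49.5))
```

With the same mean rate, the β-binomial assigns a much larger tail probability to high counts than the binomial does, so an element that looks strikingly recurrent under a binomial background may be unremarkable once overdispersion is modeled.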
A key aspect of RNA secondary structure prediction is the identification of novel functional elements. This is a challenging task because these elements typically are embedded in longer transcripts where the borders between the element and flanking regions have to be defined. The flanking sequences impact the folding of the functional elements both at the level of computational analyses and when the element is extracted as a transcript for experimental analysis. Here, we analyze how different flanking region lengths impact folding into a constrained structure by computing probabilities of folding for different sizes of flanking regions. Our method, RNAcop (RNA context optimization by probability), is tested on known and de novo predicted structures. In vitro experiments support the computational analysis and suggest that for a number of structures, choosing proper lengths of flanking regions is critical. RNAcop is available as a web server and as stand-alone software via http://rth.dk/resources/rnacop.
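One common way to express the probability that a sequence folds into a constrained structure is as a ratio of partition functions, which equals exp(-(G_constrained - G_ensemble) / RT) when written with ensemble free energies. The sketch below assumes this formulation and is not RNAcop's code. The free energies would come from a folding package and are supplied here as plain numbers; the function names and flank-length keys are hypothetical.

```python
from math import exp

GAS_CONSTANT = 0.0019872  # kcal / (mol K)


def constraint_probability(g_constrained, g_ensemble, temperature_k=310.15):
    """Probability of the constrained structure class.

    g_constrained: ensemble free energy with the constraint enforced (kcal/mol)
    g_ensemble:    unconstrained ensemble free energy (kcal/mol)
    """
    return exp(-(g_constrained - g_ensemble) / (GAS_CONSTANT * temperature_k))


def best_flank(energies, temperature_k=310.15):
    """Pick the flank lengths that maximize the constraint probability.

    `energies` maps (left_flank, right_flank) -> (g_constrained, g_ensemble).
    """
    return max(
        energies,
        key=lambda flank: constraint_probability(*energies[flank], temperature_k),
    )
```

Under this formulation, comparing flank lengths amounts to comparing the free-energy gap between the constrained and unconstrained ensembles: the smaller the gap, the more likely the element folds as intended in that context.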
Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes
Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recurrently mutated by single nucleotide alterations differed from the genes recurrently mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs.
In Paramecium, the regeneration of a functional somatic genome at each sexual event relies on the elimination of thousands of germline DNA sequences, known as Internal Eliminated Sequences (IESs), from the zygotic nuclear DNA. Here, we provide evidence that IESs’ length and sub-terminal bases jointly modulate IES excision by affecting DNA conformation in P. tetraurelia. Our study reveals an excess of complementary base pairing between IESs’ sub-terminal and contiguous sites, suggesting that IESs may form DNA loops prior to cleavage. The degree of complementary base pairing between IESs’ sub-terminal sites (termed Cin-score) is positively associated with IES length and is shaped by natural selection. Moreover, it escalates abruptly when IES length exceeds 45 nucleotides (nt), indicating that only sufficiently large IESs may form loops. Finally, we find that IESs smaller than 46 nt are favored targets of the cellular surveillance systems, presumably because of their relatively inefficient excision. Our findings extend the repertoire of cis-acting determinants for IES recognition/excision and provide unprecedented insights into the distinct selective pressures that operate on IESs and somatic DNA regions. This information potentially moves current models of IES evolution and of mechanisms of IES recognition/excision forward.
Cross-talk between competitive endogenous RNAs (ceRNAs) through shared miRNAs represents a novel layer of gene regulation that plays important roles in the physiology and development of cancers. However, a global view of their system-level properties across various types of cancers is still lacking. Here, we constructed the mRNA-related ceRNA–ceRNA interaction landscape across 20 cancer types by systematically analyzing molecular profiles of 5203 tumors and miRNA regulations. Our study highlights conserved features shared across cancers and a higher similarity among cancers with a similar cell type of origin. Moreover, a core ceRNA network was identified. Functional analysis identified a common theme of cancer hallmarks; however, they exhibit phenotype-specific connectivity patterns. In addition, we found a marked rewiring in the ceRNA program between various cancers, and further revealed conserved and rewired network ceRNA hubs in each cancer, which participate in intensely competitive interactions that constitute conserved and cancer-specific modules. By providing mechanistic linkage between known cancer miRNAs, their mediated ceRNA–ceRNA interactions, and the associations with known cancer hallmarks, the inferred cancer ceRNA–ceRNA interaction landscape will serve as a powerful public resource for further biological discoveries of tumorigenesis.
The transcription factors SOX9 and SOX5/SOX6 cooperate genome-wide through super-enhancers to drive chondrogenesis
SOX9 is a transcriptional activator required for chondrogenesis, and SOX5 and SOX6 are closely related DNA-binding proteins that critically enhance its function. Here, we use genome-wide approaches to gain novel insights into the full spectrum of the target genes and modes of action of this chondrogenic trio. Using the RCS cell line as a faithful model for proliferating/early prehypertrophic growth plate chondrocytes, we find that SOX6 and SOX9 bind thousands of genomic sites, frequently and most efficiently near each other. SOX9 recognizes pairs of inverted SOX motifs, whereas SOX6 favors pairs of tandem SOX motifs. The SOX proteins primarily target enhancers. While binding to a small fraction of typical enhancers, they bind multiple sites on almost all super-enhancers (SEs) present in RCS cells. These SEs are predominantly linked to cartilage-specific genes. The SOX proteins effectively work together to activate these SEs and are required for in vivo expression of their associated genes. These genes encode key regulatory factors, including the SOX trio proteins, and all essential cartilage extracellular matrix components. Chst11, Fgfr3, Runx2 and Runx3 are among many other newly identified SOX trio targets. SOX9 and SOX5/SOX6 thus cooperate genome-wide, primarily through SEs, to implement the growth plate chondrocyte differentiation program.