Nucleic Acids Research
microRNAs and the evolution of complex multicellularity: identification of a large, diverse complement of microRNAs in the brown alga Ectocarpus
There is currently convincing evidence that microRNAs have evolved independently in at least six different eukaryotic lineages: animals, land plants, chlorophyte green algae, demosponges, slime molds and brown algae. MicroRNAs from different lineages are not homologous but some structural features are strongly conserved across the eukaryotic tree allowing the application of stringent criteria to identify novel microRNA loci. A large set of 63 microRNA families was identified in the brown alga Ectocarpus based on mapping of RNA-seq data and nine microRNAs were confirmed by northern blotting. The Ectocarpus microRNAs are highly diverse at the sequence level with few multi-gene families, and do not tend to occur in clusters but exhibit some highly conserved structural features such as the presence of a uracil at the first residue. No homologues of Ectocarpus microRNAs were found in other stramenopile genomes indicating that they emerged late in stramenopile evolution and are perhaps specific to the brown algae. The large number of microRNA loci in Ectocarpus is consistent with the developmental complexity of many brown algal species and supports a proposed link between the emergence and expansion of microRNA regulatory systems and the evolution of complex multicellularity.
Dumbbell-PCR: a method to quantify specific small RNA variants with a single nucleotide resolution at terminal sequences
Recent advances in next-generation sequencing technologies have revealed that cellular functional RNAs are not always expressed as single entities with fixed terminal sequences but as multiple isoforms bearing complex heterogeneity in both length and terminal sequences, such as isomiRs, the isoforms of microRNAs. Unraveling the biogenesis and biological significance of heterogeneous RNA expression requires distinctive analysis of each RNA variant. Here, we report the development of dumbbell PCR (Db-PCR), an efficient and convenient method to distinctively quantify a specific individual small RNA variant. In Db-PCR, 5'- and 3'-stem–loop adapters are specifically hybridized and ligated to the 5'- and 3'-ends of target RNAs, respectively, by T4 RNA ligase 2 (Rnl2). The resultant ligation products with ‘dumbbell-like’ structures are subsequently quantified by TaqMan RT-PCR. We confirmed that the high specificity of Rnl2 ligation and TaqMan RT-PCR toward target RNAs ensured discrimination of both 5'- and 3'-terminal sequences of target RNAs with single nucleotide resolution, so that Db-PCR specifically detected target RNAs but not their corresponding terminal variants. Db-PCR had broad applicability for the quantification of various small RNAs in different cell types, and the results were consistent with those from other quantification methods. Therefore, Db-PCR provides a much-needed simple method for analyzing RNA terminal heterogeneity.
Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training, which makes manually curated training sets unnecessary. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcript assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably to other existing methods.
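To make the task concrete, the following is a minimal sketch of the simplest non-statistical baseline for locating a coding region in a transcript: scanning the three forward reading frames for the longest ATG-to-stop open reading frame. This is not the GeneMarkS-T algorithm, which the abstract describes as a statistical method with unsupervised parameter estimation; the function name `longest_orf` and the example sequence are hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return (start, end) of the longest forward-strand ORF, half-open.

    The ORF runs from an ATG to the first in-frame stop codon (inclusive).
    Returns (0, 0) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best

# Hypothetical toy transcript containing one short ORF.
transcript = "CCATGAAATTTTAGGG"
orf_start, orf_end = longest_orf(transcript)
orf_sequence = transcript[orf_start:orf_end]
```

A baseline of this kind always picks the first ATG of the longest frame, which is one reason statistical models are generally preferred for predicting translation initiation sites.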
Meta-analysis of gene expression has enabled numerous insights into biological systems, but current methods have several limitations. We developed a method to perform a meta-analysis using the elastic net, a powerful and versatile approach for classification and regression. To demonstrate the utility of our method, we conducted a meta-analysis of lung cancer gene expression based on publicly available data. Using 629 samples from five data sets, we trained a multinomial classifier to distinguish between four lung cancer subtypes. Our meta-analysis-derived classifier included 58 genes and achieved 91% accuracy on leave-one-study-out cross-validation and on three independent data sets. Our method makes meta-analysis of gene expression more systematic and expands the range of questions that a meta-analysis can be used to address. As the amount of publicly available gene expression data continues to grow, our method will be an effective tool to help distill these data into knowledge.
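The leave-one-study-out scheme mentioned above can be sketched as follows: each study is held out in turn, a model is trained on the remaining studies, and accuracy is measured on the held-out one. The sketch shows only the splitting logic; fitting the elastic net classifier is not shown, and the function name `leave_one_study_out` and the toy study labels are hypothetical.

```python
from collections import defaultdict

def leave_one_study_out(study_labels):
    """Yield (held_out_study, train_indices, test_indices) per study.

    study_labels[i] names the study that sample i came from.
    """
    by_study = defaultdict(list)
    for idx, study in enumerate(study_labels):
        by_study[study].append(idx)
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = sorted(
            i
            for study, indices in by_study.items()
            if study != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx

# Hypothetical example: six samples drawn from three studies.
studies = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_study_out(studies))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Splitting by study rather than by sample keeps every sample from the held-out study out of training, so the estimate reflects how the classifier transfers to data from an unseen study.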
SplicePie: a novel analytical approach for the detection of alternative, non-sequential and recursive splicing
Alternative splicing is a powerful mechanism present in eukaryotic cells to obtain a wide range of transcripts and protein isoforms from a relatively small number of genes. The mechanisms regulating (alternative) splicing and the paradigm of consecutive splicing have recently been challenged, especially for genes with a large number of introns. RNA-Seq, a powerful technology using deep sequencing in order to determine transcript structure and expression levels, is usually performed on mature mRNA, therefore not allowing detailed analysis of splicing progression. Sequencing pre-mRNA at different stages of splicing potentially provides insight into mRNA maturation. Although the number of tools that analyze total and cytoplasmic RNA in order to elucidate the transcriptome composition is rapidly growing, there are no tools specifically designed for the analysis of nuclear RNA (which contains mixtures of pre- and mature mRNA). We developed dedicated algorithms to investigate the splicing process. In this paper, we present a new classification of RNA-Seq reads based on three major stages of splicing: pre-, intermediate- and post-splicing. Applying this novel classification, we demonstrate that the order of splicing can be analyzed. Furthermore, we uncover the potential to investigate the multi-step nature of splicing, assessing various types of recursive splicing events. We provide data that give biological insight into the order of splicing and show that non-sequential splicing of certain introns is reproducible and consistent across multiple cell lines. We validated our observations with independent experimental technologies and showed the reliability of our method. The pipeline, named SplicePie, is freely available at: https://github.com/pulyakhina/splicing_analysis_pipeline. The example data can be found at: https://barmsijs.lumc.nl/HG/irina/example_data.tar.gz.
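As an illustration of how a three-stage read classification might work, the sketch below labels a sequenced fragment by the splicing evidence it carries. The rules are an assumption made for this example and are not taken from the SplicePie pipeline: a fragment with an alignment gap that exactly matches an annotated intron counts as spliced evidence, a fragment with an aligned block crossing an exon-intron boundary counts as unspliced evidence, and a fragment carrying both is called intermediate. The function name `classify_fragment` and all coordinates are hypothetical.

```python
def classify_fragment(blocks, introns):
    """Label a fragment as 'pre', 'intermediate', 'post' or 'uninformative'.

    blocks  -- sorted aligned blocks as half-open (start, end) intervals
    introns -- annotated introns as half-open (start, end) intervals
    """
    intron_set = set(introns)
    # Spliced evidence: the gap between consecutive blocks is an intron.
    spliced = any(
        (left_end, right_start) in intron_set
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])
    )
    # Unspliced evidence: a block reads through an exon-intron boundary.
    unspliced = any(
        block_start < boundary < block_end
        for block_start, block_end in blocks
        for intron in introns
        for boundary in intron
    )
    if spliced and unspliced:
        return "intermediate"
    if spliced:
        return "post"
    if unspliced:
        return "pre"
    return "uninformative"

# Hypothetical gene with two introns.
introns = [(100, 200), (300, 400)]
labels = [
    classify_fragment([(50, 100), (200, 250)], introns),  # junction only
    classify_fragment([(80, 130)], introns),              # boundary only
    classify_fragment([(60, 100), (200, 320)], introns),  # both
    classify_fragment([(210, 260)], introns),             # exon only
]
print(labels)
```

Under these assumed rules, an intermediate fragment shows one intron already removed while a neighbouring intron is still present, which is the kind of evidence that can inform the order of splicing.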
We present a capture-based approach for bisulfite-converted DNA that allows interrogation of pre-defined genomic locations, allowing quantitative and qualitative assessments of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) at CG dinucleotides and in non-CG contexts (CHG, CHH) in mammalian and plant genomes. We show the technique works robustly and reproducibly using as little as 500 ng of starting DNA, with results correlating well with whole genome bisulfite sequencing data, and demonstrate that human DNA can be tested in samples contaminated with microbial DNA. This targeting approach will allow cell type-specific designs to maximize the value of 5mC and 5hmC sequencing.
Identifying high-affinity aptamer ligands with defined cross-reactivity using high-throughput guided systematic evolution of ligands by exponential enrichment
Oligonucleotide aptamers represent a novel platform for creating ligands with desired specificity, and they offer many potentially significant advantages over monoclonal antibodies in terms of feasibility, cost, and clinical applicability. However, the isolation of high-affinity aptamer ligands from random oligonucleotide pools has been challenging. Although high-throughput sequencing (HTS) promises to significantly facilitate systematic evolution of ligands by exponential enrichment (SELEX) analysis, the enormous datasets generated in the process pose new challenges for identifying those rare, high-affinity aptamers present in a given pool. We show that emulsion PCR preserves library diversity, preventing the loss of rare high-affinity aptamers that are difficult to amplify. We also demonstrate the importance of using reference targets to eliminate binding candidates with reduced specificity. Using a combination of bioinformatics and functional analyses, we show that the rate of amplification is more predictive than prevalence with respect to binding affinity and that the mutational landscape within a cluster of related aptamers can guide the identification of high-affinity aptamer ligands. Finally, we demonstrate the power of this selection process for identifying cross-species aptamers that can bind human receptors and cross-react with their murine orthologs.
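The distinction between prevalence and rate of amplification can be illustrated with a small sketch that compares sequence frequencies between two selection rounds. Here "rate of amplification" is approximated as the fold change in relative frequency from an earlier to a later round, which is an assumption made for this example rather than the definition used in the study; the function names and toy reads are hypothetical.

```python
from collections import Counter

def frequencies(reads):
    """Relative frequency of each distinct sequence in one round."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

def fold_enrichment(early_reads, late_reads):
    """Frequency ratio (late / early) for sequences seen in both rounds."""
    early = frequencies(early_reads)
    late = frequencies(late_reads)
    return {seq: late[seq] / early[seq] for seq in late if seq in early}

# Hypothetical reads from two selection rounds.
early_round = ["AAA"] * 8 + ["CCC"] * 2
late_round = ["AAA"] * 6 + ["CCC"] * 4

late_freq = frequencies(late_round)
enrichment = fold_enrichment(early_round, late_round)
most_prevalent = max(late_freq, key=late_freq.get)
most_enriched = max(enrichment, key=enrichment.get)
print(most_prevalent, most_enriched)  # prints "AAA CCC"
```

In this toy example the most abundant sequence in the later round is losing ground, while the rarer one has doubled in frequency, so the two rankings disagree.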
In this report we have analyzed the role of antisense transcription in the control of LEF1 transcription factor expression. A natural antisense transcript (NAT) is transcribed from a promoter present in the first intron of the LEF1 gene and undergoes splicing in mesenchymal cells. Although this locus is silent in epithelial cells, and neither the NAT nor LEF1 mRNA is expressed, in cell lines with an intermediate epithelial-mesenchymal phenotype presenting low LEF1 expression, the NAT is synthesized and remains unprocessed. In contrast to the spliced NAT, this unspliced NAT down-regulates the main LEF1 promoter activity and attenuates LEF1 mRNA transcription. Unspliced LEF1 NAT interacts with the LEF1 promoter and facilitates PRC2 binding to the LEF1 promoter and trimethylation of lysine 27 in histone H3. Expression of the spliced form of LEF1 NAT in trans prevents the action of unspliced NAT by competing for interaction with the promoter. Thus, these results indicate that LEF1 gene expression is attenuated by an antisense non-coding RNA and that this NAT function is regulated by the balance between its spliced and unspliced forms.
Mutations in the CRE pocket of bacterial RNA polymerase affect multiple steps of transcription
During transcription, the catalytic core of RNA polymerase (RNAP) must interact with the DNA template with low sequence specificity to ensure efficient enzyme translocation and RNA extension. Unexpectedly, recent structural studies of bacterial promoter complexes revealed specific interactions between the nontemplate DNA strand at the downstream edge of the transcription bubble (CRE, core recognition element) and a protein pocket formed by core RNAP (CRE pocket). We investigated the roles of these interactions in transcription by analyzing point amino acid substitutions and deletions in Escherichia coli RNAP. The mutations affected multiple steps of transcription, including promoter recognition, RNA elongation and termination. In particular, we showed that interactions of the CRE pocket with a nontemplate guanine immediately downstream of the active center stimulate RNA-hairpin-dependent transcription pausing but not other types of pausing. Thus, conformational changes of the elongation complex induced by nascent RNA can modulate CRE effects on transcription. The results highlight the roles of specific core RNAP–DNA interactions at different steps of RNA synthesis and suggest their importance for transcription regulation in various organisms.
Destruction of a distal hypoxia response element abolishes trans-activation of the PAG1 gene mediated by HIF-independent chromatin looping
A crucial step in the cellular adaptation to oxygen deficiency is the binding of hypoxia-inducible factors (HIFs) to hypoxia response elements (HREs) of oxygen-regulated genes. Genome-wide HIF-1α/2α/β DNA-binding studies revealed that the majority of HREs reside distant from the promoter regions, but the function of these distal HREs has only been marginally studied in the genomic context. We used chromatin immunoprecipitation (ChIP), gene editing (TALEN) and chromosome conformation capture (3C) to localize and functionally characterize an 82 kb upstream HRE that solely drives oxygen-regulated expression of the newly identified HIF target gene PAG1. PAG1, a transmembrane adaptor protein involved in Src signalling, was hypoxically induced in various cell lines and mouse tissues. ChIP and reporter gene assays demonstrated that the –82 kb HRE regulates PAG1, but not an equally distant gene further upstream, by direct interaction with HIF. Ablation of the consensus HRE motif abolished the hypoxic induction of PAG1 but not general oxygen signalling. 3C assays revealed that the –82 kb HRE physically associates with the PAG1 promoter region, independent of HIF-DNA interaction. These results demonstrate a constitutive interaction between the –82 kb HRE and the PAG1 promoter, suggesting a physiologically important rapid response to hypoxia.
The ends of eukaryotic chromosomes need to be protected from the activation of a DNA damage response that leads the cell to replicative senescence or apoptosis. In mammals, protection is accomplished by a six-factor complex named shelterin, which organizes the terminal TTAGGG repeats in a still ill-defined structure, the telomere. The stable interaction of shelterin with telomeres mainly depends on the binding of two of its components, TRF1 and TRF2, to double-stranded telomeric repeats. Tethering of TRF proteins to telomeres occurs in a chromatin environment characterized by a very compact nucleosomal organization. In this work we show that binding of TRF1 and TRF2 to telomeric sequences is modulated by the histone octamer. By means of in vitro models, we found that TRF2 binding is strongly hampered by the presence of telomeric nucleosomes, whereas TRF1 binds efficiently to telomeric DNA in a nucleosomal context and is able to remodel telomeric nucleosomal arrays. Our results indicate that the different behavior of TRF proteins partly depends on the interaction with histone tails of their divergent N-terminal domains. We propose that the interplay between the histone octamer and TRF proteins plays a role in the steps leading to telomere deprotection.
Dynamics of MBD2 deposition across methylated DNA regions during malignant transformation of human mammary epithelial cells
DNA methylation is thought to induce transcriptional silencing through the combination of two mechanisms: the repulsion of transcriptional activators unable to bind their target sites when methylated, and the recruitment of transcriptional repressors with specific affinity for methylated DNA. The Methyl CpG Binding Domain proteins MeCP2, MBD1 and MBD2 belong to the latter category. Here, we present MBD2 ChIP-seq data obtained from the endogenous MBD2 in an isogenic cellular model of oncogenic transformation of human mammary cells. In immortalized (HMEC-hTERT) or transformed (HMLER) cells, MBD2 was found in a large proportion of methylated regions and associated with transcriptional silencing. A redistribution of MBD2 on methylated DNA occurred during oncogenic transformation, frequently independently of local DNA methylation changes. Genes downregulated during HMEC-hTERT transformation preferentially gained MBD2 on their promoter. Furthermore, in HMLER cells, depletion of MBD2 induced an upregulation of MBD2-bound genes methylated at their promoter regions. Among the 3,160 genes downregulated in transformed cells, 380 genes were methylated at their promoter regions in both cell lines, specifically bound by MBD2 in HMLER cells, and upregulated upon MBD2 depletion in HMLER. The transcriptional MBD2-dependent downregulation occurring during oncogenic transformation was also observed in two additional models of mammary cell transformation. Thus, the dynamics of MBD2 deposition across methylated DNA regions was associated with the oncogenic transformation of human mammary cells.
Novel mechanism of gene regulation: the protein Rv1222 of Mycobacterium tuberculosis inhibits transcription by anchoring the RNA polymerase onto DNA
We propose a novel mechanism of gene regulation in Mycobacterium tuberculosis where the protein Rv1222 inhibits transcription by anchoring RNA polymerase (RNAP) onto DNA. In contrast to our existing knowledge that transcriptional repressors function either by binding to DNA at specific sequences or by binding to RNAP, we show that Rv1222-mediated transcription inhibition requires simultaneous binding of the protein to both RNAP and DNA. We demonstrate that the positively charged C-terminal tail of Rv1222 is responsible for anchoring RNAP on DNA; hence the protein slows down the movement of RNAP along the DNA during transcription elongation. The interaction between Rv1222 and DNA is electrostatic; thus the protein could inhibit transcription from any gene. As Rv1222 slows down RNA synthesis, expression of the protein in Mycobacterium smegmatis or Escherichia coli severely impairs the growth rate of the bacteria. The protein does not possess any significant affinity for DNA polymerase and is thus unable to inhibit DNA synthesis. The proposed mechanism by which Rv1222 inhibits transcription reveals a new repertoire of prokaryotic gene regulation.
Gene target specificity of the Super Elongation Complex (SEC) family: how HIV-1 Tat employs selected SEC members to activate viral transcription
The AF4/FMR2 proteins AFF1 and AFF4 act as a scaffold to assemble the Super Elongation Complex (SEC) that strongly activates transcriptional elongation of HIV-1 and cellular genes. Although they can dimerize, it is unclear whether the dimers exist and function within a SEC in vivo. Furthermore, it is unknown whether AFF1 and AFF4 function similarly in mediating SEC-dependent activation of diverse genes. Providing answers to these questions, our current study shows that AFF1 and AFF4 reside in separate SECs that display largely distinct gene target specificities. While the AFF1-SEC is more potent in supporting HIV-1 transactivation by the viral Tat protein, the AFF4-SEC is more important for HSP70 induction upon heat shock. The functional difference between AFF1 and AFF4 in Tat-transactivation has been traced to a single amino acid variation between the two proteins, which causes them to enhance the affinity of Tat for P-TEFb, a key SEC component, with different efficiency. Finally, genome-wide analysis confirms that the genes regulated by AFF1-SEC and AFF4-SEC are largely non-overlapping and perform distinct functions. Thus, the SEC represents a family of related complexes that exist to increase the regulatory diversity and gene control options during transactivation of diverse cellular and viral genes.
Targeting chromatin binding regulation of constitutively active AR variants to overcome prostate cancer resistance to endocrine-based therapies
Androgen receptor (AR) variants (AR-Vs) expressed in prostate cancer (PCa) lack the AR ligand binding domain (LBD) and function as constitutively active transcription factors. AR-V expression in patient tissues or circulating tumor cells is associated with resistance to AR-targeting endocrine therapies and poor outcomes. Here, we investigated the mechanisms governing chromatin binding of AR-Vs with the goal of identifying therapeutic vulnerabilities. By chromatin immunoprecipitation and sequencing (ChIP-seq) and complementary biochemical experiments, we show that AR-Vs display a binding preference for the same canonical high-affinity androgen response elements (AREs) that are preferentially engaged by AR, albeit with lower affinity. Dimerization was an absolute requirement for constitutive AR-V DNA binding and transcriptional activation. Treatment with the bromodomain and extraterminal (BET) inhibitor JQ1 resulted in inhibition of AR-V chromatin binding and impaired AR-V driven PCa cell growth in vitro and in vivo. Importantly, this was associated with a novel JQ1 action of down-regulating AR-V transcript and protein expression. Overall, this study demonstrates that AR-Vs broadly restore AR chromatin binding events that are otherwise suppressed during endocrine therapy, and provides pre-clinical rationale for BET inhibition as a strategy for inhibiting expression and chromatin binding of AR-Vs in PCa.
Cdt1-binding protein GRWD1 is a novel histone-binding protein that facilitates MCM loading through its influence on chromatin architecture
Efficient pre-replication complex (pre-RC) formation on chromatin templates is crucial for the maintenance of genome integrity. However, the regulation of chromatin dynamics during this process has remained elusive. We found that a conserved protein, GRWD1 (glutamate-rich WD40 repeat containing 1), binds to two representative replication origins specifically during G1 phase in a CDC6- and Cdt1-dependent manner, and that depletion of GRWD1 reduces loading of MCM but not CDC6 and Cdt1. Furthermore, chromatin immunoprecipitation coupled with high-throughput sequencing (Seq) revealed significant genome-wide co-localization of GRWD1 with CDC6. We found that GRWD1 has histone-binding activity. To investigate the effect of GRWD1 on chromatin architecture, we used formaldehyde-assisted isolation of regulatory elements (FAIRE)-seq or FAIRE-quantitative PCR analyses, and the results suggest that GRWD1 regulates chromatin openness at specific chromatin locations. Taken together, these findings suggest that GRWD1 may be a novel histone-binding protein that regulates chromatin dynamics and MCM loading at replication origins.
SLX4 contributes to telomere preservation and regulated processing of telomeric joint molecule intermediates
SLX4 assembles a toolkit of endonucleases SLX1, MUS81 and XPF, which is recruited to telomeres via direct interaction of SLX4 with TRF2. Telomeres present an inherent obstacle for DNA replication and repair due to their high propensity to form branched DNA intermediates. Here we provide novel insight into the mechanism and regulation of the SLX4 complex in telomere preservation. SLX4 associates with telomeres throughout the cell cycle, peaking in late S phase and under genotoxic stress. Disruption of SLX4's interaction with TRF2 or SLX1 and SLX1's nuclease activity independently causes telomere fragility, suggesting a requirement of the SLX4 complex for nucleolytic resolution of branched intermediates during telomere replication. Indeed, the SLX1–SLX4 complex processes a variety of telomeric joint molecules in vitro. The nucleolytic activity of SLX1-SLX4 is negatively regulated by telomeric DNA-binding proteins TRF1 and TRF2 and is suppressed by the RecQ helicase BLM in vitro. In vivo, in the presence of functional BLM, telomeric circle formation and telomere sister chromatid exchange, both arising out of nucleolytic processing of telomeric homologous recombination intermediates, are suppressed. We propose that the SLX4-toolkit is a telomere accessory complex that, in conjunction with other telomere maintenance proteins, ensures unhindered, but regulated telomere maintenance.
Two mechanisms coordinate replication termination by the Escherichia coli Tus-Ter complex
The Escherichia coli replication terminator protein (Tus) binds to Ter sequences to block replication forks approaching from one direction. Here, we used single molecule and transient state kinetics to study responses of the heterologous phage T7 replisome to the Tus–Ter complex. The T7 replisome was arrested at the non-permissive end of Tus–Ter in a manner that is explained by a composite mousetrap and dynamic clamp model. An unpaired C(6) that forms a lock by binding into the cytosine binding pocket of Tus was most effective in arresting the replisome and mutation of C(6) removed the barrier. Isolated helicase was also blocked at the non-permissive end, but unexpectedly the isolated polymerase was not, unless C(6) was unpaired. Instead, the polymerase was blocked at the permissive end. This indicates that the Tus–Ter mechanism is sensitive to the translocation polarity of the DNA motor. The polymerase tracking along the template strand traps the C(6) to prevent lock formation; the helicase tracking along the other strand traps the complementary G(6) to aid lock formation. Our results are consistent with the model where strand separation by the helicase unpairs the GC(6) base pair and triggers lock formation immediately before the polymerase can sequester the C(6) base.
Modulation of LSD1 phosphorylation by CK2/WIP1 regulates RNF168-dependent 53BP1 recruitment in response to DNA damage
A proper DNA damage response is essential for the maintenance of genome integrity. Deficiency of the E3 ligase RNF168 fully prevents both the initial recruitment and retention of 53BP1 at sites of DNA damage. In response to DNA damage, RNF168-dependent recruitment of the lysine-specific demethylase LSD1 to the site of DNA damage promotes local H3K4me2 demethylation and ubiquitination of H2A/H2AX, facilitating 53BP1 recruitment to sites of DNA damage. Alternatively, RNF168-mediated K63-linked ubiquitylation of 53BP1 is required for the initial recruitment of 53BP1 to sites of DNA damage and for its function in repair. We demonstrate here that phosphorylation and dephosphorylation of LSD1 at S131 and S137 were mediated by casein kinase 2 (CK2) and wild-type p53-induced phosphatase 1 (WIP1), respectively. LSD1, RNF168 and 53BP1 interacted with each other directly. CK2-mediated phosphorylation of LSD1 had no impact on its interaction with 53BP1, but promoted its interaction with RNF168 and RNF168-dependent 53BP1 ubiquitination and subsequent recruitment to the DNA damage sites. Furthermore, overexpression of phosphorylation-defective mutants failed to restore LSD1 depletion-induced cellular sensitivity to DNA damage. Taken together, our results suggest that LSD1 phosphorylation modulated by CK2/WIP1 regulates RNF168-dependent 53BP1 recruitment directly in response to DNA damage and cellular sensitivity to DNA damaging agents.
AP endonuclease 1 prevents trinucleotide repeat expansion via a novel mechanism during base excision repair
Base excision repair (BER) of an oxidized base within a trinucleotide repeat (TNR) tract can lead to TNR expansions that are associated with over 40 human neurodegenerative diseases. This occurs as a result of DNA secondary structures such as hairpins formed during repair. We have previously shown that BER in a TNR hairpin loop can lead to removal of the hairpin, attenuating or preventing TNR expansions. Here, we further provide the first evidence that AP endonuclease 1 (APE1) prevented TNR expansions via its 3'-5' exonuclease activity and stimulatory effect on DNA ligation during BER in a hairpin loop. Coordinating with flap endonuclease 1, the APE1 3'-5' exonuclease activity cleaves the annealed upstream 3'-flap of a double-flap intermediate resulting from 5'-incision of an abasic site in the hairpin loop. Furthermore, APE1 stimulated DNA ligase I to resolve a long double-flap intermediate, thereby promoting hairpin removal and preventing TNR expansions.