Bioinformatics Journal

Bioinformatics - RSS feed of current issue
  • Exploiting hidden information interleaved in the redundancy of the genetic code without prior knowledge
    [Apr 2015]

    Motivation: Dozens of studies in recent years have demonstrated that codon usage encodes various aspects related to all stages of gene expression regulation. When relevant high-quality large-scale gene expression data are available, it is possible to statistically infer and model these signals, enabling gene expression to be analysed and engineered. However, when these data are not available, it is impossible to infer and validate such models.

    Results: In this study, we suggest Chimera—an unsupervised computationally efficient approach for exploiting hidden high-dimensional information related to the way gene expression is encoded in the open reading frame (ORF), based solely on the genome of the analysed organism. One version of the approach, named Chimera Average Repetitive Substring (ChimeraARS), estimates the adaptability of an ORF to the intracellular gene expression machinery of a genome (host), by computing its tendency to include long substrings that appear in its coding sequences; the second version, named ChimeraMap, engineers the codons of a protein such that it will include long substrings of codons that appear in the host coding sequences, improving its adaptation to a new host’s gene expression machinery. We demonstrate the applicability of the new approach for analysing and engineering heterologous genes and for analysing endogenous genes. Specifically, focusing on Escherichia coli, we show that it can exploit information that cannot be detected by conventional approaches (e.g. the CAI—Codon Adaptation Index), which only consider single codon distributions; for example, we report correlations of up to 0.67 for the ChimeraARS measure with heterologous gene expression, when the CAI yielded no correlation.
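
    The ChimeraARS idea—scoring an ORF by the lengths of substrings it shares with the host's coding sequences—can be illustrated with a deliberately naive sketch. The published tool uses efficient data structures (e.g. suffix arrays); the function name and brute-force search below are illustrative assumptions, not the authors' implementation:

```python
def chimera_ars(query, host_orfs):
    """Naive Average Repetitive Substring score: the mean, over all start
    positions in `query`, of the length of the longest substring beginning
    there that also occurs somewhere in the host coding sequences."""
    def longest_match(i):
        best = 0
        for length in range(1, len(query) - i + 1):
            sub = query[i:i + length]
            if any(sub in orf for orf in host_orfs):
                best = length
            else:
                break
        return best
    return sum(longest_match(i) for i in range(len(query))) / len(query)
```

    A query sharing long runs with the host corpus scores high even when its single-codon frequencies are unremarkable, which is why such a measure can pick up signal invisible to the CAI.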

    Availability and implementation: For non-commercial purposes, the code of the Chimera approach can be downloaded from http://www.cs.tau.ac.il/~tamirtul/Chimera/download.htm.

    Contact: tamirtul@post.tau.ac.il

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • andi: Fast and accurate estimation of evolutionary distances between closely related genomes
    [Apr 2015]

    Motivation: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes.

    Results: Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays, and our implementation requires only approximately 1 s and 45 MB of RAM per Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions, leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae.
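
    Turning the mismatch fraction observed in anchored ungapped alignments into an evolutionary distance typically involves a substitution-model correction; andi's exact estimator is described in the paper, but for closely related genomes the classic Jukes–Cantor correction conveys the idea (a sketch of the standard correction, not andi's code):

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor correction: convert an observed substitution fraction p
    (mismatches per aligned site) into an evolutionary distance in
    substitutions per site. Valid only for p < 0.75."""
    if p >= 0.75:
        raise ValueError("substitution fraction too large for JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```
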

    Availability and implementation: We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi for ANchor DIstances. C sources and documentation are posted at http://github.com/evolbioinf/andi/

    Contact: haubold@evolbio.mpg.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • JEPEG: a summary statistics based tool for gene-level joint testing of functional variants
    [Apr 2015]

    Motivation: Gene expression is influenced by variants commonly known as expression quantitative trait loci (eQTL). On the basis of this fact, researchers proposed to use eQTL/functional information univariately for prioritizing single nucleotide polymorphism (SNP) signals from genome-wide association studies (GWAS). However, most genes are influenced by multiple eQTLs which, thus, jointly affect any downstream phenotype. Therefore, when compared with the univariate prioritization approach, a joint modeling of eQTL action on phenotypes has the potential to substantially increase signal detection power. Nonetheless, a joint eQTL analysis is impeded by (i) not measuring all eQTLs in a gene and/or (ii) lack of access to individual genotypes.

    Results: We propose joint effect on phenotype of eQTL/functional SNPs associated with a gene (JEPEG), a novel software tool which uses only GWAS summary statistics to (i) impute the summary statistics at unmeasured eQTLs and (ii) test for the joint effect of all measured and imputed eQTLs in a gene. We illustrate the behavior/performance of the developed tool by analysing the GWAS meta-analysis summary statistics from the Psychiatric Genomics Consortium Stage 1 and the Genetic Consortium for Anorexia Nervosa.

    Conclusions: The results of the applied analyses suggest that JEPEG complements commonly used univariate GWAS tools by (i) increasing signal detection power via uncovering (a) novel genes or (b) known associated genes in smaller cohorts, and (ii) assisting in the fine-mapping of challenging regions, e.g. the major histocompatibility complex for schizophrenia.

    Availability and implementation: JEPEG, its associated database of eQTL SNPs and usage examples are publicly available at http://code.google.com/p/jepeg/.

    Contact: dlee4@vcu.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Faster sequence homology searches by clustering subsequences
    [Apr 2015]

    Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis.

    Results: We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an ~2-fold increase in speed without a large decrease in search sensitivity. When measured on metagenomic data, GHOSTZ was ~2.2–2.8 times faster than RAPSearch and ~185–261 times faster than BLASTX.
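
    The triangle-inequality pruning mentioned above can be sketched generically: once a query's distance to a cluster representative is known, any member whose distance to that representative makes a close match impossible can be skipped without computing its alignment. The names and the exact bound below are generic assumptions, not GHOSTZ internals:

```python
def prune_candidates(d_query_rep, member_dists, threshold):
    """Triangle-inequality pruning: given the query-to-representative
    distance and each member's distance to the representative, keep only
    members whose lower bound |d(q,rep) - d(rep,m)| on the query-member
    distance could still fall within `threshold`."""
    return [m for m, d_rep_m in member_dists.items()
            if abs(d_query_rep - d_rep_m) <= threshold]
```

    Skipped members need no seed extension at all, which is where the speed-up over per-sequence comparison comes from.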

    Availability and implementation: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/

    Contact: akiyama@cs.titech.ac.jp

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • SNPlice: variants that modulate intron retention from RNA-sequencing data
    [Apr 2015]

    Rationale: The growing recognition of the importance of splicing, together with rapidly accumulating RNA-sequencing data, demand robust high-throughput approaches that efficiently analyze experimentally derived whole-transcriptome splice profiles.

    Results: We have developed a computational approach, called SNPlice, for identifying cis-acting, splice-modulating variants from RNA-seq datasets. SNPlice mines RNA-seq datasets to find reads that span single-nucleotide variant (SNV) loci and nearby splice junctions, assessing the co-occurrence of variants and molecules that remain unspliced at nearby exon–intron boundaries. Hence, SNPlice highlights variants preferentially occurring on intron-containing molecules, possibly resulting from altered splicing. To illustrate the co-occurrence of a variant nucleotide and a retained exon–intron boundary, allele-specific sequencing was used. SNPlice results are generally consistent with splice-prediction tools, but also indicate splice-modulating elements missed by other algorithms. SNPlice can be applied to identify variants that correlate with unexpected splicing events, and to measure the splice-modulating potential of canonical splice-site SNVs.

    Availability and implementation: SNPlice is freely available for download from https://code.google.com/p/snplice/ as a self-contained binary package for 64-bit Linux computers and as python source-code.

    Contact: pmudvari@gwu.edu or horvatha@gwu.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Semiconductor sequencing: how many flows do you need?
    [Apr 2015]

    Motivation: Semiconductor sequencing directly translates chemically encoded information (A, C, G or T) into voltage signals that are detected by a semiconductor device. Changes of pH value and thereby of the electric potential in the reaction well are detected during strand synthesis from nucleotides provided in cyclic repeated flows for each type of nucleotide. To minimize time requirement and costs, it is necessary to know the number of flows that are required for complete coverage of the templates.

    Results: We calculate the number of required flows in a random sequence model and present exact expressions for the cumulative distribution function, expected value and variance. Additionally, we provide an algorithm to calculate the number of required flows for a concrete list of amplicons using a BED file of genomic positions as input. We apply the algorithm to calculate the number of flows that are required to cover six amplicon panels that are used for targeted sequencing in cancer research. The upper bounds obtained for the number of flows allow the instrument throughput to be enhanced from two chips to three chips per day for four of these panels.
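
    The core quantity—the number of cyclic flows needed to fully synthesize a given template—can be computed directly. The sketch below assumes a simple repeating ACGT flow order (real instruments may use other or custom orders); each flow incorporates an entire homopolymer run at once, which is why homopolymers cost a single flow:

```python
def flows_needed(template, flow_order="ACGT"):
    """Count the nucleotide flows required to synthesize `template` when
    flows cycle through `flow_order`; a flow incorporates every consecutive
    base equal to the flowed nucleotide, so homopolymers cost one flow."""
    flows = 0
    i = 0  # next unsequenced position in the template
    k = 0  # index into the cyclic flow order
    while i < len(template):
        base = flow_order[k % len(flow_order)]
        flows += 1
        while i < len(template) and template[i] == base:
            i += 1  # whole homopolymer run incorporated in this flow
        k += 1
    return flows
```

    For example, "AACG" needs 3 flows under ACGT order, while a lone "T" needs 4, since three non-incorporating flows precede it in the cycle.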

    Availability and implementation: The algorithm for calculation of the flows was implemented in R and is available as package ionflows from the CRAN repository.

    Contact: jan.budczies@charite.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • State of the art prediction of HIV-1 protease cleavage sites
    [Apr 2015]

    Motivation: Understanding the substrate specificity of human immunodeficiency virus (HIV)-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved.

    Results: The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed.
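
    "Orthogonal encoding" here is one-hot encoding of each residue in the substrate window; for the 8-residue substrate representation common in the standard HIV-1 protease cleavage datasets this yields a 160-dimensional binary feature vector. The window length of 8 is an assumption taken from those datasets; the sketch shows the encoding only, not the SVM:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def orthogonal_encode(octamer):
    """Orthogonal (one-hot) encoding: map an 8-residue substrate window
    to a 160-dimensional binary vector, 20 indicators per position."""
    vec = []
    for res in octamer:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(res)] = 1
        vec.extend(one_hot)
    return vec
```

    These vectors feed directly into a linear SVM; physicochemical encodings replace the indicators with property values, which the study found not to help.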

    Availability and implementation: The datasets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available.

    Contact: thorsteinn.rognvaldsson@hh.se

    Categories: Journal Articles
  • Improved gene tree error correction in the presence of horizontal gene transfer
    [Apr 2015]

    Motivation: The accurate inference of gene trees is a necessary step in many evolutionary studies. Although the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem.

    Results: In this study, we introduce a new and highly effective method for gene tree error correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications and losses, and uses a statistical hypothesis testing framework [Shimodaira–Hasegawa (SH) test] to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses.

    Availability and implementation: An implementation of our method is available at http://compbio.mit.edu/treefix-dtl/

    Contact: mukul@engr.uconn.edu or manoli@mit.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Clustering-based model of cysteine co-evolution improves disulfide bond connectivity prediction and reduces homologous sequence requirements
    [Apr 2015]

    Motivation: Cysteine residues have particular structural and functional relevance in proteins because of their ability to form covalent disulfide bonds. Bioinformatics tools that can accurately predict cysteine bonding states are already available, whereas it remains challenging to infer the disulfide connectivity pattern of unknown protein sequences. Improving accuracy in this area is highly relevant for the structural and functional annotation of proteins.

    Results: We predict the intra-chain disulfide bond connectivity patterns starting from known cysteine bonding states with an evolutionary-based unsupervised approach called Sephiroth that relies on high-quality alignments obtained with HHblits and is based on a coarse-grained cluster-based modelization of tandem cysteine mutations within a protein family. We compared our method with state-of-the-art unsupervised predictors and achieved a performance improvement of 25–27% while requiring an order of magnitude fewer aligned homologous sequences (~10^3 instead of ~10^4).
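
    Why connectivity prediction is harder than bonding-state prediction is easy to see combinatorially: n bonded cysteines (n even) admit (n-1)!! = (n-1)(n-3)...1 distinct pairing patterns, so the search space grows very quickly with n. A small illustrative helper:

```python
def n_connectivity_patterns(n_cysteines):
    """Number of distinct intra-chain disulfide connectivity patterns for
    an even number of bonded cysteines: the double factorial (n-1)!!,
    counting perfect matchings of the cysteines."""
    if n_cysteines % 2:
        raise ValueError("bonded cysteines must pair up")
    result = 1
    for k in range(n_cysteines - 1, 0, -2):
        result *= k
    return result
```

    Six bonded cysteines already allow 15 patterns and eight allow 105, which is why evolutionary signal such as tandem cysteine mutations is valuable for narrowing the choice.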

    Availability and implementation: The software described in this article and the datasets used are available at http://ibsquare.be/sephiroth.

    Contact: wvranken@vub.ac.be

    Supplementary information: Supplementary material is available at Bioinformatics online.

    Categories: Journal Articles
  • Identifying cancer-related microRNAs based on gene expression data
    [Apr 2015]

    Motivation: MicroRNAs (miRNAs) are short non-coding RNAs that play important roles in post-transcriptional regulation as well as other important biological processes. Recently, accumulating evidence indicates that miRNAs are extensively involved in cancer. However, it is a big challenge to identify which miRNAs are related to which cancers, given the complex processes involved in tumors, where one miRNA may target hundreds or even thousands of genes and one gene may be regulated by multiple miRNAs. Although integrative analysis of matched gene and miRNA expression data can help identify cancer-associated miRNAs, such data are not commonly available. On the other hand, huge amounts of gene expression data are publicly accessible. Identifying cancer miRNAs directly from gene expression data would therefore significantly improve the efficiency of characterizing miRNA function in cancer.

    Results: We present a novel computational framework to identify the cancer-related miRNAs based solely on gene expression profiles without requiring either miRNA expression data or the matched gene and miRNA expression data. The results on multiple cancer datasets show that our proposed method can effectively identify cancer-related miRNAs with higher precision compared with other popular approaches. Furthermore, some of our novel predictions are validated by both differentially expressed miRNAs and evidences from literature, implying the predictive power of our proposed method. In addition, we construct a cancer-miRNA-pathway network, which can help explain how miRNAs are involved in cancer.

    Availability and implementation: The R code and data files for the proposed method are available at http://comp-sysbio.org/miR_Path/

    Contact: liukeq@gmail.com

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • QuASAR: quantitative allele-specific analysis of reads
    [Apr 2015]

    Motivation: Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no methods that jointly infer genotypes and conduct ASE inference while accounting for uncertainty in the genotype calls.
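
    The baseline ASE test described above—testing a 1:1 allelic ratio on the read counts at a heterozygous site—is an exact binomial test. QuASAR itself goes further, modelling genotype uncertainty, base-call error and allelic over-dispersion, so the sketch below is the naive baseline, not QuASAR:

```python
from math import comb

def binom_test_two_sided(ref_count, alt_count):
    """Exact two-sided binomial test of a 1:1 allelic ratio: sum the
    probabilities of all outcomes no more likely than the observed
    ref/alt split under p = 0.5."""
    n = ref_count + alt_count
    p_obs = comb(n, ref_count) * 0.5 ** n
    return sum(comb(n, k) * 0.5 ** n
               for k in range(n + 1)
               if comb(n, k) * 0.5 ** n <= p_obs + 1e-12)
```

    With 10 reads all matching one allele the test gives p ≈ 0.002, but at typical coverages over-dispersion inflates such p-values, which motivates QuASAR's richer model.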

    Results: We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available.

    Availability and implementation: http://github.com/piquelab/QuASAR.

    Contact: fluca@wayne.edu or rpique@wayne.edu

    Supplementary information: Supplementary Material is available at Bioinformatics online.

    Categories: Journal Articles
  • Graphical algorithm for integration of genetic and biological data: proof of principle using psoriasis as a model
    [Apr 2015]

    Motivation: Pathway analysis to reveal biological mechanisms underlying results from genetic association studies has great potential for better understanding complex traits with major human disease impact. However, current approaches have not been optimized to maximize statistical power to identify enriched functions/pathways, especially when the genetic data derive from studies using platforms (e.g. Immunochip and Metabochip) customized to have pre-selected markers from previously identified top-ranked loci. We present here a novel approach, called Minimum distance-based Enrichment Analysis for Genetic Association (MEAGA), with the potential to address both of these important concerns.

    Results: MEAGA performs enrichment analysis using graphical algorithms to identify sub-graphs among genes and measure their closeness in an interaction database. It also incorporates a statistic summarizing the numbers and total distances of the sub-graphs, depicting the overlap between observed genetic signals and defined function/pathway gene-sets. MEAGA uses a sampling technique to approximate empirical and multiple testing-corrected P-values. We show in simulation studies that MEAGA is more powerful than count-based strategies in identifying disease-associated functions/pathways, and that the increase in power is influenced by the shortest distances among associated genes in the interactome. We applied MEAGA to the results of a meta-analysis of psoriasis using Immunochip datasets, and showed that associated genes are significantly enriched in immune-related functions and closer to each other in the protein–protein interaction network.

    Availability and implementation: http://genome.sph.umich.edu/wiki/MEAGA

    Contact: tsoi.teen@gmail.com or goncalo@umich.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • SMARTS: reconstructing disease response networks from multiple individuals using time series gene expression data
    [Apr 2015]

    Motivation: Current methods for reconstructing dynamic regulatory networks are focused on modeling a single response network using model organisms or cell lines. Unlike these models or cell lines, humans differ in their background expression profiles due to age, genetics and life factors. In addition, there are often differences in start and end times for time series human data and in the rate of progress based on the specific individual. Thus, new methods are required to integrate time series data from multiple individuals when modeling and constructing disease response networks.

    Results: We developed Scalable Models for the Analysis of Regulation from Time Series (SMARTS), a method integrating static and time series data from multiple individuals to reconstruct condition-specific response networks in an unsupervised way. Using probabilistic graphical models, SMARTS iterates between reconstructing different regulatory networks and assigning individuals to these networks, taking into account varying individual start times and response rates. These models can be used to group different sets of patients and to identify transcription factors that differentiate the observed responses between these groups. We applied SMARTS to analyze human response to influenza and mouse brain development. In both cases, it was able to greatly improve baseline groupings while identifying key relevant TFs that differ between the groups. Several of these groupings and TFs are known to regulate the relevant processes while others represent novel hypotheses regarding immune response and development.

    Availability and implementation: Software and Supplementary information are available at http://sb.cs.cmu.edu/smarts/.

    Contact: zivbj@cs.cmu.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Computer-assisted curation of a human regulatory core network from the biological literature
    [Apr 2015]

    Motivation: A highly interlinked network of transcription factors (TFs) orchestrates the context-dependent expression of human genes. ChIP-chip experiments that interrogate the binding of particular TFs to genomic regions are used to reconstruct gene regulatory networks at genome-scale, but are plagued by high false-positive rates. Meanwhile, a large body of knowledge on high-quality regulatory interactions remains largely unexplored, as it is available only in natural language descriptions scattered over millions of scientific publications. Such data are hard to extract, and regulatory databases currently contain only 503 regulatory relations between human TFs in total.

    Results: We developed a text-mining-assisted workflow to systematically extract knowledge about regulatory interactions between human TFs from the biological literature. We applied this workflow to the entire Medline, which helped us to identify more than 45 000 sentences potentially describing such relationships. We ranked these sentences by a machine-learning approach. The top 2500 sentences contained ~900 sentences that encompass relations already known in databases. By manually curating the remaining 1625 top-ranking sentences, we obtained more than 300 validated regulatory relationships that were not present in a regulatory database before. Full-text curation allowed us to obtain detailed information on the strength of the experimental evidence supporting a relationship.

    Conclusions: We were able to increase curated information about the human core transcriptional network by >60% compared with the current content of regulatory databases. We observed improved performance when using the network for disease gene prioritization compared with the state-of-the-art.

    Availability and implementation: Web-service is freely accessible at http://fastforward.sys-bio.net/.

    Contact: leser@informatik.hu-berlin.de or nils.bluethgen@charite.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Retrieval of Enterobacteriaceae drug targets using singular value decomposition
    [Apr 2015]

    Motivation: The identification of potential drug target proteins in bacteria is important in pharmaceutical research for the development of new antibiotics to combat bacterial agents that cause diseases.

    Results: A new model that combines the singular value decomposition (SVD) technique with biological filters composed of a set of protein properties associated with bacterial drug targets and similarity to protein-coding essential genes of Escherichia coli (strain K12) has been created to predict potential antibiotic drug targets in the Enterobacteriaceae family. This model identified 99 potential drug target proteins in the studied family, which exhibit eight different functions and are protein-coding essential genes or similar to protein-coding essential genes of E.coli (strain K12), indicating that the disruption of the activities of these proteins is critical for cells. Proteins from bacteria with described drug resistance were found among the retrieved candidates. These candidates have no similarity to the human proteome, and therefore offer the advantage of causing no known adverse effects in humans.
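
    The abstract does not detail how the SVD is used; one common retrieval pattern (latent semantic indexing style) projects a protein-by-feature matrix into a low-rank space and ranks proteins by cosine similarity to a query profile there. The sketch below illustrates that generic pattern only, and is not the authors' model:

```python
import numpy as np

def svd_retrieve(feature_matrix, query_vec, rank=2):
    """LSI-style retrieval sketch: project protein feature vectors
    (rows of feature_matrix) and a query profile into a rank-k SVD
    latent space and rank proteins by cosine similarity to the query."""
    U, s, Vt = np.linalg.svd(feature_matrix, full_matrices=False)
    proteins_latent = U[:, :rank] * s[:rank]   # one row per protein
    query_latent = query_vec @ Vt[:rank].T     # project query into latent space
    sims = proteins_latent @ query_latent / (
        np.linalg.norm(proteins_latent, axis=1)
        * np.linalg.norm(query_latent) + 1e-12)
    return np.argsort(-sims)  # protein indices, best match first
```
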

    Contact: rita_silverio@hotmail.com.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the Protein Data Bank
    [Apr 2015]

    Summary: The Chemical Component Dictionary (CCD) is a chemical reference data resource that describes all residue and small molecule components found in Protein Data Bank (PDB) entries. The CCD contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, chemical descriptors, systematic chemical names and idealized coordinates. The content, preparation, validation and distribution of this CCD chemical reference dataset are described.

    Availability and implementation: The CCD is updated regularly in conjunction with the scheduled weekly release of new PDB structure data. The CCD and amino acid variant reference datasets are hosted in the public PDB ftp repository at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz, ftp://ftp.wwpdb.org/pub/pdb/data/monomers/aa-variants-v1.cif.gz, and its mirror sites, and can be accessed from http://wwpdb.org.

    Contact: jwest@rcsb.rutgers.edu.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Adaptive settings for the nearest-neighbor particle tracking algorithm
    [Apr 2015]

    Background: The performance of the single particle tracking (SPT) nearest-neighbor algorithm is determined by parameters that need to be set according to the characteristics of the time series under study. Inhomogeneous systems, where these characteristics fluctuate spatially, are poorly tracked when parameters are set globally.

    Results: We present a novel SPT approach that adapts the well-known nearest-neighbor tracking algorithm to the local density of particles to overcome the problems of inhomogeneity.

    Conclusions: We demonstrate the performance improvement provided by the proposed method using numerical simulations and experimental data, and compare its performance with state-of-the-art SPT algorithms.
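
    A minimal sketch of density-adaptive nearest-neighbour linking between two frames: each particle's search radius shrinks where particles are densely packed and relaxes where they are sparse. The specific radius heuristic (half the distance to the closest same-frame neighbour, capped at a base radius) is an assumption for illustration, not necessarily the rule used by the published algorithm:

```python
import math

def link_frames(frame_a, frame_b, base_radius=10.0):
    """Greedy nearest-neighbour linking of (x, y) particle positions with
    a density-adaptive search radius: half the distance to the closest
    neighbour in the same frame, capped at base_radius."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    links, taken = {}, set()
    for i, p in enumerate(frame_a):
        same = [dist(p, q) for j, q in enumerate(frame_a) if j != i]
        radius = min(base_radius, min(same) / 2.0) if same else base_radius
        best_j, best_d = None, None
        for j, q in enumerate(frame_b):
            if j in taken:
                continue
            d = dist(p, q)
            if d <= radius and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            links[i] = best_j   # accept the link, reserve the target
            taken.add(best_j)
    return links
```

    A globally fixed radius would either merge neighbouring tracks in dense regions or drop fast particles in sparse ones; the per-particle radius avoids both failure modes.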

    Availability and implementation: The algorithms proposed here are released under the GNU General Public License and are freely available on the web at http://sourceforge.net/p/adaptivespt.

    Contact: javier.mazzaferri@gmail.com

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Population-based structural variation discovery with Hydra-Multi
    [Apr 2015]

    Summary: Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA).

    Availability and implementation: Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra.

    Contact: aaronquinlan@gmail.com or ihall@genome.wustl.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements
    [Apr 2015]

    Summary: We implemented a high-throughput identification pipeline for promoter-interacting enhancer elements to streamline the workflow from mapping raw Hi-C reads and identifying high-confidence DNA–DNA interacting fragments with quality control, through detecting histone modifications and DNase hypersensitivity enrichments in putative enhancer elements, to ultimately extracting possible intra- and inter-chromosomal enhancer–target gene relationships.

    Availability and implementation: This software package is designed to run on high-performance computing clusters with Oracle Grid Engine. The source code is freely available under the MIT license for academic and nonprofit use. The source code and instructions are available at the Wang lab website (http://wanglab.pcbi.upenn.edu/hippie/). It is also provided as an Amazon Machine Image to be used directly on Amazon Cloud with minimal installation.

    Contact: lswang@mail.med.upenn.edu or bdgregor@sas.upenn.edu

    Supplementary information: Supplementary Material is available at Bioinformatics online.

    Categories: Journal Articles
  • GenomeCons: a web server for manipulating multiple genome sequence alignments and their consensus sequences
    [Apr 2015]

    Summary: Genome sequence alignments provide valuable information on many aspects of molecular biological processes. In this study, we developed a web server, GenomeCons, for manipulating multiple genome sequence alignments and their consensus sequences for high-throughput genome sequence analyses. This server facilitates the visual inspection of multiple genome sequence alignments for a set of genomic intervals at a time. This allows the user to examine how these sites have been conserved over evolutionary time, reflecting their functional importance. The server also reports consensus sequences for the input genomic intervals, which can be applied to downstream analyses such as the identification of common motifs in the regions determined by ChIP-seq experiments.
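
    A consensus sequence over an alignment is conceptually simple—the majority character in each column—as in this sketch (GenomeCons's own consensus rules may differ, e.g. in gap or ambiguity handling):

```python
from collections import Counter

def consensus(aligned_seqs):
    """Majority-rule consensus of equal-length aligned sequences:
    the most common character in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*aligned_seqs))
```
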

    Availability and implementation: GenomeCons is freely accessible at http://bioinfo.sls.kyushu-u.ac.jp/genomecons/

    Contact: mikita@bioreg.kyushu-u.ac.jp

    Categories: Journal Articles