The latest research articles published by BMC Bioinformatics
Background: Interactions that involve one or more amino acid side chains near the ends of protein helices stabilize helix termini and shape the geometry of the adjacent loops, making a substantial contribution to overall protein structure. Previous work has identified key helix-terminal motifs, such as Asx/ST N-caps, the capping box, and hydrophobic and electrostatic interactions, but important questions remain, including: 1) What loop backbone geometries are favoured by each motif? 2) To what extent are multi-amino acid motifs likely to represent genuine cooperative interactions? 3) Can new motifs be identified in a large, recent dataset using the latest bioinformatics tools? Results: Three analytical tools are applied here to answer these questions. First, helix-terminal structures are partitioned by loop backbone geometry using a new 3D clustering algorithm. Next, Cascade Detection, a motif detection algorithm recently published by the author, is applied to each cluster to determine which sequence motifs are overrepresented in each geometry. Finally, the results for each motif are presented in a CapMap, a 3D conformational heatmap that displays the distribution of the motif’s overrepresentation across loop geometries, enabling the rapid isolation and characterization of the associated side chain interaction. This work identifies a library of geometry-specific side chain interactions that provides a new, detailed picture of loop structure near the helix terminus. Highlights include determinations of the favoured loop geometries for the Asx/ST N-cap motifs, capping boxes, “big” boxes, and other hydrophobic, electrostatic, H-bond, and pi stacking interactions, many of which have not been described before. Conclusions: This work demonstrates that the combination of structural clustering and motif detection in the sequence space can efficiently identify side chain motifs and map them to the loop geometries which they support. Protein designers should find this study useful, because it identifies side chain interactions which are good candidates for inclusion in synthetic helix-terminal loops with specific desired geometries, since they are used in nature to support these geometries. The techniques described here can also be applied to map side chain interactions associated with other structural components of proteins such as beta and gamma turns.
Background: While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes. Results: Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8 % and 13.8 % of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively – a feat that is not possible with other leading tools with the breadth of data used in our study. Conclusion: Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.
Object-based representation and analysis of light and electron microscopic volume data using Blender
Background: Rapid improvements in light and electron microscopy imaging techniques and the development of 3D anatomical atlases necessitate new approaches for the visualization and analysis of image data. Pixel-based representations of raw light microscopy data suffer from limitations in the number of channels that can be visualized simultaneously. Complex electron microscopic reconstructions from large tissue volumes are also challenging to visualize and analyze. Results: Here we exploit the advanced visualization capabilities and flexibility of the open-source platform Blender to visualize and analyze anatomical atlases. We use light-microscopy-based gene expression atlases and electron microscopy connectome volume data from larval stages of the marine annelid Platynereis dumerilii. We build object-based larval gene expression atlases in Blender and develop tools for annotation and coexpression analysis. We also represent and analyze connectome data including neuronal reconstructions and underlying synaptic connectivity. Conclusions: We demonstrate the power and flexibility of Blender for visualizing and exploring complex anatomical atlases. The resources we have developed for Platynereis will facilitate data sharing and the standardization of anatomical atlases for this species. The flexibility of Blender, particularly its embedded Python application programming interface, means that our methods can be easily extended to other organisms.
Background: Non-synonymous single nucleotide polymorphisms (nsSNPs) are the most common DNA sequence variation associated with disease in humans. Thus determining the clinical significance of each nsSNP is of great importance. Potential detrimental nsSNPs may be identified by genetic association studies or by functional analysis in the laboratory, both of which are expensive and time consuming. Existing computational methods lack accuracy and features to facilitate nsSNP classification for clinical use. We developed the GESPA (GEnomic Single nucleotide Polymorphism Analyzer) program to predict the pathogenicity and disease phenotype of nsSNPs. Results: GESPA is a user-friendly software package for classifying disease association of nsSNPs. It allows flexibility in acceptable input formats and predicts the pathogenicity of a given nsSNP by assessing the conservation of amino acids in orthologs and paralogs and supplementing this information with data from medical literature. The development and testing of GESPA was performed using the humsavar, ClinVar and humvar datasets. Additionally, GESPA also predicts the disease phenotype associated with a nsSNP with high accuracy, a feature unavailable in existing software. GESPA’s overall accuracy exceeds existing computational methods for predicting nsSNP pathogenicity. The usability of GESPA is enhanced by fast SQL-based cloud storage and retrieval of data. Conclusions: GESPA is a novel bioinformatics tool to determine the pathogenicity and phenotypes of nsSNPs. We anticipate that GESPA will become a useful clinical framework for predicting the disease association of nsSNPs. The program, executable jar file, source code, GPL 3.0 license, user guide, and test data with instructions are available at http://sourceforge.net/projects/gespa.
Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale
Background: With rapid advancements in technology, the sequences of thousands of species’ genomes are becoming available. Within the sequences are repeats that comprise significant portions of genomes. Successful annotations thus require accurate discovery of repeats. As species-specific elements, repeats in newly sequenced genomes are likely to be unknown. Therefore, annotating newly sequenced genomes requires tools to discover repeats de-novo. However, the currently available de-novo tools have limitations concerning the size of the input sequence, ease of use, sensitivities to major types of repeats, consistency of performance, speed, and false positive rate. Results: To address these limitations, I designed and developed Red, applying Machine Learning. Red is the first repeat-detection tool capable of labeling its training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes; the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false positive rate than WindowMasker. On human genes with five or more copies, Red was more specific than RepeatScout by a wide margin. When tested on genomes of unusual nucleotide compositions, Red located repeats with high sensitivities and maintained moderate false positive rates. Red outperformed the related tools on a bacterial genome. Red identified 46,405 novel repetitive segments in the human genome. Finally, Red is capable of processing assembled and unassembled genomes. Conclusions: Red’s innovative methodology and its excellent performance on seven different genomes represent a valuable advancement in the field of repeats discovery.
Knowledge transfer via classification rules using functional mapping for integrative modeling of gene expression data
Background: Most ‘transcriptomic’ data from microarrays are generated from small sample sizes compared to the large number of measured biomarkers, making it very difficult to build accurate and generalizable disease state classification models. Integrating information from different, but related, ‘transcriptomic’ data may help build better classification models. However, most proposed methods for integrative analysis of ‘transcriptomic’ data cannot incorporate domain knowledge, which can improve model performance. To this end, we have developed a methodology that leverages transfer rule learning and functional modules, which we call TRL-FM, to capture and abstract domain knowledge in the form of classification rules to facilitate integrative modeling of multiple gene expression data. TRL-FM is an extension of the transfer rule learner (TRL) that we developed previously. The goal of this study was to test our hypothesis that “an integrative model obtained via the TRL-FM approach outperforms traditional models based on single gene expression data sources”. Results: To evaluate the feasibility of the TRL-FM framework, we compared the area under the ROC curve (AUC) of models developed with TRL-FM and other traditional methods, using 21 microarray datasets generated from three studies on brain cancer, prostate cancer, and lung disease, respectively. The results show that TRL-FM statistically significantly outperforms TRL as well as traditional models based on single source data. In addition, TRL-FM performed better than other integrative models driven by meta-analysis and cross-platform data merging. Conclusions: The capability of utilizing transferred abstract knowledge derived from source data using feature mapping enables the TRL-FM framework to mimic the human process of learning and adaptation when performing related tasks. The novel TRL-FM methodology for integrative modeling for multiple ‘transcriptomic’ datasets is able to intelligently incorporate domain knowledge that traditional methods might disregard, to boost predictive power and generalization performance. In this study, TRL-FM’s abstraction of knowledge is achieved in the form of functional modules, but the overall framework is generalizable in that different approaches of acquiring abstract knowledge can be integrated into this framework.
Distribution Analyzer, a methodology for identifying and clustering outlier conditions from single-cell distributions, and its application to a <it>Nanog</it> reporter RNAi screen
Background: Chemical or small interfering (si) RNA screens measure the effects of many independent experimental conditions, each applied to a population of cells (e.g., all of the cells in a well). High-content screens permit a readout (e.g., fluorescence, luminescence, cell morphology) from each cell in the population. Most analysis approaches compare the average effect on each population, precluding identification of outliers that affect the distribution of the reporter in the population but not its average. Other approaches only measure changes to the distribution with a single parameter, precluding accurate distinction and clustering of interesting outlier distributions. Results: We describe a methodology to identify outlier conditions by considering the cell-level measurements from each condition as a sample of an underlying distribution. With appropriate selection of a distance metric, all effects can be embedded in a fixed-dimensionality Euclidean basis, facilitating identification and clustering of biologically interesting outliers. We demonstrate that measurement of distances with the Hellinger distance metric offers substantial computational efficiencies over alternative metrics. We validate this methodology using an RNA interference (RNAi) screen in mouse embryonic stem cells (ESC) with a Nanog reporter. The methodology clusters effects of multiple control siRNAs into their true identities better than conventional approaches describing the median cell fluorescence or the commonly used Kolmogorov-Smirnov distance between the observed fluorescence distribution and the null distribution. It identifies outlier genes with effects on the reporter distribution that would have been missed by other methods. Among them, siRNA targeting Chek1 leads to a wider Nanog reporter fluorescence distribution. Similarly, siRNA targeting Med14 or Med27 leads to a narrower Nanog reporter fluorescence distribution. We confirm the roles of these three genes in regulating pluripotency by mRNA expression and alkaline phosphatase staining using independent short hairpin (sh) RNAs. Conclusions: Using our methodology, we describe each experimental condition by a probability distribution. Measuring distances between probability distributions permits a multivariate rather than univariate readout. Clustering points derived from these distances allows us to obtain greater biological insight than methods based solely on single parameters. We find several outliers from a mouse ESC RNAi screen that we confirm to be pluripotency regulators. Many of these outliers would have been missed by other analysis methods.
QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments
Background: High-throughput next-generation RNA sequencing has matured into a viable and powerful method for detecting variations in transcript expression and regulation. Proactive quality control is of critical importance as unanticipated biases, artifacts, or errors can potentially drive false associations and lead to flawed results. Results: We have developed the Quality of RNA-Seq Toolset, or QoRTs, a comprehensive, multifunction toolset that assists in quality control and data processing of high-throughput RNA sequencing data. Conclusions: QoRTs generates an unmatched variety of quality control metrics, and can provide cross-comparisons of replicates contrasted by batch, biological sample, or experimental condition, revealing any outliers and/or systematic issues that could drive false associations or otherwise compromise downstream analyses. In addition, QoRTs simultaneously replaces the functionality of numerous other data-processing tools, and can quickly and efficiently generate quality control metrics, coverage counts (for genes, exons, and known/novel splice-junctions), and browser tracks. These functions can all be carried out as part of a single unified data-processing/quality control run, greatly reducing both the complexity and the total runtime of the analysis pipeline. The software, source code, and documentation are available online at http://hartleys.github.io/QoRTs.
Background: Genetic variations predispose individuals to hereditary diseases, play important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate data that is erroneous, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affect the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of the existing techniques for haplotype assembly assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly. Results: We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both the 1000 Genomes Project experimental data as well as simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed. Conclusions: The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes.A C code implementation of ParticleHap will be available for download from https://sites.google.com/site/asynoeun/particlehap.
groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data
Background: Global run-on coupled with deep sequencing (GRO-seq) provides extensive information on the location and function of coding and non-coding transcripts, including primary microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and enhancer RNAs (eRNAs), as well as yet undiscovered classes of transcripts. However, few computational tools tailored toward this new type of sequencing data are available, limiting the applicability of GRO-seq data for identifying novel transcription units. Results: Here, we present groHMM, a computational tool in R, which defines the boundaries of transcription units de novo using a two state hidden-Markov model (HMM). A systematic comparison of the performance between groHMM and two existing peak-calling methods tuned to identify broad regions (SICER and HOMER) favorably supports our approach on existing GRO-seq data from MCF-7 breast cancer cells. To demonstrate the broader utility of our approach, we have used groHMM to annotate a diverse array of transcription units (i.e., primary transcripts) from four GRO-seq data sets derived from cells representing a variety of different human tissue types, including non-transformed cells (cardiomyocytes and lung fibroblasts) and transformed cells (LNCaP and MCF-7 cancer cells), as well as non-mammalian cells (from flies and worms). As an example of the utility of groHMM and its application to questions about the transcriptome, we show how groHMM can be used to analyze cell type-specific enhancers as defined by newly annotated enhancer transcripts. Conclusions: Our results show that groHMM can reveal new insights into cell type-specific transcription by identifying novel transcription units, and serve as a complete and useful tool for evaluating functional genomic elements in cells.
Background: Predictions of MHC binding affinity are commonly used in immunoinformatics for T cell epitope prediction. There are multiple available methods, some of which provide web access. However there is currently no convenient way to access the results from multiple methods at the same time or to execute predictions for an entire proteome at once. Results: We designed a web application that allows integration of multiple epitope prediction methods for any number of proteins in a genome. The tool is a front-end for various freely available methods. Features include visualisation of results from multiple predictors within proteins in one plot, genome-wide analysis and estimates of epitope conservation. Conclusions: We present a self contained web application, Epitopemap, for calculating and viewing epitope predictions with multiple methods. The tool is easy to use and will assist in computational screening of viral or bacterial genomes.
BSPAT: a fast online tool for DNA methylation co-occurrence pattern analysis based on high-throughput bisulfite sequencing data
Background: Bisulfite sequencing is one of the most widely used technologies in analyzing DNA methylation patterns, which are important in understanding and characterizing the mechanism of DNA methylation and its functions in disease development. Efficient and user-friendly tools are critical in carrying out such analysis on high-throughput bisulfite sequencing data. However, existing tools are either not scalable well, or inadequate in providing visualization and other desirable functionalities. Results: In order to handle ultra large sequencing data and to provide additional functions and features, we have developed BSPAT, a fast online tool for bisulfite sequencing pattern analysis. With a user-friendly web interface, BSPAT seamlessly integrates read mapping/quality control/methylation calling with methylation pattern generation and visualization. BSPAT has the following important features: 1) instead of using multiple/pairwise sequence alignment methods, BSPAT adopts an efficient and widely used sequence mapping tool to provide fast alignment of sequence reads; 2) BSPAT summarizes and visualizes DNA methylation co-occurrence patterns at a single nucleotide level, which provide valuable information in understanding the mechanism and regulation of DNA methylation; 3) based on methylation co-occurrence patterns, BSPAT can automatically detect potential allele-specific methylation (ASM) patterns, which can greatly enhance the detection and analysis of ASM patterns; 4) by linking directly with other popular databases and tools, BSPAT allows users to perform integrative analysis of methylation patterns with other genomic features together within regions of interest. Conclusion: By utilizing a real bisulfite sequencing dataset generated from prostate cancer cell lines, we have shown that BSPAT is highly efficient. It has also reported some interesting methylation co-occurrence patterns and a potential allele-specific methylation case. In conclusion, BSPAT is an efficient and convenient tool for high-throughput bisulfite sequencing data analysis that can be broadly used.
Optimal combination of feature selection and classification via local hyperplane based learning strategy
Background: Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine. A major challenge is to design an effective method that eliminates irrelevant, redundant, or noisy genes from the classification, while retaining all of the highly discriminative genes. Results: We propose a gene selection method, called local hyperplane-based discriminant analysis (LHDA). LHDA adopts two central ideas. First, it uses a local approximation rather than global measurement; second, it embeds a recently reported classification model, K-Local Hyperplane Distance Nearest Neighbor(HKNN) classifier, into its discriminator. Through classification accuracy-based iterations, LHDA obtains the feature weight vector and finally extracts the optimal feature subset. The performance of the proposed method is evaluated in extensive experiments on synthetic and real microarray benchmark datasets. Eight classical feature selection methods, four classification models and two popular embedded learning schemes, including k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), Support Vector Machine (SVM) and Random Forest are employed for comparisons. Conclusion: The proposed method yielded comparable to or superior performances to seven state-of-the-art models. The nice performance demonstrate the superiority of combining feature weighting with model learning into an unified framework to achieve the two tasks simultaneously.
Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm
Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. Results: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well or better than the other three metrics in all scenarios. Conclusions: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with 25 m + edges connecting the 60 k + KOG sequences in half a minute using less than half a gigabyte of memory.
Background: DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches such as Wilcoxon rank sum test, t-test, Kolmogorov–Smirnov test, permutation test, empirical Bayes method, and bump hunting method. Nonetheless, selection of an optimal statistical method can be challenging when different methods generate inconsistent results from the same data set. Results: We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate control, statistical power, and stability through simulation studies and real data examples. Observable differences were noticed between β values and M values only when methylation levels were correlated across CpG loci. For small sample size (n=3 or 6 in each group), both the empirical Bayes and bump hunting methods showed appropriate FDR control and the highest power when methylation levels across CpG loci were independent. Only the bump hunting method showed appropriate FDR control and the highest power when methylation levels across CpG sites were correlated. For medium (n=12 in each group) and large sample sizes (n=24 in each group), all methods compared had similar power, except for the permutation test whenever the proportion of differentially methylated loci was low. For all sample sizes, the bump hunting method had the lowest stability in terms of standard deviation of total discoveries whenever the proportion of differentially methylated loci was large. The apparent test power comparisons based on raw p-values from DNA methylation studies on ovarian cancer and rheumatoid arthritis provided results as consistent as those obtained in the simulation studies. Overall, these results provide guidance for optimal statistical methods selection under different scenarios. Conclusions: For DNA methylation studies with small sample size, the bump hunting method and the empirical Bayes method are recommended when DNA methylation levels across CpG loci are independent, while only the bump hunting method is recommended when DNA methylation levels are correlated across CpG loci. All methods are acceptable for medium or large sample sizes.
Topological characterization of neuronal arbor morphology via sequence representation: I - motif analysis
Background: The morphology of neurons offers many insights into developmental processes and signal processing. Numerous reports have focused on metrics at the level of individual branches or whole arbors; however, no studies have attempted to quantify repeated morphological patterns within neuronal trees. We introduce a novel sequential encoding of neurite branching suitable to explore topological patterns. Results: Using all possible branching topologies for comparison we show that the relative abundance of short patterns of up to three bifurcations, together with overall tree size, effectively capture the local branching patterns of neurons. Dendrites and axons display broadly similar topological motifs (over-represented patterns) and anti-motifs (under-represented patterns), differing most in their proportions of bifurcations with one terminal branch and in select sub-sequences of three bifurcations. In addition, pyramidal apical dendrites reveal a distinct motif profile. Conclusions: The quantitative characterization of topological motifs in neuronal arbors provides a thorough description of local features and detailed boundaries for growth mechanisms and hypothesized computational functions.
Background: The concept of Petri nets (PN) is widely used in systems biology and allows modeling of complex biochemical systems like metabolic systems, signal transduction pathways, and gene expression networks. In particular, PN allows the topological analysis based on structural properties, which is important and useful when quantitative (kinetic) data are incomplete or unknown. Knowing the kinetic parameters, the simulation of time evolution of such models can help to study the dynamic behavior of the underlying system. If the number of involved entities (molecules) is low, a stochastic simulation should be preferred against the classical deterministic approach of solving ordinary differential equations. The Stochastic Simulation Algorithm (SSA) is a common method for such simulations. The combination of the qualitative and semi-quantitative PN modeling and stochastic analysis techniques provides a valuable approach in the field of systems biology. Results: Here, we describe the implementation of stochastic analysis in a PN environment. We extended MonaLisa - an open-source software for creation, visualization and analysis of PN - by several stochastic simulation methods. The simulation module offers four simulation modes, among them the stochastic mode with constant firing rates and Gillespie’s algorithm as exact and approximate versions. The simulator is operated by a user-friendly graphical interface and accepts input data such as concentrations and reaction rate constants that are common parameters in the biological context. The key features of the simulation module are visualization of simulation, interactive plotting, export of results into a text file, mathematical expressions for describing simulation parameters, and up to 500 parallel simulations of the same parameter sets. To illustrate the method we discuss a model for insulin receptor recycling as case study. Conclusions: We present a software that combines the modeling power of Petri nets with stochastic simulation of dynamic processes in a user-friendly environment supported by an intuitive graphical interface. The program offers a valuable alternative to modeling, using ordinary differential equations, especially when simulating single-cell experiments with low molecule counts. The ability to use mathematical expressions provides an additional flexibility in describing the simulation parameters. The open-source distribution allows further extensions by third-party developers. The software is cross-platform and is licensed under the Artistic License 2.0.
Background: We recently identified two robust ovarian cancer subtypes, defined by the expression of genes involved in angiogenesis, with significant differences in clinical outcome. To identify potential regulatory mechanisms that distinguish the subtypes we applied PANDA, a method that uses an integrative approach to model information flow in gene regulatory networks. Results: We find distinct differences between networks that are active in the angiogenic and non-angiogenic subtypes, largely defined by a set of key transcription factors that, although previously reported to play a role in angiogenesis, are not strongly differentially-expressed between the subtypes. Our network analysis indicates that these factors are involved in the activation (or repression) of different genes in the two subtypes, resulting in differential expression of their network targets. Mechanisms mediating differences between subtypes include a previously unrecognized pro-angiogenic role for increased genome-wide DNA methylation and complex patterns of combinatorial regulation. Conclusions: The models we develop require a shift in our interpretation of the driving factors in biological networks away from the genes themselves and toward their interactions. The observed regulatory changes between subtypes suggest therapeutic interventions that may help in the treatment of ovarian cancer.