The latest research articles published by BMC Bioinformatics
Background: Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. Therefore we have developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Results: Raw sequencing data provided by the user are mapped against a set of marker probes that are derived from currently available complete bacterial genomes. Using a dataset of 1003 whole-genome-sequenced bacteria from various sequencing platforms, Reads2Type was able to identify the species with 99.5 % accuracy on a time scale of minutes. Conclusions: In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of the person using the web application. This also prevents data privacy issues from arising. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.
Analysis of in vivo single cell behavior by high throughput, human-in-the-loop segmentation of three-dimensional images
Background: Analysis of single cells in their native environment is a powerful method to address key questions in developmental systems biology. Confocal microscopy imaging of intact tissues, followed by automatic image segmentation, provides a means to conduct cytometric studies while at the same time preserving crucial information about the spatial organization of the tissue and morphological features of the cells. This technique is rapidly evolving but is still not in widespread use among research groups that do not specialize in technique development, perhaps in part for lack of tools that automate repetitive tasks while allowing experts to make the best use of their time in injecting their domain-specific knowledge. Results: Here we focus on a well-established stem cell model system, the C. elegans gonad, as well as on two other model systems widely used to study cell fate specification and morphogenesis: the pre-implantation mouse embryo and the developing mouse olfactory epithelium. We report a pipeline that integrates machine-learning-based cell detection, fast human-in-the-loop curation of these detections, and running of active contours seeded from detections to segment cells. The procedure can be bootstrapped by a small number of manual detections, and outperforms alternative pieces of software we benchmarked on C. elegans gonad datasets. Using cell segmentations to quantify fluorescence contents, we report previously-uncharacterized cell behaviors in the model systems we used. We further show how cell morphological features can be used to identify cell cycle phase; this provides a basis for future tools that will streamline cell cycle experiments by minimizing the need for exogenous cell cycle phase labels. 
Conclusions: High-throughput 3D segmentation makes it possible to extract rich information from images that are routinely acquired by biologists, and provides insights — in particular with respect to the cell cycle — that would be difficult to derive otherwise.
Fully automated registration of vibrational microspectroscopic images in histologically stained tissue sections
Background: In recent years, hyperspectral microscopy techniques such as infrared or Raman microscopy have been applied successfully for diagnostic purposes. In many of the corresponding studies, it is common practice to measure one and the same sample under different types of microscopes. Any joint analysis of the two image modalities requires overlaying the images, so that identical positions in the sample are located at the same coordinate in both images. This step, commonly referred to as image registration, has typically been performed manually for lack of established automated computational registration tools. Results: We propose a registration algorithm that addresses this problem, and demonstrate the robustness of our approach in different constellations of microscopes. First, we deal with subregion registration of Fourier Transform Infrared (FTIR) microscopic images in whole-slide histopathological staining images. Second, we register FTIR imaged cores of tissue microarrays in their histopathologically stained counterparts, and finally perform registration of Coherent anti-Stokes Raman spectroscopic (CARS) images within histopathological staining images. Conclusions: Our validation involves a large variety of samples obtained from colon, bladder, and lung tissue on three different types of microscopes, and demonstrates that our proposed method is fully automated and highly robust in different constellations of microscopes involving diverse types of tissue samples.
High-order dynamic Bayesian Network learning with hidden common causes for causal gene regulatory network
Background: Inferring gene regulatory networks (GRNs) has been an important topic in bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of a GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and are therefore unmeasured. An inference method that also handles hidden common cause(s) is therefore highly desirable. Moreover, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes are not observed genes. Results: We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data. It makes some assumptions on the conditional distribution, but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. Conclusions: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experimental results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.
Coev-web: a web platform designed to simulate and evaluate coevolving positions along a phylogenetic tree
Background: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate to assess the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. Results: We have developed a web-based platform that gives user-friendly access to two phylogeny-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows users to simulate pairs of dependent nucleotide or amino acid positions. Conclusions: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web that is freely accessible at http://coev.vital-it.ch/, and was tested in most modern web browsers.
MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
Background: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. In spite of the considerable research effort recently devoted to improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem. Results: We propose a novel and efficient algorithm called MSAIndelFR for multiple sequence alignment, using the information on the predicted locations of IndelFRs and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that the introduction of a new variable gap penalty function based on the predicted locations of the IndelFRs and the computed average log-loss values into the proposed algorithm substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of the four popular benchmarks, BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). Conclusions: We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
Background: The number of γH2AX foci per nucleus is an accepted measure of the number of DNA double-strand breaks in single cells. One of the experimental techniques for γH2AX detection in cultured cells is immunofluorescent labelling of γH2AX and nuclei followed by microscopy imaging and analysis. Results: In this study, we present the algorithm FoCo for reliable and robust automatic nuclear foci counting in single cell images. FoCo has the following advantages with respect to other software packages: i) the ability to reliably quantify even densely distributed foci, e.g., on images of cells subjected to radiation doses up to 10 Gy, ii) robustness of foci quantification in the sense of suppressing out-of-focus background signal, and iii) its simplicity. FoCo requires only 5 parameters that have to be adjusted by the user. Conclusions: FoCo is an open-source user-friendly software with GUI for individual foci counting, which is able to produce reliable and robust foci quantifications even for low signal/noise ratios and densely distributed foci.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
Background: Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. Description: We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis.
Conclusions: This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
An heuristic filtering tool to identify phenotype-associated genetic variants applied to human intellectual disability and canine coat colors
Background: Identification of one or several disease causing variant(s) from the large collection of variants present in an individual is often achieved by the sequential use of heuristic filters. The recent development of whole exome sequencing enrichment designs for several non-model species created the need for a species-independent, fast and versatile analysis tool, capable of tackling a wide variety of standard and more complex inheritance models. With this aim, we developed “Mendelian”, an R-package that can be used for heuristic variant filtering. Results: The R-package Mendelian offers fast and convenient filters to analyze putative variants for both recessive and dominant models of inheritance, with variable degrees of penetrance and detectance. Analysis of trios is supported. Filtering against variant databases and annotation of variants is also included. This package is not species specific and supports parallel computation. We validated this package by reanalyzing data from a whole exome sequencing experiment on intellectual disability in humans. In a second example, we identified the mutations responsible for coat color in the dog. This is the first example of whole exome sequencing without prior mapping in the dog. Conclusion: We developed an R-package that enables the identification of disease-causing variants from the long list of variants called in sequencing experiments. The software and a detailed manual are available at https://github.com/BartBroeckx/Mendelian.
Background: Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA’s 3D structure. Currently, the design of RNA molecules that fold into a specific structure (known as RNA inverse folding) for biotechnological applications lacks support for incorporating pseudoknot structures. Hairpin-(H)- and kissing hairpin-(K)-type pseudoknots cover a wide range of biologically functional pseudoknots and can be represented on a secondary structure level. Results: The RNA inverse folding program antaRNA, which takes secondary structure, target GC-content and sequence constraints as input, is extended to provide solutions for such H- and K-type pseudoknotted secondary structure constraints. We demonstrate the easy and flexible interchangeability of modules within the antaRNA framework by incorporating pKiss as a structure prediction tool capable of predicting the mentioned pseudoknot types. The performance of the approach is demonstrated on a subset of the Pseudobase++ dataset. Conclusions: This new service is available via a standalone version and is also part of the Freiburg RNA Tools webservice. Furthermore, antaRNA is available in Galaxy and is part of the RNA-workbench Docker image.
Background: Numerous tools have been developed to predict the fitness effects (i.e., neutral, deleterious, or beneficial) of genetic variants on corresponding proteins. However, predicting whether a variant causes the variant-bearing protein to lose its original function or gain a new function is also needed for a better understanding of how the variant contributes to disease or cancer. To address this problem, the present work introduces and computationally defines four types of functional outcome of a variant: gain, loss, switch, and conservation of function. The deployment of multiple hidden Markov models is proposed to computationally classify mutations by the four functional impact types. Results: The functional outcome is predicted for over a hundred thyroid stimulating hormone receptor (TSHR) mutations, as well as cancer-related mutations in oncogenes or tumor suppressor genes. The results show that the proposed computational method is effective in fine-grained prediction of the functional outcome of a mutation, and can be used to help elucidate the molecular mechanism of disease- or cancer-causing mutations. The program is freely available at http://bioinformatics.cs.vt.edu/zhanglab/HMMvar/download.php. Conclusion: This work is the first to computationally define and predict the functional impact of mutations as loss, switch, gain, or conservation of function. These fine-grained predictions can be especially useful for identifying mutations that cause or are linked to cancer.
Background: Pathway analysis methods, in which differentially expressed genes are mapped to databases of reference pathways and relative enrichment is assessed, help investigators to propose biologically relevant hypotheses. The latest generation of pathway analysis methods takes into account the topological structure of a pathway, which helps to increase both specificity and sensitivity of the findings. Simultaneously, RNA-Seq technology is gaining popularity and is becoming widely used for gene expression profiling. Unfortunately, the majority of topological pathway analysis methods remain without an implementation, and where an implementation exists, it is limited in various ways. Results: We developed a new R/Bioconductor package, ToPASeq, offering a uniform interface to seven distinct topology-based pathway analysis methods, of which three we implemented de novo and four were adapted from existing implementations. Apart from this, ToPASeq offers a set of tailored visualization functions and functions for importing and manipulating pathways and their topologies, facilitating the application of the methods to different species. The package can be used to compare the differential expression of pathways between two conditions on both gene expression microarray and RNA-Seq data. The package is written in R and is available from Bioconductor 3.2 under the AGPL-3 license. Conclusion: ToPASeq is a novel package that offers seven distinct methods for topology-based pathway analysis, which are easily applicable to microarray as well as RNA-Seq data, both in human and other species. At the same time, it provides specific tools for visualization of the results.
Background: ChIP-seq experiments are widely used to detect and study DNA-protein interactions, such as transcription factor binding and chromatin modifications. However, downstream analysis of ChIP-seq data is currently restricted to the evaluation of signal intensity and the detection of enriched regions (peaks) in the genome. Other features of peak shape are almost always neglected, despite the remarkable differences shown by ChIP-seq for different proteins, as well as by distinct regions in a single experiment. Results: We hypothesize that statistically significant differences in peak shape might have a functional role and a biological meaning. Thus, we design five indices able to summarize peak shapes and we employ multivariate clustering techniques to divide peaks into groups according to both their complexity and the intensity of their coverage function. In addition, our novel analysis pipeline employs a range of statistical and bioinformatics techniques to relate the obtained peak shapes to several independent genomic datasets, including other genome-wide protein-DNA maps and gene expression experiments. To clarify the meaning of peak shape, we apply our methodology to the study of the erythroid transcription factor GATA-1 in the K562 cell line and in megakaryocytes. Conclusions: Our study demonstrates that ChIP-seq profiles include information regarding the binding of proteins other than the one used for precipitation. In particular, peak shape provides new insights into cooperative transcriptional regulation and is correlated with gene expression.
Background: Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will make it possible to learn more accurate classifiers. Methods: We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. Results: The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Conclusion: Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data.
The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
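The conversion from gene-level to set-level features described in this abstract can be sketched as follows. The abstract does not state how the expression values of a set's member genes are aggregated, so the arithmetic mean used here is an assumption for illustration only; the function name, gene identifiers, and set names are hypothetical.

```python
from statistics import mean


def set_level_features(expression, gene_sets):
    """Collapse a gene-level expression vector into set-level features.

    expression: dict mapping gene id -> expression value for one sample.
    gene_sets:  dict mapping set name -> list of member gene ids.
    Aggregation by mean is an illustrative assumption, not the paper's method.
    """
    features = {}
    for set_name, genes in gene_sets.items():
        # Use only the member genes that were actually measured in this sample.
        measured = [expression[g] for g in genes if g in expression]
        if measured:
            features[set_name] = mean(measured)
    return features


# Hypothetical sample: four measured genes, two gene sets (g5 is unmeasured).
sample = {"g1": 2.0, "g2": 4.0, "g3": 1.0, "g4": 3.0}
sets = {"regulon_X": ["g1", "g2"], "regulon_Y": ["g3", "g4", "g5"]}
print(set_level_features(sample, sets))
```

A classifier would then be trained on the resulting lower-dimensional vectors, one feature per gene set, instead of one feature per gene.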
Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data
Background: Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. Results: In this paper, we compared eight non-abundance estimation (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation, but for RNA-Seq of a 76-nucleotide sequence, it showed lower correlation than the other methods. ERPKM did not improve on the results of RPKM. Between the two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than with RSEM, which in turn was better than not using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to those obtained without abundance estimation methods, and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.
Conclusion: Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.
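The evaluation described in this abstract, normalizing read counts and then correlating them with qRT-PCR values, can be sketched as follows. This is a minimal illustration using the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and a plain Spearman rank correlation; the gene names, counts, lengths, and qRT-PCR values are invented for the example and do not come from the MAQC data.

```python
from math import sqrt


def rpkm(counts, lengths_bp):
    """RPKM = 1e9 * reads_on_gene / (gene_length_bp * total_mapped_reads)."""
    total = sum(counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented toy data: three genes with read counts, lengths, and qRT-PCR values.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}
qpcr = {"geneA": 1.0, "geneB": 2.5, "geneC": 4.0}

normalized = rpkm(counts, lengths)
genes = sorted(counts)
rho = spearman([normalized[g] for g in genes], [qpcr[g] for g in genes])
print(normalized, rho)
```

Because Spearman correlation depends only on ranks, any normalization that preserves the ordering of genes within a sample yields the same coefficient, which is consistent with several of the compared methods giving similar results.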
Background: Functional annotation of genes and gene products is a major challenge in the post-genomic era. Nowadays, gene function curation is largely based on manual assignment of Gene Ontology (GO) annotations to genes by using published literature. The annotation task is extremely time-consuming, therefore there is an increasing interest in automated tools that can assist human experts. Results: Here we introduce GOTA, a GO term annotator for biomedical literature. The proposed approach makes use only of information that is readily available from public repositories and it is easily expandable to handle novel sources of information. We assess the classification capabilities of GOTA on a large benchmark set of publications. The overall performances are encouraging in comparison to the state of the art in multi-label classification over large taxonomies. Furthermore, the experimental tests provide some interesting insights into the potential improvement of automated annotation tools. Conclusions: GOTA implements a flexible and expandable model for GO annotation of biomedical literature. The current version of the GOTA tool is freely available at http://gota.apice.unibo.it.
Extended notions of sign consistency to relate experimental data to signaling and regulatory network topologies
Background: A rapidly growing amount of knowledge about signaling and gene regulatory networks is available in databases such as KEGG, Reactome, or RegulonDB. There is an increasing need to relate this knowledge to high-throughput data in order to (in)validate network topologies or to decide which interactions are present or inactive in a given cell type under a particular environmental condition. Interaction graphs provide a suitable representation of cellular networks with information flows, and methods based on sign consistency approaches have been shown to be valuable tools to (i) predict qualitative responses, (ii) test the consistency of network topologies and experimental data, and (iii) apply repair operations to the network model, suggesting missing or wrong interactions. Results: We present a framework to unify different notions of sign consistency and propose a refined method for data discretization that considers uncertainties in experimental profiles. We furthermore introduce a new constraint to filter undesired model behaviors induced by positive feedback loops. Finally, we generalize the way predictions can be made by the sign consistency approach. In particular, we distinguish strong predictions (e.g., increase of a node level) and weak predictions (e.g., node level increases or remains unchanged), enlarging the overall predictive power of the approach. We then demonstrate the applicability of our framework by confronting a large-scale gene regulatory network model of Escherichia coli with high-throughput transcriptomic measurements. Conclusion: Overall, our work enhances the flexibility and power of the sign consistency approach for the prediction of the behavior of signaling and gene regulatory networks and, more generally, for the validation and inference of these networks.
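To make the idea of sign consistency concrete, the sketch below checks one basic rule on a signed interaction graph: an observed increase or decrease of a node must be explained by at least one incoming influence of the same sign, where an influence is the regulator's observed sign multiplied by the edge sign. This is only one simple notion, chosen as an assumption for illustration; it is not the unified framework of the paper, and it ignores unchanged (zero) observations and feedback loops. The function name and the toy network are hypothetical.

```python
def inconsistent_nodes(edges, signs, inputs=frozenset()):
    """Return nodes whose observed change has no explaining influence.

    edges:  iterable of (source, target, edge_sign), edge_sign in {+1, -1}.
    signs:  dict node -> observed change in {+1, -1, 0}.
    inputs: nodes perturbed externally; they are exempt from the check.
    """
    influences = {}
    for source, target, edge_sign in edges:
        # Influence of a regulator = its observed sign times the edge sign.
        influence = signs.get(source, 0) * edge_sign
        influences.setdefault(target, []).append(influence)

    unexplained = []
    for node, observed in signs.items():
        if node in inputs or observed == 0:
            continue  # simplification: unchanged nodes are not checked here
        if observed not in influences.get(node, []):
            unexplained.append(node)
    return sorted(unexplained)


# Hypothetical network: A activates B, A inhibits C, B and C both activate D.
toy_edges = [("A", "B", +1), ("A", "C", -1), ("B", "D", +1), ("C", "D", +1)]

consistent = {"A": +1, "B": +1, "C": -1, "D": +1}
print(inconsistent_nodes(toy_edges, consistent, inputs={"A"}))   # []

conflicting = {"A": +1, "B": +1, "C": +1, "D": +1}
print(inconsistent_nodes(toy_edges, conflicting, inputs={"A"}))  # ['C']
```

In the second data set, C increases although its only regulator A is up and inhibits it, so the observation cannot be explained by the topology; a repair operation would either question the measurement or suggest a missing or wrongly signed interaction.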
Background: Orientation and the degree of isotropy are important in many biological systems such as the sarcomeres of cardiomyocytes and other fibrillar structures of the cytoskeleton. Image based analysis of such structures is often limited to qualitative evaluation by human experts, hampering the throughput, repeatability and reliability of the analyses. Software tools are not readily available for this purpose and the existing methods typically rely at least partly on manual operation. Results: We developed CytoSpectre, an automated tool based on spectral analysis, allowing the quantification of orientation and also size distributions of structures in microscopy images. CytoSpectre utilizes the Fourier transform to estimate the power spectrum of an image and based on the spectrum, computes parameter values describing, among others, the mean orientation, isotropy and size of target structures. The analysis can be further tuned to focus on targets of particular size at cellular or subcellular scales. The software can be operated via a graphical user interface without any programming expertise. We analyzed the performance of CytoSpectre by extensive simulations using artificial images, by benchmarking against FibrilTool and by comparisons with manual measurements performed for real images by a panel of human experts. The software was found to be tolerant against noise and blurring and superior to FibrilTool when analyzing realistic targets with degraded image quality. The analysis of real images indicated general good agreement between computational and manual results while also revealing notable expert-to-expert variation. Moreover, the experiment showed that CytoSpectre can handle images obtained of different cell types using different microscopy techniques. Finally, we studied the effect of mechanical stretching on cardiomyocytes to demonstrate the software in an actual experiment and observed changes in cellular orientation in response to stretching. 
Conclusions: CytoSpectre, a versatile, easy-to-use software tool for spectral analysis of microscopy images was developed. The tool is compatible with most 2D images and can be used to analyze targets at different scales. We expect the tool to be useful in diverse applications dealing with structures whose orientation and size distributions are of interest. While designed for the biological field, the software could also be useful in non-biological applications.
DiSNPindel: improved intra-individual SNP and InDel detection in direct amplicon sequencing of a diploid
Background: Amplicon re-sequencing based on the automated Sanger method remains popular for detection of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (InDels) for a spectrum of genetics applications. However, existing software tools for detecting intra-individual SNPs and InDels in direct amplicon sequencing of diploid samples are insufficient for analyzing single traces, and their accuracy is still limited. Results: We developed a novel computational tool, named DiSNPindel, to improve the detection of intra-individual SNPs and InDels in direct amplicon sequencing of a diploid. Neither a reference sequence nor an additional sample is required. Using two real datasets, we demonstrated that DiSNPindel substantially improves the true SNP and InDel discovery rates and substantially reduces the missed and false positive rates compared with existing detection methods. Conclusions: The software DiSNPindel presented here provides an efficient tool for intra-individual SNP and InDel detection in diploid amplicon sequencing. It will also be useful for identification of DNA variations in expressed sequence tag (EST) re-sequencing.
Background: Reconstruction of neuron anatomy structure is a challenging and important task in neuroscience. However, few algorithms can automatically reconstruct the full structure well without manual assistance, making it essential to develop new methods for this task. Methods: This paper introduces a new pipeline for reconstructing neuron anatomy structure from 3-D microscopy image stacks. This pipeline is initialized with a set of seeds that are detected by our proposed Sliding Volume Filter (SVF), given a non-circular cross-section of a neuron cell. Then, an improved open curve snake model combined with an SVF external force is applied to trace the full skeleton of the neuron cell. A radius estimation method based on a 2D sliding band filter is developed to fit the real edge of the cross-section of the neuron cell. Finally, a surface reconstruction method based on non-parallel curve networks is used to generate the neuron cell surface, completing the pipeline. Results: The proposed pipeline has been evaluated using publicly available datasets. The results show that the proposed method achieves promising results on some datasets from the DIgital reconstruction of Axonal and DEndritic Morphology (DIADEM) challenge and the new BigNeuron project. Conclusion: The new pipeline works well in neuron tracing and reconstruction. It can achieve higher efficiency, stability and robustness in neuron skeleton tracing. Furthermore, the proposed radius estimation method and the applied surface reconstruction method can obtain more accurate neuron anatomy structures.