Evolutionary-based Kernel Optimization

This work concerns the improved classification of hypersensitive sites (HS) in genomic sequences. Such sites are reliable markers of DNA regulatory regions, which control gene expression. Annotating regulatory regions is important for understanding phenotypic differences among cells and diseases linked to aberrant protein expression. While gene-finding efforts have largely paid off, our understanding of the location and role of regulatory regions still lags behind.

Several computational techniques aim to map regulatory regions in DNA by first identifying HS sequences. Most notably, statistical learning techniques such as Support Vector Machines (SVMs) are employed to classify DNA sequences as HS or non-HS.
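
An SVM operates on numeric feature vectors, or on a kernel that compares two inputs, so each DNA sequence must first be mapped to numbers. As an illustration only, the sketch below uses a k-mer count (spectrum) representation, a standard baseline for sequence SVMs. It is not the motif-based representation developed in this work, and the function names and the default k=3 are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=3):
    """Count every length-k substring (k-mer) of seq over the alphabet ACGT.

    Hypothetical helper: k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering of all 4**k possible k-mers, so every sequence
    # maps onto the same coordinates regardless of its length.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocab]


def spectrum_kernel(s, t, k=3):
    """Dot product of k-mer count vectors: a valid kernel an SVM can use directly."""
    return sum(a * b for a, b in zip(kmer_features(s, k), kmer_features(t, k)))
```

For example, `spectrum_kernel("AAAA", "AAA", k=2)` returns 6, because the 2-mer AA occurs three times in the first sequence and twice in the second. The feature vector length grows as 4**k, which is one reason learned, discriminating features can be preferable to exhaustive k-mer counts.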

This line of our work proposes a method that automates the basic steps of designing an SVM in order to improve the accuracy of this classification. The method proceeds in two stages, each driven by an evolutionary algorithm. First, an evolutionary algorithm based on genetic algorithm techniques designs optimal sequence motifs, which associate explicit, discriminating feature vectors with input DNA sequences. Second, an evolutionary algorithm based on genetic programming techniques evolves SVM kernel functions and parameters that optimally separate the HS and non-HS classes.
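
A minimal sketch of the idea behind the second stage, assuming kernels are represented as expression trees whose internal nodes only add, multiply, or positively scale base kernels (operations known to preserve kernel validity). The names `random_tree`, `evaluate`, `alignment`, `mutate`, and `evolve`, the choice of linear and RBF base kernels, the depth limit, and the mutation-only truncation-selection loop are all hypothetical. Rather than training an SVM for every candidate, the sketch swaps in kernel-target alignment as a cheap stand-in fitness; how the original work scores candidate kernels is not described here.

```python
import math
import random

# Hypothetical sketch: every tree is built from valid kernels combined by
# sum, product, or positive scaling, so every evolved tree is itself a kernel.


def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


BASE = {"linear": linear, "rbf": rbf}


def random_tree(depth=2):
    """Grow a random kernel tree: leaves are base kernels, internal nodes combine them."""
    if depth <= 0 or random.random() < 0.3:
        return ("leaf", random.choice(list(BASE)))
    op = random.choice(["add", "mul", "scale"])
    if op == "scale":
        return ("scale", random.uniform(0.1, 2.0), random_tree(depth - 1))
    return (op, random_tree(depth - 1), random_tree(depth - 1))


def evaluate(tree, x, y):
    """Compute the kernel value k(x, y) encoded by the tree."""
    kind = tree[0]
    if kind == "leaf":
        return BASE[tree[1]](x, y)
    if kind == "scale":
        return tree[1] * evaluate(tree[2], x, y)
    left, right = evaluate(tree[1], x, y), evaluate(tree[2], x, y)
    return left + right if kind == "add" else left * right


def alignment(tree, X, labels):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * n), labels in {-1, +1}."""
    n = len(X)
    K = [[evaluate(tree, a, b) for b in X] for a in X]
    num = sum(K[i][j] * labels[i] * labels[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n)  # ||yy^T||_F equals n for +/-1 labels


def mutate(tree, depth=2):
    """Walk down a random path and replace one subtree (possibly the root) with a new one."""
    if tree[0] == "leaf" or random.random() < 0.3:
        return random_tree(depth)
    if tree[0] == "scale":
        return ("scale", tree[1], mutate(tree[2], depth - 1))
    if random.random() < 0.5:
        return (tree[0], mutate(tree[1], depth - 1), tree[2])
    return (tree[0], tree[1], mutate(tree[2], depth - 1))


def evolve(X, labels, pop_size=20, generations=15, seed=0):
    """Mutation-only GP with truncation selection; the best half always survives."""
    random.seed(seed)
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: alignment(t, X, labels), reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=lambda t: alignment(t, X, labels))
```

For instance, `evolve(X, y)` on a handful of labeled feature vectors returns the fittest tree found, which `evaluate` can then serve to an SVM as its kernel. Restricting internal nodes to sums, products, and positive scalings guarantees that every evolved tree remains a valid (positive semi-definite) kernel, one way to keep genetic programming from producing functions an SVM cannot use. A fuller version would add subtree crossover and evolve kernel parameters such as `gamma` alongside the tree structure.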

Results show that this two-stage method significantly improves SVM classification accuracy. The method promises to be generally useful for automating the analysis of biological sequences, as our ongoing work on evolutionary-based feature generation demonstrates.

An earlier, limited version of this work appeared in: Uday Kamath, Amarda Shehu, and Kenneth A. De Jong, "Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences," International Conference on Bio-inspired Models of Network, Information, and Computing Systems (BIONETICS), Boston, MA, 2010. A more extensive version appears in: Uday Kamath, Amarda Shehu, and Kenneth A. De Jong, "A Two-Stage Evolutionary Approach for Effective Classification of Hypersensitive DNA Sequence," Journal of Bioinformatics and Computational Biology, 2011.

In response to community interest in the details of the method and the usability of the code, we post further details here. Please feel free to contact any of the authors with questions.

On this project:

  • Uday Kamath
  • Amarda Shehu
  • Kenneth De Jong