Code and Documentation:
Source Code for Evolutionary Feature Construction (EFC) is available here: EFC_Source.zip
File Descriptions for Code:
- AMPProblem.java:
Reads the training data files in FASTA
format (positives and negatives,
separately). The code then evaluates each GP
feature tree by interpreting it on every
positive and negative peptide, using the
fitness function described in the paper.
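
As an illustration of the reading-and-evaluation step described for AMPProblem.java, here is a minimal sketch. It is not taken from the EFC source: the class FastaSketch, its method names, and the use of a Predicate as a stand-in for a GP feature tree are all assumptions made for this example. It only counts matches on each class and does not reproduce the fitness function from the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; names are illustrative and not from AMPProblem.java.
class FastaSketch {
    // Parses FASTA lines: a '>' line starts a new record, and the
    // following non-empty lines are concatenated into one sequence.
    static List<String> parseFasta(List<String> lines) {
        List<String> sequences = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                if (current != null) sequences.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line);
            }
        }
        if (current != null) sequences.add(current.toString());
        return sequences;
    }

    // Counts how many peptides a boolean feature matches.
    static int countMatches(Predicate<String> feature, List<String> peptides) {
        int matches = 0;
        for (String peptide : peptides) {
            if (feature.test(peptide)) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the positive and negative FASTA files.
        List<String> positives = parseFasta(Arrays.asList(">p1", "KWKLF", "KKI", ">p2", "GIGAV"));
        List<String> negatives = parseFasta(Arrays.asList(">n1", "AAAAA"));
        // Stand-in for one GP feature tree (assumption for illustration).
        Predicate<String> feature = s -> s.startsWith("K");
        System.out.println(countMatches(feature, positives) + " of " + positives.size() + " positives matched");
        System.out.println(countMatches(feature, negatives) + " of " + negatives.size() + " negatives matched");
    }
}
```

To read real files, the line lists could come from java.nio.file.Files.readAllLines; the match counts on the two classes are the kind of input a fitness function would combine.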
- HallOfFame.java and StatisticalPluginForHallOfFame.java:
The code in these files collects the top
features from each generation and stores
them in the hall of fame. The hall of fame
is written to a file for persistence.
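
The hall-of-fame idea can be sketched as follows. This is an assumption-laden illustration rather than the code in HallOfFame.java: the class names, the per-generation cutoff, the convention that higher fitness is better, and the tab-separated file layout are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the EFC implementation.
class HallOfFameSketch {
    // A feature (as text) paired with its fitness score.
    static final class ScoredFeature {
        final String expression;
        final double fitness;
        ScoredFeature(String expression, double fitness) {
            this.expression = expression;
            this.fitness = fitness;
        }
    }

    private final int perGeneration; // how many top features to keep per generation
    private final List<ScoredFeature> members = new ArrayList<>();

    HallOfFameSketch(int perGeneration) {
        this.perGeneration = perGeneration;
    }

    // Adds the best features of one generation (assumes higher fitness is better).
    void addGeneration(List<ScoredFeature> population) {
        List<ScoredFeature> sorted = new ArrayList<>(population);
        sorted.sort(Comparator.comparingDouble((ScoredFeature f) -> f.fitness).reversed());
        members.addAll(sorted.subList(0, Math.min(perGeneration, sorted.size())));
    }

    // One line per member: fitness, a tab, then the feature text.
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (ScoredFeature f : members) {
            lines.add(f.fitness + "\t" + f.expression);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        HallOfFameSketch hallOfFame = new HallOfFameSketch(1);
        hallOfFame.addGeneration(Arrays.asList(
                new ScoredFeature("featureA", 0.2), new ScoredFeature("featureB", 0.9)));
        // Persist to a temporary file so the sketch leaves no fixed path behind.
        Path out = Files.createTempFile("hall-of-fame", ".txt");
        Files.write(out, hallOfFame.toLines());
        System.out.println("Wrote " + hallOfFame.toLines().size() + " feature(s) to " + out);
    }
}
```

This sketch does not remove duplicates across generations; whether and where the real code does so is not stated here.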
- AMPSequenceFeatureInterpreter.java:
This code takes the features in the hall of
fame and runs them against the training/test
data files to prepare data files that can be
used to train a machine learning model. The
data files are in the format of dtbsvm
files, i.e.,
featurenumber:boolean(1/0). The code also
cleans up some features, removing
trivial redundancies.
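
To make the featurenumber:boolean(1/0) layout concrete, the sketch below encodes one peptide as a single line. Only the number:0/1 pairs come from the description above. The leading class label, the 1-based feature numbering, the space separator, and the names FeatureLineSketch and encode are assumptions for illustration, and the features are simple stand-ins for hall-of-fame entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; not taken from AMPSequenceFeatureInterpreter.java.
class FeatureLineSketch {
    // Builds "label 1:v1 2:v2 ..." where each v is 1 if the feature
    // matches the peptide and 0 otherwise.
    static String encode(String label, String peptide, List<Predicate<String>> features) {
        StringBuilder line = new StringBuilder(label);
        for (int i = 0; i < features.size(); i++) {
            int value = features.get(i).test(peptide) ? 1 : 0;
            line.append(' ').append(i + 1).append(':').append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for features taken from the hall of fame.
        List<Predicate<String>> features = new ArrayList<>();
        features.add(s -> s.startsWith("K"));
        features.add(s -> s.contains("GG"));
        System.out.println(encode("+1", "KWKLF", features)); // prints "+1 1:1 2:0"
    }
}
```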
- amp.params:
This parameter file sets the
mutation/crossover rates in EFC, the number
of non-terminals and terminals, constraints
on their values, and various tuning
parameters, such as maximum positions
and training file locations.
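
The kinds of settings listed for amp.params can be pictured with a small key/value sketch. The syntax and key names used here (mutation.rate, crossover.rate, train.positives) are hypothetical and chosen only for illustration; consult the actual amp.params in the source archive for the real names and format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; the keys below are not the real amp.params keys.
class ParamsSketch {
    // Parses "key = value" text using the standard java.util.Properties rules.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads a numeric rate, falling back to a default when the key is absent.
    static double rate(Properties props, String key, double fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Double.parseDouble(raw.trim());
    }

    public static void main(String[] args) {
        String text = "# hypothetical keys for illustration\n"
                + "mutation.rate = 0.1\n"
                + "crossover.rate = 0.9\n"
                + "train.positives = positives.fasta\n";
        Properties props = parse(text);
        System.out.println("mutation rate: " + rate(props, "mutation.rate", 0.0));
        System.out.println("positives file: " + props.getProperty("train.positives"));
    }
}
```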