Sample of Data, Models and Features
We have tried to be as transparent as possible by providing the
entire data mapped to ARFF, the data format used by WEKA, together
with samples of our cross-validation or train/test splits, so that
researchers can validate the results.
1) HSS:
For the hypersensitive site (HSS) dataset we have made the
following available.
a) HSS-Entire.arff
The entire HSS data after being run through EFFECT. This is mainly
intended for feature strength analysis, i.e. understanding which
features contribute more.
b) Sample Fold for Cross-Validation
We sampled 90% of the data for training, keeping the same ratio of
classes as in the full data, ran it through EFFECT, and used the
remaining 10% for testing. In the paper we did this 10 times
separately, but here we provide just one sample fold to try.
Hss-Fold-1-Train.arff is the stratified sample of 90% and
Hss-Fold-1-Test.arff is the stratified sample of 10% of the data.
c) HSSFeatures.txt
The features obtained when the entire data is run through EFFECT.
d) TopHSSFeatures.txt
A smaller subset of the above hall of fame which, when run with
the above configuration, gave the specified results.
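
The stratified 90%/10% sampling described in item b) can be sketched as
follows. This is only a minimal illustration, not the code used for the
paper: the function name stratified_split, the seed, and the toy labels
are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.9, seed=0):
    """Return (train_indices, test_indices), preserving class ratios.

    Hypothetical helper: each class is shuffled and cut separately,
    so both parts keep roughly the class ratio of the full data.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels standing in for the class column of a dataset.
labels = ["pos"] * 20 + ["neg"] * 80
train_idx, test_idx = stratified_split(labels)
```

Repeating the call with different seeds would give independent 90%/10%
samples, similar in spirit to the 10 separate runs mentioned above.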
2) ALU Sequences:
For the Alu sequences dataset we have made the following
available.
a) Alu-Entire.arff
The entire Alu data after being run through EFFECT. This is mainly
intended for feature strength analysis, i.e. understanding which
features contribute more.
b) Sample Fold for Cross-Validation
We sampled 90% of the data for training, keeping the same ratio of
classes as in the full data, ran it through EFFECT, and used the
remaining 10% for testing. In the paper we did this 10 times
separately, but here we provide just one sample fold to try.
Alu-Fold-1-Train.arff is the stratified sample of 90% and
Alu-Fold-1-Test.arff is the stratified sample of 10% of the data.
c) AluFeatures.txt
The features obtained when the entire data is run through EFFECT.
d) TopALUFeatures.txt
A smaller subset of the above hall of fame which, when run with
the above configuration, gave the specified results.
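
One way to check that a train/test fold keeps the same ratio of classes
is to count the class labels in each ARFF file. The sketch below is a
minimal illustration under stated assumptions: it handles only dense
(non-sparse) ARFF rows, assumes the class is the last column and that
labels contain no commas, and the helper names class_counts and
class_ratios are hypothetical. The toy data is made up for the example.

```python
from collections import Counter


def class_counts(arff_lines):
    """Count class labels in the @data section of a dense ARFF file.

    Assumes the class label is the last comma-separated value on
    each data row; sparse ARFF rows are not supported.
    """
    counts = Counter()
    in_data = False
    for raw in arff_lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            in_data = line.lower().startswith("@data")
            continue  # header lines carry no instances
        counts[line.rsplit(",", 1)[-1].strip()] += 1
    return counts


def class_ratios(counts):
    """Convert label counts into fractions of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy ARFF content standing in for a real fold file.
toy_arff = """@relation toy
@attribute f1 numeric
@attribute class {pos,neg}
@data
1,pos
0,neg
0,neg
1,neg
""".splitlines()

counts = class_counts(toy_arff)
ratios = class_ratios(counts)
```

For the real files, the same function can be given an open file object,
for example the result of open("Alu-Fold-1-Train.arff"), and the ratios
compared against those of Alu-Fold-1-Test.arff.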
3) Splice Site:
3.1) NN269:
3.1.a) Training/Testing Datasets with Model: The model built from
the entire training data after running it through EFFECT is
available as NN269Model.model, the training file as
NN269Training.arff, the testing file as NN269Test.arff,
and the results as NN269Results.
3.1.b) Features: The feature file is available for researchers
to analyze as NN269Features.txt.
3.1.c) Top Features: The top features for NN269 are available as
TopNN269Features.txt.
3.2) Worm/C_Elegans:
3.2.a) Training/Testing Datasets and Model: The entire
acceptor and donor data after being passed through EFFECT are
available as C_Elegans-Entire-Acceptor-Reduced.arff
and C_Elegans-Entire-Donor-Reduced.arff.
We also sampled 90% of the data for training, keeping the same
ratio of classes as in the full data, ran it through EFFECT, and
used the remaining 10% for testing. For acceptors these are
C_Elegans-Acceptor-Fold-1-Training.arff and
C_Elegans-Acceptor-Fold-1-Testing.arff; for donors they are
C_Elegans-Donor-Fold-1-Training.arff and
C_Elegans-Donor-Fold-1-Testing.arff.
3.2.b) Features: The features for acceptors and donors are
available for researchers to analyze as C_ElegansAcceptorFeatures.txt
and C_ElegansDonorFeatures.txt.
3.2.c) Top Features: The top features for C_Elegans
acceptors and donors are available as
TopClegansAcceptorFeatures.txt
and TopClegansDonorFeatures.txt.
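
Each .arff file listed on this page declares its features as @attribute
lines in the header, so the feature names can be listed without loading
the data into WEKA. The sketch below is a minimal illustration, not a
full ARFF parser: the helper name attribute_names is hypothetical, escaped
quotes inside names are not handled, and the toy header is made up.

```python
import re

# Matches "@attribute <name> ...", where the name may be bare,
# single-quoted, or double-quoted.
ATTRIBUTE = re.compile(
    r"""^@attribute\s+(?:'([^']*)'|"([^"]*)"|(\S+))""",
    re.IGNORECASE,
)


def attribute_names(arff_lines):
    """Return attribute names declared in an ARFF header, in order."""
    names = []
    for raw in arff_lines:
        line = raw.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section begins
        match = ATTRIBUTE.match(line)
        if match:
            names.append(next(g for g in match.groups() if g is not None))
    return names


# Toy header standing in for a real file.
toy_header = """@relation toy
@attribute f1 numeric
@attribute 'motif at 3' numeric
@attribute class {pos,neg}
@data
1,0,pos
""".splitlines()

names = attribute_names(toy_header)
```

With a real file, passing an open file object such as
open("NN269Training.arff") would list the attributes declared there.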