
Samples of Data, Models and Features

We have tried to be as transparent as possible: we provide the entire data mapped to ARFF (the input file format of the WEKA machine-learning toolkit), together with samples of our cross-validation and train/test splits, so that researchers can validate our results.
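For readers who want to inspect the ARFF files outside WEKA, here is a minimal sketch of a reader using only the Python standard library. It assumes dense (non-sparse) data, one instance per line, and single-quoted values where quoting is needed; `read_arff` is a hypothetical helper, not part of WEKA. For full ARFF support (sparse rows, dates, escapes), WEKA's own loaders or `scipy.io.arff` are more robust.

```python
import csv


def read_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows).

    Hypothetical helper (not WEKA's API). Assumes dense data only, one
    instance per line, '%' comment lines, and no sparse '{...}' rows.
    """
    relation = None
    attributes = []  # (name, type) pairs in header order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@relation"):
                relation = line.split(None, 1)[1].strip("'\"")
            elif lower.startswith("@attribute"):
                rest = line.split(None, 1)[1]
                if rest[0] in "'\"":
                    # quoted attribute name, e.g. @attribute 'my f' numeric
                    quote = rest[0]
                    end = rest.index(quote, 1)
                    name, atype = rest[1:end], rest[end + 1:].strip()
                else:
                    name, atype = rest.split(None, 1)
                attributes.append((name, atype))
            elif lower.startswith("@data"):
                in_data = True
        else:
            # csv handles single-quoted values such as 'a b'
            values = next(csv.reader([line], quotechar="'", skipinitialspace=True))
            rows.append(values)
    return relation, attributes, rows


# Toy input for illustration only; not taken from the released files.
sample = """% toy example
@relation toy
@attribute f1 numeric
@attribute class {pos,neg}
@data
0.5,pos
1.5,neg
"""
relation, attributes, rows = read_arff(sample)
print(relation, attributes, rows)
```

All values are returned as strings; converting numeric attributes is left to the caller.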

1) HSS: For the hypersensitive site (HSS) dataset we have made the following available:

a) HSS-Entire.arff
The entire HSS dataset after being run through EFFECT. This file is intended mainly for feature strength analysis, i.e., for understanding which features contribute most.

b) Sample Fold for Cross-Validation
We sampled 90% of the data for training, preserving the class ratio of the full dataset, ran it through EFFECT, and used the remaining 10% for testing. In the paper we repeated this 10 times separately; here we provide a single sample fold to try. Hss-Fold-1-Train.arff is the stratified 90% sample and Hss-Fold-1-Test.arff is the stratified 10% sample.
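The stratified 90%/10% sampling described above can be sketched as follows, assuming each instance's class label is known. This illustrates the technique only and is not the script used for the paper; `stratified_split` and the seed values are hypothetical. Calling it with 10 different seeds yields 10 separate stratified samples, and in the procedure above it is the 90% training portion that is then run through EFFECT.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) with each class keeping roughly the
    same proportion in both parts. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)  # group instance indices by class label
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test count, rounded
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 30 positives and 70 negatives (not real data).
labels = ["pos"] * 30 + ["neg"] * 70
train, test = stratified_split(labels, seed=1)
print(len(train), len(test))
```

Splitting within each class, rather than over the whole dataset, is what keeps the class ratio of the training and testing files close to that of the full data.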

c) HSSFeatures.txt
The features obtained when the entire data is run through EFFECT are listed in this file.

d) TopHSSFeatures.txt
A smaller subset of the hall of fame in HSSFeatures.txt which, when run with the above configuration, gave the specified results.
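To rerun an experiment with only a top-feature subset, one option is to keep just those attribute columns plus the class. The sketch below assumes the attribute names and data rows are already parsed into Python lists, that the top-feature names are supplied as a list (we make no assumption about the exact layout of TopHSSFeatures.txt), and that the class is the last attribute, which is WEKA's usual default. `select_features` is hypothetical; inside WEKA, the `Remove` attribute filter achieves the same effect.

```python
def select_features(attr_names, rows, keep, class_index=-1):
    """Keep only the named attributes plus the class column.

    attr_names: attribute names in header order.
    rows: list of value lists, one per instance.
    keep: names of the features to retain (e.g. a top-feature list).
    class_index: position of the class attribute (assumed last by default).
    Hypothetical helper for illustration.
    """
    class_pos = class_index % len(attr_names)
    missing = [k for k in keep if k not in attr_names]
    if missing:
        raise KeyError(f"unknown attributes: {missing}")
    # column positions of the kept features, in the order given by `keep`
    cols = [attr_names.index(k) for k in keep if attr_names.index(k) != class_pos]
    cols.append(class_pos)  # always keep the class as the last column
    new_names = [attr_names[c] for c in cols]
    new_rows = [[row[c] for c in cols] for row in rows]
    return new_names, new_rows


# Toy table (not real data): keep f3 and f1 only.
names = ["f1", "f2", "f3", "class"]
rows = [["1", "2", "3", "pos"], ["4", "5", "6", "neg"]]
print(select_features(names, rows, ["f3", "f1"]))
```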

2) ALU Sequences: For the Alu sequences dataset we have made the following available:

a) Alu-Entire.arff
The entire Alu dataset after being run through EFFECT. This file is intended mainly for feature strength analysis, i.e., for understanding which features contribute most.

b) Sample Fold for Cross-Validation
We sampled 90% of the data for training, preserving the class ratio of the full dataset, ran it through EFFECT, and used the remaining 10% for testing. In the paper we repeated this 10 times separately; here we provide a single sample fold to try. Alu-Fold-1-Train.arff is the stratified 90% sample and Alu-Fold-1-Test.arff is the stratified 10% sample.

c) AluFeatures.txt
The features obtained when the entire data is run through EFFECT are listed in this file.

d) TopALUFeatures.txt
A smaller subset of the hall of fame in AluFeatures.txt which, when run with the above configuration, gave the specified results.

3) Splice Site:

3.1) NN269:

3.1.a) Training/Testing Datasets and Model: The model built from the entire training data after it was run through EFFECT is available as NN269Model.model, the training file as NN269Training.arff, the testing file as NN269Test.arff, and the results as NN269Results.

3.1.b) Features: We have made the feature file available for researchers to analyze as NN269Features.txt.

3.1.c) Top Features: The top features for NN269 are available as TopNN269Features.txt.

3.2) Worm/C_Elegans:

3.2.a) Training/Testing Datasets and Model: The entire acceptor and donor data after being passed through EFFECT are available as C_Elegans-Entire-Acceptor-Reduced.arff and C_Elegans-Entire-Donor-Reduced.arff. For both acceptors and donors we also sampled 90% of the data for training, preserving the class ratio, ran it through EFFECT, and used the remaining 10% for testing; these folds are available as C_Elegans-Acceptor-Fold-1-Training.arff and C_Elegans-Acceptor-Fold-1-Testing.arff, and C_Elegans-Donor-Fold-1-Training.arff and C_Elegans-Donor-Fold-1-Testing.arff, respectively.
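A quick sanity check that a training/testing pair is stratified is to compare the class proportions of the two files. The sketch below assumes the class labels have already been extracted from each ARFF file (for example, the last column of every data row); `class_proportions` and `max_proportion_gap` are hypothetical helpers, and the labels shown are toy values, not the real C. elegans classes.

```python
from collections import Counter


def class_proportions(labels):
    """Map each class label to its fraction of the instances."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def max_proportion_gap(train_labels, test_labels):
    """Largest absolute difference in class fraction between two splits.
    A class missing from one split counts as fraction 0 there."""
    p = class_proportions(train_labels)
    q = class_proportions(test_labels)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


# Toy labels (not real data): a 1:9 class ratio in both splits.
train = ["pos"] * 9 + ["neg"] * 81
test = ["pos"] * 1 + ["neg"] * 9
print(max_proportion_gap(train, test))
```

A gap close to zero for both the acceptor and the donor pair indicates that the class ratio was preserved; small nonzero gaps are expected from rounding when class counts are not multiples of 10.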

3.2.b) Features: We have made the features for acceptors and donors available for researchers to analyze as C_ElegansAcceptorFeatures.txt and C_ElegansDonorFeatures.txt.

3.2.c) Top Features: The top features for C_Elegans acceptors and donors are available as TopClegansAcceptorFeatures.txt and TopClegansDonorFeatures.txt.