SPLICESITE
HSS Benchmark Dataset:
This dataset is described in the paper. It was originally
introduced in: Noble WS, Kuehn S, Thurman R, Yu M,
Stamatoyannopoulos JA (2005) Predicting the in vivo
signature of human gene regulatory sequences. Bioinformatics
21: i338-i343.
The complete dataset is available at noble.gs.washington.edu/proj/hs.
Dataset:
hss.fasta (positive data)
non-hss.fasta (negative data)
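Each benchmark on this page is split into a positive file and a negative file. The following is a minimal Python sketch of how such a pair might be turned into labeled examples. It assumes that each file is standard FASTA (a `>` header line followed by one or more sequence lines). Only `hss.fasta` and `non-hss.fasta` carry a FASTA extension here, so the format of the other files is an assumption. The helper names `parse_fasta` and `load_labeled` are hypothetical and are not part of the dataset distribution.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                # emit the previous record before starting a new one
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            # sequences may be wrapped over several lines; join them later
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_labeled(pos_path: str, neg_path: str) -> List[Tuple[str, int]]:
    """Return (sequence, label) pairs: 1 for positives, 0 for negatives."""
    data: List[Tuple[str, int]] = []
    for path, label in ((pos_path, 1), (neg_path, 0)):
        with open(path) as handle:
            data.extend((seq, label) for _, seq in parse_fasta(handle))
    return data
```

For example, `load_labeled("hss.fasta", "non-hss.fasta")` would return every HSS sequence labeled 1, followed by every non-HSS sequence labeled 0. The sequence case is left untouched, so any lowercase (soft-masked) bases are preserved.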
Splice Site Benchmark Dataset:
NN269
This is one of the two datasets used for splice site
detection. It was originally introduced in: Reese MG, Eeckman
F, Kulp D, Haussler D (1997) Improved splice site detection in
Genie. J Comput Biol 4: 311-323.
Dataset:
Acceptor_Train (positive)
Acceptor_Train (negative)
Acceptor_Test (positive)
Acceptor_Test (negative)
Donor_Train (positive)
Donor_Train (negative)
Donor_Test (positive)
Donor_Test (negative)
C_Elegans
This is the second dataset used for splice site
detection. It was originally introduced in: Sonnenburg S,
Schweikert G, Philips P, Behr J, Ratsch G (2007) Accurate
splice site prediction using support vector machines. BMC
Bioinformatics 8.
Dataset:
Acceptor
Donor
Alu Dataset:
This dataset is also described in the paper. It was
originally introduced in: Jurka J (1993) A new subfamily of
recently retroposed human Alu repeats. Nucl Acids Res
21: 2252.
Dataset:
Alu Positives
Alu Negatives