
HSS Benchmark Dataset:

This dataset is described in the paper. It was originally introduced in: Noble WS, Kuehn S, Thurman R, Yu M, Stamatoyannopoulos JA (2005) Predicting the in vivo signature of human gene regulatory sequences. Bioinformatics 21: i338-i343.

The complete dataset is available at noble.gs.washington.edu/proj/hs.

Dataset:
hss.fasta (positive data)
non-hss.fasta (negative data)
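
As a minimal sketch of how the two files above might be combined into one labeled set, the following assumes that hss.fasta and non-hss.fasta are in standard FASTA format (a ">" header line followed by one or more sequence lines). The function names read_fasta and load_labeled, and the 1/0 label convention, are hypothetical choices made for illustration and are not part of the dataset.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_labeled(pos_path: str, neg_path: str) -> List[Tuple[str, int]]:
    """Return (sequence, label) pairs; label 1 = positive, 0 = negative.

    The label convention is an assumption made for this sketch.
    """
    data: List[Tuple[str, int]] = []
    for path, label in ((pos_path, 1), (neg_path, 0)):
        with open(path) as fh:
            data.extend((seq.upper(), label) for _, seq in read_fasta(fh))
    return data
```

With both files downloaded to the working directory, load_labeled("hss.fasta", "non-hss.fasta") would return one list that can be shuffled and split for training and evaluation.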

Splice Site Benchmark Dataset:

NN269

This is one of the two datasets used for splice site detection. It was originally introduced in: Reese MG, Eeckman F, Kulp D, Haussler D (1997) Improved splice site detection in Genie. J Comput Biol 4: 311-323.

Dataset:
Acceptor Train (positive)   Acceptor_Train (negative)
Acceptor Test (positive)   Acceptor_Test (negative)
Donor Train (positive)   Donor_Train (negative)
Donor Test (positive)   Donor_Test (negative)

C_Elegans

This is the second dataset used for splice site detection. It was originally introduced in: Sonnenburg S, Schweikert G, Philips P, Behr J, Rätsch G (2007) Accurate splice site prediction using support vector machines. BMC Bioinformatics 8.

Dataset:
Acceptor
Donor

ALU Sequences Dataset:

This dataset is also described in the paper. It was originally introduced in: Jurka J (1993) A new subfamily of recently retroposed human Alu repeats. Nucl Acids Res 21: 2252.

Dataset:
Alu Positives
Alu Negatives