Computer Science Department Seminars

2004-2005 Academic Year

LEARNING TO DETECT MALICIOUS EXECUTABLES 
Speaker: MARK MALOOF

Mark Maloof
Department of Computer Science
Georgetown University
Washington, DC 20057
maloof@cs.georgetown.edu
http://www.cs.georgetown.edu/~maloof


Abstract

In this talk, I will describe the development of a fielded 
application for detecting malicious executables in the wild.  
We gathered 1971 benign and 1651 malicious executables and 
encoded each as a training example using n-grams of byte codes 
as features.  Such processing resulted in more than 255 million 
distinct n-grams.  After selecting the most relevant n-grams for 
prediction, we evaluated a variety of inductive methods, including 
naive Bayes, decision trees, support vector machines, and boosting.  
Ultimately, boosted decision trees outperformed other methods with an 
area under the ROC curve of 0.996.  Results also suggest that our 
methodology will scale to larger collections of executables. To the 
best of our knowledge, ours is the only fielded application for this 
task developed using techniques from machine learning and data 
mining.
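
As an illustration of the encoding step described in the abstract, the sketch below shows one way an executable's bytes could be turned into n-gram features. It is a minimal sketch and not the speaker's implementation: the function names, the choice of n = 4, and the Boolean present/absent encoding are all assumptions made for this example.

```python
from __future__ import annotations


def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the distinct overlapping n-grams of bytes in `data`.

    The window length n = 4 is an assumption for illustration only.
    Inputs shorter than n yield an empty set.
    """
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def encode(data: bytes, vocabulary: list[bytes], n: int = 4) -> list[bool]:
    """Encode one executable as a Boolean feature vector.

    Each position is True if the corresponding n-gram from `vocabulary`
    (a hypothetical list of previously selected n-grams) occurs in `data`.
    """
    present = byte_ngrams(data, n)
    return [gram in present for gram in vocabulary]


if __name__ == "__main__":
    # A made-up five-byte input, not a real executable.
    sample = b"\x4d\x5a\x90\x00\x03"
    selected = [b"\x4d\x5a\x90\x00", b"\x00\x00\x00\x00"]
    print(sorted(byte_ngrams(sample)))
    print(encode(sample, selected))
```

In practice the vocabulary would hold only the n-grams kept after relevance selection, so each feature vector stays far smaller than the full set of distinct n-grams.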