Computer Science Department Seminars
2004-2005 Academic Year

LEARNING TO DETECT MALICIOUS EXECUTABLES
Speaker: MARK MALOOF

Mark Maloof
Department of Computer Science
Georgetown University
Washington, DC 20057
maloof@cs.georgetown.edu
http://www.cs.georgetown.edu/~maloof

Abstract

In this talk, I will describe the development of a fielded application for detecting malicious executables in the wild. We gathered 1971 benign and 1651 malicious executables and encoded each as a training example, using n-grams of byte codes as features. This processing yielded more than 255 million distinct n-grams. After selecting the n-grams most relevant for prediction, we evaluated a variety of inductive methods, including naive Bayes, decision trees, support vector machines, and boosting. Ultimately, boosted decision trees outperformed the other methods, achieving an area under the ROC curve of 0.996. Results also suggest that our methodology will scale to larger collections of executables. To the best of our knowledge, ours is the only fielded application for this task developed using techniques from machine learning and data mining.
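The abstract describes the encoding and feature selection only at a high level. The Python sketch below illustrates one plausible reading of those two steps: each executable becomes the set of distinct byte n-grams it contains, candidate n-grams are ranked by information gain with respect to the benign/malicious label, and each file is then turned into a binary present/absent vector over the top-ranked n-grams. The choice of n = 4, information gain as the relevance measure, binary features, the toy byte strings, and all function names (byte_ngrams, entropy, information_gain, select_top_k, to_vector) are assumptions for illustration, not details taken from the talk.

```python
import math
from typing import List, Set, Tuple

# One training example: (set of distinct byte n-grams in the file, label); label 1 = malicious, 0 = benign.
Example = Tuple[Set[bytes], int]


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Distinct n-byte sequences in `data`, taken with a sliding window of width n.

    n = 4 is an assumed default for illustration, not a value stated in the abstract.
    """
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy, in bits, of a class distribution with `pos` and `neg` examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes (0 * log 0 is taken to be 0)
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(examples: List[Example], gram: bytes) -> float:
    """Information gain of the boolean feature "gram occurs in the file" with respect to the label.

    Information gain is an assumed relevance measure; the abstract only says "most relevant".
    """
    # counts[present][label]: number of examples with this presence value and this label
    counts = [[0, 0], [0, 0]]
    for grams, label in examples:
        counts[int(gram in grams)][label] += 1
    total = len(examples)
    prior = entropy(counts[0][1] + counts[1][1], counts[0][0] + counts[1][0])
    # Expected entropy after splitting on presence/absence of the n-gram
    remainder = sum(
        (counts[v][0] + counts[v][1]) / total * entropy(counts[v][1], counts[v][0])
        for v in (0, 1)
    )
    return prior - remainder


def select_top_k(examples: List[Example], k: int) -> List[bytes]:
    """The k n-grams with the highest information gain (ties broken by byte order)."""
    vocabulary = set().union(*(grams for grams, _ in examples))
    ranked = sorted(vocabulary, key=lambda g: (-information_gain(examples, g), g))
    return ranked[:k]


def to_vector(grams: Set[bytes], selected: List[bytes]) -> List[int]:
    """Binary feature vector: 1 if the selected n-gram occurs in the file, else 0."""
    return [1 if g in grams else 0 for g in selected]


if __name__ == "__main__":
    # Toy byte strings standing in for executables (hypothetical data, not from the study).
    benign = [b"\x00\x01\x02\x03\x04", b"\x00\x01\x02\x03\x05"]
    malicious = [b"\xde\xad\xbe\xef\x00", b"\xde\xad\xbe\xef\x01"]
    examples = [(byte_ngrams(d), 0) for d in benign] + [(byte_ngrams(d), 1) for d in malicious]
    top = select_top_k(examples, 2)
    print(top)  # [b'\x00\x01\x02\x03', b'\xde\xad\xbe\xef']
    print([to_vector(grams, top) for grams, _ in examples])  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

The resulting vectors are the kind of input a learner such as naive Bayes or a boosted decision tree would consume. Note that this brute-force ranking rescans every example for each candidate n-gram, which is fine for toy inputs but would not be practical at the scale of the 255 million distinct n-grams mentioned in the abstract; a real pipeline would need a more scalable counting strategy.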