2014 Technical Reports | George Mason Department of Computer Science

Convex Multi-task Relationship Learning using Hinge Loss

Anveshi Charuvaka and Huzefa Rangwala

GMU-CS-TR-2014-8

Multi-task learning improves generalization performance by learning several related tasks jointly. Several methods have been proposed for multi- task learning in recent years. Many methods make strong assumptions about symmetric task relationships while some are able to utilize externally provided task relationships. However, in many real world tasks the degree of relatedness among tasks is not known a priori. Methods which are able to extract the task relationships and exploit them while simultaneously learning models with good generalization performance can address this limitation. In the current work, we have extended a recently proposed method for learning task relationships using smooth squared loss for regression to classification problems using non-smooth hinge loss due to the demonstrated effectiveness of SVM classifier in single task classification settings. We have also developed an efficient optimization procedure using bundle methods for the proposed multi-task learning formulation. We have validated our method on one simulated and two real world datasets and have compared its performance to competitive baseline single-task and multi-task methods.

Document Content Layout Based Exploit Protections

Charles Smutz and Angelos Stavrou

GMU-CS-TR-2014-7

Malware laden documents are a common exploit vector, especially in targeted attacks. Most current approaches seek to detect the malicious attributes of documents whether through signature matching, dynamic analysis, or machine learning. We take a different approach: we perform transformations on documents that render exploits inoperable while maintaining the visual interpretation of the document intact. Our exploit mitigation techniques are similar in effect to address space layout randomization, but we implement them through permutations to the document layout. Using document content layout randomization, we randomize the data block layout of Microsoft OLE files in a manner similar to the inverse of a file system defragmention tool. This relocates malicious payloads in both the original document file and in the memory of the reader program. We demonstrate that our approach indeed subdues in the wild exploits by manual validation of both Office 2003 and Office 2007 malicious documents while the transformed documents continue to render benign content properly. The document transformation can be performed offline and requires only a single document scan while the user-perceived delay when opening the transformed document is negligible. We also show that it is possible to thwart malicious heap sprays by injecting benign content in documents. This approach, however, comes at an expense of computational and memory resources.

Decision Guidance Analytics Language (DGAL): Toward Reusable Knowledge Base Centric Modeling

Alexander Brodsky and Juan Luo

GMU-CS-TR-2014-6

Decision guidance systems are a class of decision support systems that are geared toward producing actionable recommendations, typically based on formal analytical models and techniques. This paper proposes the Decision Guidance Analytics Language (DGAL) for easy iterative development of decision guidance systems. DGAL allows the creation of modular, reusable and composable models that are stored in the analytical knowledge base independently of the tasks and tools that use them. Based on these unified models, DGAL supports declarative queries of (1) data manipulation, (2) what-if prediction analysis, (3) decision optimization based on mathematical and constraint programming, and (4) machine learning, all through formal reduction to specialized models and tools, and in the presence of uncertainty.

Building a Hierarchical Finite-State Automaton: A Primer

Sean Luke

GMU-CS-TR-2014-5

Additional Files: techreports/GMU-CS-TR-2014-5.zip

This primer introduces concepts in hierarchical finite-state automata (HFA) and their use, gives a rough formalism, and then builds an entire HFA library, with all the trimmings, in Lua. The primer is intended to educate the reader on how HFA can be used, how they are programmed, and how HFA libraries are built.

Toward Reusable Models: System Development for Optimization Analytics Language (OAL)

Abdullah Alrazgan and Alexander Brodsky

GMU-CS-TR-2014-4

Decision optimization has been broadly used in many areas including economics, finance, manufacturing, logistics, and engineering. However, optimization modeling used for decision optimization presents two major challenges. First, it requires expertise in operation research and mathematical programming, which most users, including software developers, engineers, business analysts and end-users, typically do not have. Second, optimization models are usually highly customized, not modular, and not extensible. Sustainable Process Analytics Formalism (SPAF) has been recently proposed at National Institute of Standards and Technology (NIST) to address these challenges, by transitioning from the redevelopment of optimization solutions from scratch towards maintaining an extensible library of analytical models, against which declarative optimization queries can be asked. However, the development of practical algorithms and a system for SPAF presents a number of technical challenges, which have not been addressed. In this paper we (1) propose an object-oriented extension of SPAF, named Optimization Analytics Language (OAL); (2) present OAL system development based on methods to reduce OAL models and queries into a formal mathematical programming formulation; (3) showcase OAL through a simple case study in the domain of manufacturing; and (4) conduct a preliminary experimental study to assess the overhead introduced by OAL.

Temporal Manufacturing Query Language (tMQL) for Domain Specific Composition, What-if Analysis, and Optimization of Manufacturing Processes With Inventories

Mohan Krishnamoorthy, Alexander Brodsky and Daniel A. Menascé

GMU-CS-TR-2014-3

Smart manufacturing requires streamlining operations and optimizing processes at a global and local level. This paper considers manufacturing processes that involve physical or virtual inventories of products, parts and materials, that move from machine to machine. The inventory levels vary with time and are a function of the configuration settings of the machines involved in the process. These environments require analysis, e.g., answering what-if questions, and optimization to determine optimal operating settings for the entire process. The modeling complexities in performing these tasks are not always within the grasp of production engineers. To address this problem, the paper proposes the temporal Manufacturing Query Language (tMQL) that allows the composition of modular process models for what-if analysis and decision optimization queries. tQML supports an extensible and reusable model knowledge base against which declarative queries can be posed. Additionally, the paper describes the steps to translate the components of a tMQL model to input data files used by a commercial optimization solver.

Trace Driven Analytic Modeling for Evaluating Schedulers for Clusters of Computers

Shouvik Bhardan and Daniel A. Menascé

GMU-CS-TR-2014-2

Large enterprises use clusters of computers with varying computing power to process workloads that are heterogenous in terms of the type of jobs and the nature of their arrival processes. The scheduling of jobs from a workload has a significant impact on their execution times. This paper presents a trace-driven analytic model (TDAM) method that can be used to assess the impact of different schedulers on job execution times. The TDAM approach uses an implementation of the scheduler to generate job traces that are fed into analytic models of the computers in the cluster. These analytic models use closed queuing network methods to estimate congestion at the various nodes of the cluster. The paper demonstrates the usefulness of the TDAM method by showing how four different types of schedulers affect the execution times of jobs derived from well-known benchmarks. The paper also demonstrates how the method can be applied to computer clusters such as the ones used to run MapReduce jobs.

Epochs: Trace-Driven Analytical Modeling of Job Execution Times

Daniel A. Menascé and Shouvik Bhardan

GMU-CS-TR-2014-1

Queuing theory has extensively studied the problem of estimating job execution times in steady state conditions both in the case of single queues and queuing networks. This paper discusses the use of closed Queuing Network (QN) models during finite time intervals to estimate the execution time of jobs submitted to a computer system. More specifically, the paper presents the