2003 Technical Reports | George Mason Department of Computer Science

Specifying Coherent Refactoring of Software Artefacts with Distributed Graph Transformations

Paolo Bottoni, Francesco Parisi-Presicce, Gabriele Taentzer

ISE-TR-03-11

Refactoring is an important source of software transformation, which changes the internal structure of a software system, while preserving its behavior. Even though the input/output view of a system's behavior does not change, refactoring can have several consequences for the computing process, as expressed for instance by the sequence of method calls or by state changes of an object or an activity. Such modifications must be reflected in the system model, generally expressed through UML diagrams. We propose a formal approach, based on distributed graph transformation, to the coordinated evolution of code and model, as effect of refactorings. The approach can be integrated into existing refactoring tools. Due to its formal base, it makes it possible to reason about the behavior preservation of each specified refactoring.

UML Specification of Access Control Policies and their Formal Verification

Manuel Koch and Francesco Parisi-Presicce

ISE-TR-03-10

Security requirements have become an integral part of most modern software systems. In order to produce secure systems, it is necessary to provide software engineers with the appropriate systematic support. We propose a methodology to integrate the specification of access control policies into UML and provide a graph-based formal semantics for the UML access control specification which permits to reason about the coherence of the access control specification. The main concepts in the UML access control specification are illustrated with an example access control model for distributed object systems.

Why Is this User Asking so Many Questions? Explaining Sequences of Queries

Aybar C. Acar and Amihai Motro

ISE-TR-03-09

A sequence of queries submitted by a database user within a short period of time may have a single, illuminating explanation. In this paper we consider sequences of single-record queries, and attempt to guess what information their authors may be trying to accumulate. Query sequences may reflect clandestine intentions, where users attempt to avoid direct queries which may disclose their true interests, preferring instead to obtain the same information by means of sequences of smaller, less conspicuous, queries. Sequences of queries may also reflect attempts to circumvent retrieval restrictions, where users attempt to approximate information which is inaccessible, with sequences of legitimate requests (in the latter case, our explanations may lead database owners to either tighten access, or, conversely, to reorganize their interfaces to facilitate access). Because the true objective of a sequence may be clouded by the retrieval of spurious records, our approach considers all the possible aggregates that a user may accumulate with a sequence, and to rank them, search-engine style, according to their plausibility as retrieval objectives. Our method is probabilistic in nature and postulates that the likelihood that a set of records is the true objective of the user is inverse proportional to the likelihood that this set results from random selection. Our method is shown to have good performance even in the presence of noise (spurious records) as high as 40-50%.

Providing Voice Privacy as a Service over the Public Telephone Network

Mohamad Sharif and Duminda Wijesekera

ISE-TR-03-08

Existing design of the public telephone network is susceptible to eavesdropping on its voice stream. Consequently, any wire taper can listen to supposedly private conversations. In order to prevent such eavesdropping, we propose an architecture and its implementation for end-to-end voice encryption as a service available for interested subscribers. Our proposal consists of appropriately placing servers to authenticate telephone equipment and subscribers of the service, and certificate authorities that can cross-certify over telecommunication service providers. We show how these entities and necessary signaling mechanisms between them can be implemented using the transaction capabilities application layer (TCAP) of the signal system seven (SS7) protocol and the D channel of the digital subscriber line (DSL) connecting telephone equipment to the SS7 grid. Using published specifications, we show that the duration to initiate an encrypted telephone call takes about 19 seconds including user authentication under average load conditions. That makes our design and protocols comparable to existing intelligent services provided over public telephones and only four times larger that initiating a normal telephone call.

Utility-based Resolution of Data Inconsistencies

Amihai Motro, Philipp Anokhin and Aybar C. Acar

ISE-TR-03-07

A virtual database system is software that provides unified access to multiple information sources. If the sources are overlapping in their contents and independently maintained, then the likelihood of inconsistent answers is high. Solutions are often based on ranking (which sorts the different answers according to recurrence) and on fusion (which synthesizes a new value from the different alternatives according to a specific formula). In this paper we argue that both methods are flawed, and we offer alternative solutions that are based on knowledge about the performance of the source data; including features such as recentness, availability, accuracy and cost. These features are combined in a flexible utility function that expresses the overall value of a data item to the user. Utility allows us to (1) define meaningful ranking on the inconsistent set of answers, and offer the top-ranked answer as a preferred answer; (2) determine whether a fusion value is indeed better than the initial values, by calculating its utility and comparing it to the utilities of the initial values; and (3) discover the best fusion: the fusion formula that optimizes the utility. The advantages of such performance-based and utility-driven ranking and fusion are considerable.

Fusionplex: Resolution of Data Inconsistencies in the Integration of Heterogeneous Information Sources

Philipp Anokhin and Amihai Motro

ISE-TR-03-06

Fusionplex is a system for integrating multiple heterogeneous and autonomous information sources, that uses data fusion to resolve factual inconsistencies among the individual sources. To accomplish this, the system relies on source features, which are meta-data on the quality of each information source; for example, the recentness of the data, its accuracy, its availability, or its cost. The fusion process is controlled with several parameters: (1) With a vector of feature weights, each user defines an individual notion of data utility; (2) with thresholds of acceptance, users ensure minimal performance of their data, excluding from the fusion process data that are too old, too costly, or lacking in authority, or data that are too high, too low, or obvious outliers; and, ultimately, (3) in naming a particular fusion function to be used for each attribute (for example, average, maximum, or simply any) users implement their own interpretation of fusion. Several simple extensions to SQL are all that is needed to allow users to state these resolution parameters, thus ensuring that the system is easy to use. Altogether, Fusionplex provides its users with powerful and flexible, yet simple, control over the fusion process. In addition, Fusionplex supports other critical integration requirements, such as information source heterogeneity, dynamic evolution of the information environment, quick ad-hoc integration, and intermittent source availability. The methods described in this paper were implemented in a prototype system that provides complete Web-based integration services for remote clients.

OLAP Means On-Line Anti-Privacy

Lingyu Wang, Duminda Wijesekera and Sushil Jajodia

ISE-TR-03-05

In this paper we investigate the privacy breaches caused by multi-dimensional range sum queries in OLAP (Online Analytic Processing) systems. Our results show that the sensitive information stored in the underlying data warehouses can be easily compromised by malicious users with legitimate range queries. This compromise is possible even when users are restricted to some special class of range queries. We present algorithms that compromise individual tuples with only range queries. We study the conditions under which the compromises are possible. We also analyze the number of queries required for each compromise, as well as the complexity and completeness of those algorithms. Our study reveals the seriousness of the privacy issue in OLAP systems and provide better understanding of the problem for further research on the control methods.

FlexFlow: A Flexible Flow Control Policy Specification Framework

Shiping Chen, Duminda Wijesekera and Sushil Jajodia

ISE-TR-03-04

Flow control policies are important in data-flow, work-flow, transaction systems and software design. Previous work in this area concentrates either on modelling security aspects of information flow control or applying flow control policies in some specific application domain. These models permit either permissions or prohibitions for flows and normally are based on a specific meta-policy (usually the closed policy). We propose FlexFlow, a logic based flexible flow control framework to specify data-flow, work-flow and transaction systems policies that go beyond point-to-point flows. Both permissions and prohibitions are specifiable in FlexFlow and meta-policies such as permissions take precedence themselves can be specified over the meta-policy neutral policy specification environment of FlexFlow. We further show how to specify and prevent inter-flow conflicts such as those arising in role-based work-flow policies.

Precisely Answering Multi-dimensional Range Queries Without Privacy Breaches

Lingyu Wang, Yingjiu Li, Duminda Wijesekera, and Sushil Jajodia

ISE-TR-03-03

This paper investigates the privacy breaches caused by multi-dimensional range (MDR) sum queries in OLAP systems. We show that existing inference control methods are generally ineffective or infeasible for MDR queries. We then consider restricting users to even MDR queries (that is, the MDR queries involving even number of data values). We show that the collection of such even MDR queries is safe if and only if a special set of sum-two queries (that is, queries involving exactly two values) is safe. On the basis of this result, we give an efficient method to decide the safety of even MDR queries. Besides safe even MDR queries we show that any odd MDR query is unsafe. Moreover, any such odd MDR query is different from the union of some even MDR queries by only one tuple. We also extend those results to the safe subsets of unsafe even MDR queries.

Adding Reliable and Self-Healing Key Distribution to the Subset Difference Group Rekeying Method for Secure Multicast

Sencun Zhu, Sanjeev Setia, and Sushil Jajodia

ISE-TR-03-02

The Subset Difference Rekeying (SDR) method is the most efficient stateless group rekeying method proposed in the literature. We study two important issues related to the SDR method. First, we address the issue of reliable rekey transport for SDR. We present a key distribution scheme, called FEC-BKR, that enables members to receive the current group key in a reliable and timely fashion despite packet losses in the network. Through simulation, we show that in most scenarios, FEC-BKR outperforms previously proposed schemes for reliable rekey transport. Second, we address the issue of self-healing key distribution for SDR. We present a group key recovery scheme that adds the self-healing property to SDR, i.e., our scheme enables a member that has missed up to a certain number m of previous rekey operations to recover the missing group keys without asking the key server for retransmission. The additional communication overhead imposed by our key recovery scheme is quite small (less than 3m additional keys).

Establishing Pair-wise Keys For Secure Communication in Ad Hoc Networks: A Probabilistic Approach

Sencun Zhu, Shouhuai Xu, Sanjeev Setia, and Sushil Jajodia

ISE-TR-03-01

We present a scalable distributed protocol that enables two mobile nodes in an ad hoc network to establish a pairwise shared key on the fly, without requiring the use of a on-line key distribution center. The design of our protocol is based on a novel combination of two techniques -- probabilistic key sharing and threshold secret sharing. Our protocol is scalable since every node only needs to possess a small number of keys, independent of the network size, and it is computationally efficient because it only relies on symmetric key operations. We show that a pairwise key established between two nodes using our protocol is secure against a collusive attack by a certain number of compromised nodes, and that our protocol can be parameterized to meet the appropriate levels of performance, security and storage for the application under consideration.

Cooperative Multi-Agent Learning: The State of the Art

Liviu Panait and Sean Luke

GMU-CS-TR-2003-1

Cooperative multi-agent systems problems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize their utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to such problems has spawned increasing interest in machine learning (ML) techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss two important topics independent of these categories: problem decomposition and communication. We conclude with a presentation of multi-agent learning problem domains, a discussion of certain challenge topics (scalability and adaptive dynamics), and a list of multi-agent learning resources.

How to Make a Technical Report

Sean Luke and Jana Kosecka

GMU-CS-TR-2003-0

Additional Files: techreports/GMU-CS-TR-2003-0.tgz

The following document describes the preferred formatting of technical reports and submission guidelines. This document itself is formatted according to the preferred formatting rules.