Aybar C. Acar and Amihai Motro
ISE-TR-07-07
We introduce the problem of query consolidation, which seeks to interpret a set of disparate queries submitted to independent databases with a single "global" query. This problem has multiple applications, from improving database design to protecting information from a seemingly innocuous set of apparently unrelated queries. The problem exhibits attractive duality with the much-researched problem of query decomposition, which has been addressed intensively in the context of multidatabase environments: How to decompose a query submitted to a virtual database into a set of local queries that are evaluated in individual databases. We set the new problem in the architecture of a canonical multidatabase system, using it in the "reverse direction". The process incorporates two steps where multiplicity of solutions must be considered: At one point the system must infer the most likely set of equi-joins for a set of relations; at another point it must discover the most likely selection constraints that would be applied to a relation. In each case we develop a procedure that ranks solutions according to their perceived likelihood. The final result is therefore a ranked list of suggested consolidations.
Carlotta Domeniconi and Muna Al-Razagan
ISE-TR-07-06
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters which belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demostrate the effectiveness of our techniques by running experiments with several real datasets, including high dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering. Categories and Subject Descriptors: ... [...]: ... General Terms: Clustering, cluster ensembles, subspace clustering, consensus functions, accuracy and diversity measures, text data.
Maurizio Filippone
ISE-TR-07-05
Clustering is the problem of grouping objects on the basis of a similarity measure between them. This paper considers the approaches belonging to the K-means family, in particular those based on fuzzy memberships. When patterns are represented by means of non-metric pairwise dissimilarities, these methods cannot be directly applied, since they are not guaranteed to converge. Symmetrization and shift operations have been proposed, to transform the dissimilarities between patterns from non-metric to metric. It has been shown that they modify the K-means objective function by a constant, that does not influence the optimization procedure. Some fuzzy clustering algorithms have been extended, in order to handle patterns described by means of pairwise dissimilarities. The literature, however, lacks of an explicit analysis on what happens to K-means style fuzzy clustering algorithms, when the dissimilarities are transformed to let them become metric. This paper shows how the objective functions of four clustering algorithms based on fuzzy memberships change, due to dissimilarities transformations. The experimental analysis conducted on a synthetic and a real data set shows the effect of the dissimilarities transformations for four clustering algorithms based on fuzzy memberships.
Alfredo Cuzzocrea, Alessandro D'Atri, Andrea Gualtieri, Amihai Motro, and Domenico Sacc
ISE-TR-07-04
A grid virtual enterprise is a community of independent enterprises concerned with a particular sector of the economy. Its members (nodes) are small or medium size enterprises (SME) engaged in bilateral transactions. An important principle of a grid virtual enterprise is the lack of any global "guiding force", with each member of the community making its own independent decisions. In this paper we describe Grid-VirtuE, a three-layer architecture for grid virtual enterprises. The top layer of the architecture, representing its ultimate purpose, is an environment in which grid virtual enterprises can be modeled and implemented. This layer is supported by middleware infrastructure for grids, providing a host of grid services, such as node-to-node communication, bilateral transactions, and data collection. The bottom layer is essentially a distributed data warehouse for storing, sharing and analyzing the large amounts of data generated by the grid. Among other functionalities, the warehouse handles the dissemination of data among the members of the grid; it confronts issues of data magnitude with an aging mechanism that aggregates old data at a lower level of detail; and it incorporates privacy-preserving features that retain the confidentiality of individual members. Warehouse information is also used for data and process mining, aimed at analyzing the behavior of the enterprise, and subsequently inducing evolutionary changes that will improve its performance.
Vijayant Dhankhar, Saket Kaushik, and Duminda Wijesekera
ISE-TR-07-03
The extensible access control markup language (XACML) is the standard access control policy specification language of the World Wide Web. XACML does not provide exclusive accesses to globally resources, and we do so by enhancing the policy execution framework with locking.
Alessandro D'Atri, Amihai Motro
ISE-TR-07-02
One of the most attractive aspects of virtual databases is their agility: the inherent ability to adapt and evolve in response to changing market conditions. Evolving VirtuE is a formal framework within which such agility can be realized. Through the concepts of enterprise time, activity logging, and log mining, the recent behavior and performance of an enterprise may be studied, and corresponding evolutionary steps can be induced. These steps may be intended to benefit the operation of individual enterprise members, as well the enterprise as a whole. In addition, we also examine enterprise creation, a period of rapid evolution that concludes when the enterprise reaches stability and begins transacting its business activities.
Jon Whittle, Ana Moreira, Joao Araujojo, and Rasheed Rabbi
ISE-TR-07-01
Maintaining a clear separation of concerns throughout the software lifecycle has long been a goal of the software community. Concerns that are separated, however, must be composed at some point. This paper presents a technique for keeping statedependent use cases separate throughout the software modeling process and a method for composing statedependent use cases. The composition method is based on the graph transformations formalism. This provides a rich yet user-friendly way of composing state dependent use cases that is built on solid foundations. To evaluate our approach, it has been applied to seven student design solutions. Each solution was originally developed using a use case-driven methodology and was reengineered to evaluate whether our technique could have been applied. The findings are that it is possible to maintain the separation of state-dependent use cases during software design and, furthermore, that expressive model composition methods are necessary to do this in practice.
Adrian Grajdeanu
GMU-CS-TR-2007-1
The present report details a model for implementing diffusion in a discrete space. Formulated in 2004 in support of artificial development experiments, the model is mathematically justified starting from the diffusion equation. It has physical plausibility and handles well different shaped and sized diffusion neighborhoods in the sense that it achieves isotropic diffusion in spite of the bias introduced by the discretizing grid. It provides one formulation that encapsulates the diffusion neighborhood details and renders it applicable to linear, planar, spatial and even n-dimensional constructs.