ISE-TR-07-07
We introduce the problem of query consolidation, which seeks
to interpret a set of disparate queries submitted to independent databases with a single "global" query. This problem has multiple applications, from improving database design to protecting information from a
seemingly innocuous set of apparently unrelated queries. The problem
exhibits an attractive duality with the much-researched problem of query
decomposition, which has been addressed intensively in the context of
multidatabase environments: how to decompose a query submitted to a
virtual database into a set of local queries that are evaluated in individual databases. We set the new problem in the architecture of a canonical
multidatabase system, using it in the "reverse direction". The process incorporates two steps where multiplicity of solutions must be considered:
at one point the system must infer the most likely set of equi-joins for a
set of relations; at another point it must discover the most likely selection
constraints that would be applied to a relation. In each case we develop
a procedure that ranks solutions according to their perceived likelihood.
The final result is therefore a ranked list of suggested consolidations.
ISE-TR-07-06
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed
nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus
across multiple clustering results, while averaging out emergent spurious structures that arise
due to the various biases to which each participating algorithm is tuned. In this paper, we
address the problem of combining multiple weighted clusters which belong to different subspaces
of the input space. We leverage the diversity of the input clusterings in order to generate a
consensus partition that is superior to the participating ones. Since we are dealing with weighted
clusters, our consensus functions make use of the weight vectors associated with the clusters. We
demonstrate the effectiveness of our techniques by running experiments with several real datasets,
including high dimensional text data. Furthermore, we investigate in depth the issue of diversity
and accuracy for our ensemble methods. Our analysis and experimental results show that the
proposed techniques are capable of producing a partition that is as good as or better than the
best individual clustering.
Categories and Subject Descriptors: ... [...]: ...
General Terms: Clustering, cluster ensembles, subspace clustering, consensus functions, accuracy
and diversity measures, text data.
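The idea of deriving a consensus from several clusterings can be illustrated with a minimal sketch. It uses the standard unweighted co-association (evidence accumulation) approach, not the weighted consensus functions proposed in the report. The function names and the 0.5 threshold are assumptions made for illustration only.

```python
from itertools import combinations

def co_association(labelings):
    """Fraction of input clusterings in which each pair of points shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(labelings)
    return [[c / m for c in row] for row in counts]

def consensus_partition(labelings, threshold=0.5):
    """Link points whose co-association exceeds the (assumed) threshold and
    return the connected components as consensus cluster labels."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Calling `consensus_partition` on a list of label vectors, one per input clustering, returns one consensus label per point. A weighted variant would replace the 0/1 agreement count with a quantity derived from the cluster weight vectors.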
ISE-TR-07-05
Clustering is the problem of grouping objects on the basis of a similarity measure between
them. This paper considers the approaches belonging to the K-means family, in particular those
based on fuzzy memberships. When patterns are represented by means of non-metric pairwise dissimilarities, these methods cannot be directly applied, since they are not guaranteed to converge.
Symmetrization and shift operations have been proposed to transform the dissimilarities between
patterns from non-metric to metric. It has been shown that they modify the K-means objective
function by a constant that does not influence the optimization procedure. Some fuzzy clustering algorithms have been extended to handle patterns described by means of pairwise
dissimilarities. The literature, however, lacks an explicit analysis of what happens to K-means
style fuzzy clustering algorithms when the dissimilarities are transformed to make them
metric. This paper shows how the objective functions of four clustering algorithms based on fuzzy
memberships change under such dissimilarity transformations. The experimental analysis conducted
on a synthetic and a real data set shows the effect of the dissimilarity transformations on these four
algorithms.
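The two transformations can be sketched as follows, under simplifying assumptions. Symmetrization averages each entry with its transpose, and the shift adds a constant to every off-diagonal entry. Here the constant is chosen only to satisfy the triangle inequality, which is a simplification: the report may use a different criterion, such as one based on eigenvalues. All function names are hypothetical.

```python
def symmetrize(d):
    """Average each dissimilarity with its transpose entry."""
    n = len(d)
    return [[0.5 * (d[i][j] + d[j][i]) for j in range(n)] for i in range(n)]

def shift(d, alpha):
    """Add the constant alpha to every off-diagonal entry; the diagonal stays 0."""
    n = len(d)
    return [[d[i][j] + alpha if i != j else 0.0 for j in range(n)]
            for i in range(n)]

def min_triangle_shift(d):
    """Smallest alpha >= 0 such that shift(d, alpha) satisfies the triangle
    inequality, for a symmetric d. Shifting turns d_ij <= d_ik + d_kj into
    d_ij - d_ik - d_kj <= alpha, so alpha is the worst violation."""
    n = len(d)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) == 3:
                    worst = max(worst, d[i][j] - d[i][k] - d[k][j])
    return worst
```

A typical use is `s = symmetrize(d)` followed by `shift(s, min_triangle_shift(s))`. Because the shift adds the same constant to every pair, it leaves the ordering of the dissimilarities unchanged.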
ISE-TR-07-04
A grid virtual enterprise is a community of independent enterprises
concerned with a particular sector of the economy. Its members (nodes) are small
or medium size enterprises (SME) engaged in bilateral transactions. An important
principle of a grid virtual enterprise is the lack of any global "guiding force", with
each member of the community making its own independent decisions. In this paper
we describe Grid-VirtuE, a three-layer architecture for grid virtual enterprises. The
top layer of the architecture, representing its ultimate purpose, is an environment
in which grid virtual enterprises can be modeled and implemented. This layer is
supported by middleware infrastructure for grids, providing a host of grid services,
such as node-to-node communication, bilateral transactions, and data collection.
The bottom layer is essentially a distributed data warehouse for storing, sharing and
analyzing the large amounts of data generated by the grid. Among other functionalities, the warehouse handles the dissemination of data among the members of the
grid; it confronts issues of data magnitude with an aging mechanism that aggregates
old data at a lower level of detail; and it incorporates privacy-preserving features
that retain the confidentiality of individual members. Warehouse information is also
used for data and process mining, aimed at analyzing the behavior of the enterprise,
and subsequently inducing evolutionary changes that will improve its performance.
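The aging mechanism described above can be pictured with a small sketch: records older than a cutoff are replaced by per-period totals, while recent records keep full detail. The record layout `(day, node, amount)`, the 30-day cutoff, and the 7-day bucket are assumptions for illustration, not details taken from the report.

```python
from collections import defaultdict

def age_records(records, today, max_age=30, bucket=7):
    """Split records into recent ones (kept as-is) and older ones, which are
    summed per node over buckets of `bucket` days.

    records: iterable of (day, node, amount) tuples; day is an integer day count.
    Returns (recent, aggregated), where aggregated is a sorted list of
    (bucket_start_day, node, total_amount).
    """
    recent = []
    totals = defaultdict(float)
    for day, node, amount in records:
        if today - day <= max_age:
            recent.append((day, node, amount))
        else:
            bucket_start = day // bucket * bucket
            totals[(bucket_start, node)] += amount
    aggregated = sorted(
        (start, node, total) for (start, node), total in totals.items()
    )
    return recent, aggregated
```

Aggregating by node and period keeps totals available for analysis while dropping the individual transactions, which is one plausible way to trade detail for storage.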
ISE-TR-07-03
The eXtensible Access Control Markup Language (XACML) is the standard access control policy specification language of the World Wide Web. XACML does
not provide exclusive access to global resources; we add this capability by enhancing
the policy execution framework with locking.
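The general idea of combining a policy decision with locking can be sketched as below. This is not an XACML implementation and uses no XACML library: the class `LockingDecisionPoint`, its methods, and the callable policy are hypothetical stand-ins. The sketch assumes that a request is permitted only if the policy allows it and no other subject currently holds the resource.

```python
import threading

class LockingDecisionPoint:
    """Toy decision point that grants exclusive access to a resource."""

    def __init__(self, policy):
        self._policy = policy            # callable(subject, resource, action) -> bool
        self._holders = {}               # resource -> subject holding the lock
        self._mutex = threading.Lock()   # protects the holders table

    def request(self, subject, resource, action):
        """Return "Permit" and record the lock, or "Deny"."""
        if not self._policy(subject, resource, action):
            return "Deny"
        with self._mutex:
            holder = self._holders.get(resource)
            if holder is not None and holder != subject:
                return "Deny"            # someone else holds exclusive access
            self._holders[resource] = subject
            return "Permit"

    def release(self, subject, resource):
        """Release the lock if this subject holds it; report success."""
        with self._mutex:
            if self._holders.get(resource) == subject:
                del self._holders[resource]
                return True
            return False
```

Checking the policy before touching the lock table means a request the policy rejects never blocks another subject.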
ISE-TR-07-02
One of the most attractive aspects of virtual enterprises is their agility: the
inherent ability to adapt and evolve in response to changing market conditions.
Evolving VirtuE is a formal framework within which such agility can be realized. Through the concepts of enterprise time, activity logging, and log mining,
the recent behavior and performance of an enterprise may be studied, and corresponding evolutionary steps can be induced. These steps may be intended
to benefit the operation of individual enterprise members, as well as the enterprise as a whole. We also examine enterprise creation, a period of
rapid evolution that concludes when the enterprise reaches stability and begins
transacting its business activities.
ISE-TR-07-01
Maintaining a clear separation of concerns
throughout the software lifecycle has long been a goal
of the software community. Concerns that are
separated, however, must be composed at some point.
This paper presents a technique for keeping state-dependent use cases separate throughout the software
modeling process and a method for composing state-dependent use cases. The composition method is based
on the graph transformations formalism. This provides
a rich yet user-friendly way of composing
state-dependent use cases that is built on solid foundations.
To evaluate our approach, we applied it to seven
student design solutions. Each solution was originally
developed using a use case-driven methodology and
was reengineered to evaluate whether our technique
could have been applied. The findings are that it is
possible to maintain the separation of state-dependent
use cases during software design and, furthermore,
that expressive model composition methods are
necessary to do this in practice.
GMU-CS-TR-2007-1
The present report details a model for implementing diffusion in a discrete space. Formulated in 2004 in support
of artificial development experiments, the model is mathematically justified starting from the diffusion equation.
It is physically plausible and handles diffusion neighborhoods
of different shapes and sizes well, in the sense
that it achieves isotropic diffusion in spite of the bias introduced by the discretizing grid. It provides a single formulation that encapsulates the diffusion neighborhood details, making it applicable to linear, planar, spatial and
even n-dimensional constructs.
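For orientation, a minimal sketch of discrete diffusion on a planar grid follows. It is not the model from the report: it is one explicit time step of the diffusion equation using the standard nine-point Laplacian stencil, which weights orthogonal neighbors four times as heavily as diagonal ones to reduce grid bias. The periodic boundaries and the parameter `rate` (standing for D·dt/h²) are assumptions.

```python
def diffuse_step(grid, rate):
    """One explicit diffusion step on a 2D grid with periodic boundaries.

    Uses the nine-point Laplacian (4 * orthogonal + diagonal - 20 * center) / 6.
    rate stands for D * dt / h**2; values up to 0.3 keep every weight
    non-negative, including the weight of the center cell.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            up, down = (r - 1) % rows, (r + 1) % rows
            left, right = (c - 1) % cols, (c + 1) % cols
            orth = (grid[up][c] + grid[down][c]
                    + grid[r][left] + grid[r][right])
            diag = (grid[up][left] + grid[up][right]
                    + grid[down][left] + grid[down][right])
            laplacian = (4.0 * orth + diag - 20.0 * grid[r][c]) / 6.0
            new[r][c] = grid[r][c] + rate * laplacian
    return new
```

With periodic boundaries the stencil weights sum to zero, so the total amount of the diffusing substance is conserved from step to step.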