Aynur Abdurazik and Jeff Offutt
ISE-TR-99-09
The Unified Modeling Language (UML) is a third-generation modeling language for object-oriented software engineering. It provides constructs to specify, construct, visualize, and document the artifacts of software-intensive systems. This paper presents a technique that uses Offutt's state-based specification test data generation model to generate test cases from UML statecharts. A tool, TCGen, has been developed to demonstrate this technique on specifications written in the Software Cost Reduction (SCR) method and the Unified Modeling Language (UML). Experimental results from using this tool are presented.
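A minimal sketch of the kind of statechart-based test generation the abstract describes is given below, in Python. The statechart encoding, the coverage criterion (plain transition coverage), and the example machine are illustrative assumptions only; they are not the criteria implemented in TCGen.

    # Minimal sketch: derive transition-covering test sequences from a
    # simplified statechart model.  The encoding and the coverage criterion
    # are assumptions for illustration, not TCGen's actual criteria.
    from collections import deque

    def transition_tests(initial, transitions):
        """transitions: list of (source, event, target) triples.
        For each transition, return a shortest event sequence from the
        initial state that ends by firing that transition."""
        # Breadth-first search gives a shortest event path to each state.
        paths = {initial: []}
        queue = deque([initial])
        while queue:
            state = queue.popleft()
            for src, event, tgt in transitions:
                if src == state and tgt not in paths:
                    paths[tgt] = paths[state] + [event]
                    queue.append(tgt)
        tests = []
        for src, event, tgt in transitions:
            if src in paths:                  # reachable transitions only
                tests.append(paths[src] + [event])
        return tests

    if __name__ == "__main__":
        statechart = [("Idle", "coin", "Ready"),
                      ("Ready", "select", "Dispensing"),
                      ("Dispensing", "done", "Idle")]
        for t in transition_tests("Idle", statechart):
            print(t)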
Daniel Barbara and Ping Chen
ISE-TR-99-08
Clustering is a widely used knowledge discovery technique. It helps uncover structures in data that were not previously known. Clustering of large datasets has received a lot of attention in recent years, but clustering remains a challenging task: many published algorithms fail to scale with the size of the dataset and the number of dimensions that describe the points, to find clusters of arbitrary shape, or to deal effectively with the presence of noise. In this paper, we present a new clustering algorithm based on the fractal properties of the datasets. The new algorithm, which we call Fractal Clustering (FC), places each point incrementally in the cluster for which the change in fractal dimension after adding the point is smallest. This is a very natural way of clustering points, since points in the same cluster have a high degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires one scan of the data, is incremental, and can be suspended at will, providing the best answer possible at that point. We show via experiments that FC deals effectively with large datasets, high dimensionality, and noise, and is capable of recognizing clusters of arbitrary shape.
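The assignment step described above can be illustrated with a toy sketch: each point is placed in the cluster whose estimated box-counting dimension changes least when the point is added. The two-scale dimension estimate, the fixed grid sizes, and the assumption that the data are scaled to the unit cube are simplifications for illustration, not the paper's algorithm.

    # Toy sketch of the incremental assignment step: a point goes to the
    # cluster whose box-counting dimension changes least when it is added.
    # Grid sizes assume data roughly scaled to the unit cube.
    import numpy as np

    def box_count(points, r):
        """Number of grid cells of side r occupied by the points."""
        return len({tuple(np.floor(p / r).astype(int)) for p in points})

    def fractal_dim(points, r1=0.05, r2=0.2):
        # Slope of log N(r) versus log(1/r), estimated from two scales only.
        n1, n2 = box_count(points, r1), box_count(points, r2)
        return (np.log(n1) - np.log(n2)) / (np.log(1 / r1) - np.log(1 / r2))

    def assign(point, clusters):
        """Place `point` in the cluster whose fractal dimension changes least."""
        deltas = []
        for pts in clusters:
            before = fractal_dim(pts)
            after = fractal_dim(np.vstack([pts, point]))
            deltas.append(abs(after - before))
        best = int(np.argmin(deltas))
        clusters[best] = np.vstack([clusters[best], point])
        return best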
Philipp Anokhin and Amihai Motro
ISE-TR-99-07
With the development of the Internet and the on-line availability of large numbers of information sources, the problem of integrating multiple heterogeneous information sources requires reexamination, its basic underlying assumptions having changed drastically. Integration methodologies must now contend with situations in which the number of potential data sources is very large, and the set of sources changes continuously. In addition, the ability to create quick, ad-hoc virtual databases for short-lived applications is now considered attractive. Under these new assumptions, a single, complete answer can no longer be guaranteed. It is now possible that a query cannot be answered in its entirety, or that it results in several different answers. Multiplex is a multidatabase system designed to operate under these new assumptions. In this paper we describe how Multiplex handles queries that do not have a single, complete answer. The general approach is to define flexible and comprehensive strategies that direct the behavior of the query processing subsystem. These strategies may be defined either as part of the multidatabase design or as part of ad-hoc queries.
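As a purely hypothetical illustration of how such strategies might direct query processing, the sketch below merges partial answers from several sources under a named strategy. The strategy names and merge rules are assumptions for illustration, not Multiplex's actual design.

    # Hypothetical sketch: resolving several partial or conflicting answers
    # under a named strategy.  Names and rules are illustrative assumptions.
    def resolve(partial_answers, strategy="union"):
        """partial_answers: list of sets of result tuples, one set per source."""
        if not partial_answers:
            return set()
        if strategy == "union":          # accept anything any source reports
            out = set()
            for a in partial_answers:
                out |= a
            return out
        if strategy == "consensus":      # keep only tuples every source agrees on
            out = set(partial_answers[0])
            for a in partial_answers[1:]:
                out &= a
            return out
        if strategy == "first":          # the first (e.g., fastest) source wins
            return set(partial_answers[0])
        raise ValueError("unknown strategy: %s" % strategy)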
Michelle L. Lee
ISE-TR-99-06
As the software industry has matured, we have shifted our resources from developing new software systems to making modifications in evolving software systems. A major problem for developers in an evolutionary environment is that seemingly small changes can ripple throughout the system and cause major unintended impacts elsewhere. Software developers therefore need mechanisms to understand how a change to a software system will impact the rest of the system. Although the effects of changes in object-oriented software can be restricted, they are also more subtle and more difficult to detect. Maintaining current object-oriented systems is more of an art than an engineering skill, similar to where we were 15 years ago with procedural systems. We are beginning to see "legacy" object-oriented systems in industry, and a difficult problem is how to maintain the objects in these large, complex systems. Although objects are more easily identified and packaged, features such as encapsulation, inheritance, aggregation, polymorphism, and dynamic binding can make the ripple effects of object-oriented systems far more difficult to control than in procedural systems. The research presented here addresses the problem of change impact analysis for object-oriented software. Major results of this research include a set of object-oriented data dependency graphs, a set of algorithms that allow software developers to evaluate proposed changes to object-oriented software, a set of object-oriented change impact metrics that quantify the impact of a change, and a prototype tool, ChaT, to evaluate the algorithms. This research also supports efficient regression testing, by helping testers decide which classes and methods need to be retested, as well as cost estimation and schedule planning.
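The flavor of such an analysis can be sketched as reachability over a dependency graph: everything that can reach a changed element is potentially impacted and is therefore a candidate for retesting. The graph encoding below is an assumption for illustration; the dependency graphs and algorithms used by ChaT are richer than this.

    # Minimal sketch of impact analysis as reverse reachability over a
    # dependency graph (e.g., "method A calls method B", "class C inherits
    # from class D").  The encoding is an assumption, not ChaT's design.
    from collections import defaultdict, deque

    def impact_set(edges, changed):
        """edges: iterable of (dependent, depends_on) pairs.
        Returns every element whose behavior may be affected by `changed`."""
        rdeps = defaultdict(set)              # depends_on -> dependents
        for dependent, dependency in edges:
            rdeps[dependency].add(dependent)
        impacted, queue = set(), deque([changed])
        while queue:
            node = queue.popleft()
            for dep in rdeps[node]:
                if dep not in impacted:
                    impacted.add(dep)
                    queue.append(dep)
        return impacted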
Roger T. Alexander
ISE-TR-99-05
As we move from developing procedure-oriented to object-oriented programs, the complexity traditionally found in functions and procedures is moving to the connections among components. More faults occur as components are integrated to form higher-level aggregates of behavior and state. Consequently, we need to place more effort on testing the connections among components. Although object-oriented technology provides abstraction mechanisms to build components for integration, it also adds new compositional relations that can contain faults, which must be found during integration testing. This paper describes new techniques for analyzing and testing the polymorphic relationships that occur in object-oriented software. Applying these techniques can result in an increased ability to find faults and in overall higher-quality software.
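A small, hypothetical example of the kind of polymorphic coupling such techniques target: a client call site whose receiver is polymorphic should be exercised once per possible binding, since an override may violate the behavior the client assumes. The classes and values below are invented for illustration and do not come from the paper.

    # Hypothetical example: one test per binding of a polymorphic call site.
    class Account:
        def fee(self, amount):
            return 0.01 * amount

    class PremiumAccount(Account):
        def fee(self, amount):          # override changes the fee rule
            return max(0.005 * amount, 2.0)

    def charge(account, amount):        # the call site under test
        return amount + account.fee(amount)

    def test_charge_all_bindings():
        # Exercise charge() once for each possible binding of account.fee(...)
        for acct, expected in [(Account(), 101.0), (PremiumAccount(), 102.0)]:
            assert abs(charge(acct, 100.0) - expected) < 1e-6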
Zoran Duric, Neil F. Johnson, Sushil Jajodia
ISE-TR-99-04
Many techniques for watermarking digital images have appeared recently. Most of these techniques are sensitive to cropping and/or affine distortions (e.g., rotations and scaling). In this paper we describe a method for recognizing images based on the concept of an identification mark; the method does not require the original image, only a small number of salient image points. We show that, using our method, it is possible to recognize distorted images and recover their "original" appearance. Once the image is recognized, we use a second technique, based on normal flow, to fine-tune the image parameters. The restored image can be used to recover the watermark that the owner had embedded in the image.
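One ingredient suggested by the abstract can be sketched with standard tools: recovering an affine distortion from a few matched salient points by least squares, so that the image can be warped back toward its original appearance. This is a generic technique given here for orientation only, not the paper's identification-mark matching or its normal-flow refinement.

    # Generic sketch: estimate an affine map from matched salient points.
    import numpy as np

    def estimate_affine(src, dst):
        """src, dst: (n, 2) arrays of matched points, n >= 3.
        Returns A (2x2) and t (2,) such that dst ~= src @ A.T + t."""
        n = src.shape[0]
        X = np.hstack([src, np.ones((n, 1))])          # rows [x, y, 1]
        params, *_ = np.linalg.lstsq(X, dst, rcond=None)
        A = params[:2].T                               # linear part
        t = params[2]                                  # translation
        return A, t

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        src = rng.random((5, 2))
        A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])   # mild rotation/scale
        t_true = np.array([0.3, -0.2])
        dst = src @ A_true.T + t_true
        A, t = estimate_affine(src, dst)
        print(np.allclose(A, A_true), np.allclose(t, t_true))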
Daniel Barbara
ISE-TR-99-03
Many real datasets behave in a fractal fashion, i.e., they show an invariance with respect to the scale at which they are examined. Fractal sets are characterized by a family of fractal dimensions, each with a particular interpretation. In this paper we show examples of how the fractal dimension(s) can be used to extract knowledge from datasets. Techniques that use the fractal dimension to detect anomalies in time series, characterize time patterns of association rules, discover patterns in datacubes, and cluster multidimensional datasets are described as part of an on-going research effort.
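For reference, the family of generalized (Renyi) fractal dimensions mentioned above is usually written, in standard notation rather than the paper's own, as

    D_q = \frac{1}{q-1} \lim_{r \to 0} \frac{\log \sum_i p_i^{\,q}}{\log r}, \qquad q \neq 1,

where p_i is the fraction of points falling in the i-th grid cell of side r; D_0 is the box-counting dimension, D_1 (the limit as q approaches 1) the information dimension, and D_2 the correlation dimension.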
Daniel Barbara and Xintao Wu
ISE-TR-99-02
Exploratory Data Analysis is a widely used technique to determine which factors have the most influence on data values in a multi-way table, or which cells in the table can be considered anomalous with respect to the other cells. In particular, median polish is a simple, yet robust method to perform Exploratory Data Analysis. Median polish is resistant to holes in the table (cells that have no values), but it may require a lot of iterations through the data. This factor makes it difficult to apply to large multidimensional tables, since the I/O requirements may be prohibitive. This paper describes a technique that uses median polish over an approximation of a datacube, easing the burden of I/O. The results obtained are tested for quality, using a variety of measures. The technique scales to large datacubes and proves to give a good approximation of the results that would have been obtained by median polish in the original data.
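For concreteness, a sketch of classical two-way median polish, the procedure that the datacube approximation is meant to feed, is shown below. Handling of higher-way tables and the approximation of the datacube itself are beyond this illustration.

    # Sketch of classical two-way median polish (Tukey): alternately sweep
    # out row and column medians, accumulating them into effect vectors.
    import numpy as np

    def median_polish(table, iters=10):
        """table: 2-D array, possibly containing np.nan for empty cells.
        Returns (overall, row_effects, col_effects, residuals) with
        table ~= overall + row_effects[i] + col_effects[j] + residuals[i, j]."""
        resid = table.astype(float)
        overall = 0.0
        row_eff = np.zeros(resid.shape[0])
        col_eff = np.zeros(resid.shape[1])
        for _ in range(iters):
            rmed = np.nanmedian(resid, axis=1)         # sweep out row medians
            resid -= rmed[:, None]
            row_eff += rmed
            cmed = np.nanmedian(resid, axis=0)         # sweep out column medians
            resid -= cmed[None, :]
            col_eff += cmed
            shift = np.median(row_eff)                 # recentre the effects
            overall += shift
            row_eff -= shift
            shift = np.median(col_eff)
            overall += shift
            col_eff -= shift
        return overall, row_eff, col_eff, resid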
Jeff Offutt
ISE-TR-99-01
This report presents results for the Rockwell Collins Inc. sponsored project on generating test data from requirements/specifications, which started January 1, 1998. The purpose of this project is to improve our ability to test software that needs to be highly reliable by developing formal techniques for generating test cases from formal specifications of the software. Formal specifications represent a significant opportunity for testing because they precisely describe what functions the software is supposed to provide, in a form that can be easily manipulated by automated means. This Phase II, 1998 report presents results and strategies for practically applying test cases generated according to the criteria presented in the Phase I, 1997 report. It presents a small empirical evaluation of the test criteria, and algorithms for solving various problems that arise when applying test cases developed from requirements/specifications. One significant problem in specification-based test data generation is that of reaching the proper program state necessary to execute a particular test case. Given a test case that must start in a particular state S, the test case prefix is a sequence of inputs that will put the software into state S. We have addressed this problem in two ways. The first is to combine test cases into test sequences ordered so that each test case leaves the software in the state needed to run the subsequent test case. An algorithm is presented that attempts to find test case sequences that are optimal in the sense that the fewest possible test cases are used. To handle situations in which each test case must be run independently, an algorithm for directly deriving test sequences is presented. This report also presents procedures for removing redundant test case values, and develops the idea of "sequence-pair" testing, which was introduced in the 1997 Phase I report, into a more general idea of "interaction-pair" testing.
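The sequencing idea can be sketched as follows: chain test cases so that the state each one leaves behind is the state the next one needs, so that fewer prefixes have to be constructed separately. The greedy pass below is only an illustration under an assumed (start state, end state) encoding of test cases; it is not the report's optimization algorithm.

    # Illustrative greedy chaining of test cases by start/end state.
    from collections import defaultdict

    def chain_tests(tests, initial_state):
        """tests: list of (start_state, end_state, name) triples.
        Returns a list of test sequences; a new sequence is opened only when
        no unused test starts in the state where the previous one finished."""
        by_start = defaultdict(list)
        for start, end, name in tests:
            by_start[start].append((end, name))
        sequences, remaining = [], len(tests)
        while remaining:
            # Prefer a chain that begins in the initial state; else start anywhere.
            state = initial_state if by_start[initial_state] else \
                next(s for s, ts in by_start.items() if ts)
            seq = []
            while by_start[state]:
                end, name = by_start[state].pop()
                seq.append(name)
                state = end
                remaining -= 1
            sequences.append(seq)
        return sequences

    if __name__ == "__main__":
        cases = [("S0", "S1", "t1"), ("S1", "S2", "t2"),
                 ("S2", "S0", "t3"), ("S1", "S1", "t4")]
        print(chain_tests(cases, "S0"))   # a single chained sequence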