Daniel Barbara and Mark Sullivan
A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains at each point an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage. The typical approach to reducing storage cost is to materialize parts of the cube on demand. Unfortunately, this lazy evaluation can be a time-consuming operation. In this paper, we propose an approximation technique that reduces the storage cost of the cube without incurring the run time cost of lazy evaluation. The idea is to characterize regions of the cube by using statistical models whose description take less space than the data itself. Then, the model parameters can be used to estimate the cube cells with a certain level of accuracy. To increase the accuracy, some of the "outliers," i.e., cells that incur in the largest errors when estimated can be retained. The storage taken by the model parameters and the retained cells, of course, should take a fraction of the space of the full cube and the estimation procedure should be faster than computing the data from the underlying relations. We study three methods for modeling cube regions: linear regression, singular value decomposition and the independence approach. We study the tradeoff between the amount of storage used for the description and the accuracy of the estimation. Experiments show that the errors introduced by the estimation are small even when the description is substantially smaller than the full cube. Since cubes are used to support data analysis and analysts are rarely interested in the precise values of the aggregates (but rather in trends), providing approximate answers is, in most cases, a satisfactory compromise. Our technique allows systems to handle very large cubes that cannot either be fully stored or efficiently materialized on demand. Even for applications that can only accept tiny error in the cube data, our techniques can be used to present successive on-line refinements of the answer to a query so that a coarse picture of the answer can be presented while the remainder of the answer is fetched from disk (i.e., to support on-line aggregation).
Jane Hayes and Jeff Offutt
Syntax-directed software accepts inputs from the user, constructed and arranged properly, that control the execution of the application. Input validation testing chooses test data that attempt to show the presence or absence of specific faults pertaining to input tolerance. A large amount of syntax-directed software currently exists and will continue to be developed that should be subjected to input validation testing. System level testing techniques that currently address this area are not well developed or formalized. There is a lack of system level testing formal research and accordingly a lack of formal, standard criteria, general purpose techniques, and tools. Systems are large (size and domain) so unit testing techniques have had limited applicability. Input validation testing techniques have not been developed or automated to assist in static input syntax evaluation and test case generation. This paper addresses the problem of statically analyzing input command syntax as defined in interface and requirements specifications and then generating test cases for input validation testing. The IVT (Input Validation Testing) technique has been developed, a proof-of-concept tool (MICASA) has been implemented, and validation has been performed. Empirical validation on actual industrial software (for the Tomahawk Cruise Missile) shows that as compared with senior, experienced testers, MICASA found more requirement specification defects, generated test cases with higher syntactic coverage, and found additional defects. Additionally, the tool performed at significantly less cost.
Jeff Offutt
This report presents results for the Rockwell Collins Inc. sponsored project on generating test data from requirements/specifications, which started May 19, 1997. The purpose of this project is to improve our ability to test software that needs to be highly reliable by developing formal techniques for generating test cases from formal specification al descriptions of the software. Formal specifications represent a significant opportunity for testing because they precisely describe what functions the software is supposed to provide in a form that can be easily manipulated by automated means. This report presents a general model for developing test inputs from state-based specifications, a derivation process for obtaining the test cases, a fully worked out example for a small system, and test cases from a specifications of an industrial system. The test data generation model includes techniques for generating tests at several levels of abstraction for specifications, including the complete transition sequence level, the transition-pair level, and the detailed transition level. These techniques provide coverage criteria that are based on the specifications, and are made up of several parts, including test prefixes that contain inputs necessary to put the software into the appropriate state for the test values. The test generation process includes several steps for transforming specifications to tests.