Editorial:
Software testing is an elephant

Published in volume 18, issue 4, December 2008

This issue contains two papers. The first, An analysis technique to increase testability of object-oriented components, by Kansomkeat and Rivepiboon, examines the problem of testability when source code is not available but bytecode is. OO components have low testability because information hiding obscures state that must be controlled and monitored during testing. The authors cleverly extract a control and data flow graph from the bytecode and use it to increase both controllability and observability, making faults easier to detect.

The second paper, The determination of optimal software release times at different confidence levels with consideration of learning effects, by Ho, Fang and Huang, uses stochastic differential equations to build a software reliability model. This model is validated on data that were published in six previous papers. The results will help project managers decide when to release software to maximize its reliability.

We probably have all heard the parable of the blind men and the elephant. Each blind man touches a different part of the elephant and describes it differently. The men fall into an argument, which in some tellings turns violent, and the conflict is resolved only with outside help; in other versions it is never resolved.

Last spring I attended a workshop on software system testing and was lucky enough to find a wise person who helped dispel some of my blindness by helping me see several distinct types of test activities.

Most of my research has focused on using test criteria to help design software tests. Test design is amenable to objective, quantitative assessments of the tests, as well as automatic generation of test values, my first research love. Criteria-based test design requires knowledge of mathematics and of programming—it is an engineering approach. An equally important way to generate tests is from human intuition. Objective test criteria can overlook special situations that smart people with knowledge of the domain, testing, and user interfaces will easily see. Both approaches are intellectually stimulating, rewarding and challenging, but they appeal to people with different backgrounds. These two approaches are also complementary and, in most projects, equally helpful to create high quality software.
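
To make the contrast concrete, consider a small, hypothetical example (the class, method, and values are mine, not drawn from any paper above). A branch-coverage criterion mechanically demands tests that take each outcome of the decision, while it is a tester's intuition that adds a special case such as a negative price:

    // Hypothetical class and method under test (invented for this editorial)
    public class Discount {
        public static int price(int base, boolean member) {
            if (member) {
                return base * 9 / 10;  // a branch criterion demands one test with member == true ...
            }
            return base;               // ... and another with member == false
        }
    }

    // Criteria-based tests: price(100, true) should return 90; price(100, false) should return 100.
    // Intuition-based test: price(-100, true), a special case no branch criterion asks for.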

For efficient and effective testing, especially as software evolves through hundreds or thousands of versions, we also need to automate our tests. Test automation involves programming that is usually relatively straightforward, using scripting languages, frameworks like JUnit, or capture/replay tools. Test automation requires little knowledge of theory, algorithms, or the domain, but test scripts must often solve tricky problems with controllability.
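
As a minimal sketch, assuming the hypothetical Discount class above and JUnit 4 on the classpath, an automated version of those tests might look like the following; the payoff is that the checks can be re-run, unchanged, on every subsequent version:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class DiscountTest {
        @Test
        public void memberGetsTenPercentOff() {
            assertEquals(90, Discount.price(100, true));
        }

        @Test
        public void nonMemberPaysFullPrice() {
            assertEquals(100, Discount.price(100, false));
        }
    }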

Many companies focus heavily on test execution. If we fold test design into test execution and do not automate, test execution becomes very hard, and the approach is inefficient and usually ineffective. It is like expecting programmers to take software requirements and immediately start coding without designing or outlining the algorithms. Especially when not fully automated, proper test execution requires meticulous care and attention to detail.

A major challenge in testing is often deciding whether the output was correct or not. Test evaluation is more difficult than most researchers think and often requires deep domain knowledge and the ability to solve problems of observability. Test evaluation is logical and similar to scientific experimentation, so backgrounds in fields like law, psychology, philosophy and math can be very helpful.
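
A tiny, hypothetical illustration of why evaluation is hard: checking a numerical result means deciding how close is close enough, and that tolerance is a domain judgment the test framework cannot supply.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class InterestTest {
        // Hypothetical method standing in for the system under test
        static double futureValue(double principal, double annualRate, int months) {
            return principal * Math.pow(1 + annualRate / 12, months);
        }

        @Test
        public void compoundInterestIsApproximatelyCorrect() {
            // The oracle is an independently computed expected value;
            // the delta (0.01) encodes a domain decision about how much
            // numerical error still counts as "correct".
            assertEquals(1104.94, futureValue(1000.0, 0.05, 24), 0.01);
        }
    }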

These four different engineering activities—test design, automation, execution and evaluation—are essential to testing. But they do not cover all aspects of testing software. Any process requires management to set policy, organize a team and communicate with external groups. Tests also must be well documented and put into revision control. This requires input from all four primary activities, since we need to know why each test was designed and how it relates to other software products. The documentation must be included as part of the automated test, which brings us to test maintenance. Tests must be reused as software evolves, allowing the cost of creating a test to be amortized over hundreds or thousands of versions of the software. A related, and very difficult, problem is trimming the test suite: as the number of tests grows, the suite can become redundant and too large for efficient execution.

Most research focuses on criteria-based test design. Even when other aspects of testing are addressed, the papers often put these issues in the context of test criteria. A couple of years ago I sat on an NSF panel reviewing software testing proposals. As the assessments progressed, it became apparent that one participant was on a mission to block all criteria-based research. A direct quote was "Only fools say the human should not be involved in testing." Oddly, nobody ever said humans should not be involved! At least, nobody on the panel said that; neither did any of the proposals. That is, the panelist was raising a false conflict.

Why was this conflict false, and more importantly, why was it raised? As a field, we often view testing as one monolithic activity and do not see the disparate pieces. Recognizing them allows us to understand and respect the activities of different researchers, consultants, authors and practitioners. It also allows managers to better organize their teams; for example, if engineers who are good at criteria-based test design are asked to do test automation or evaluation, they will usually start looking for development jobs. Treating testing as one monolithic activity is precisely what invites false conflicts like this one.

Many years ago, software engineering outgrew this attitude with respect to development. Software development teams have personnel who specialize in requirements, architecture, design, integration, and coding, among other areas. It is past time we applied similar specialization to testing. This will make our software testing activities more effective and efficient, and it will help managers organize, train, and retain testers. The first step is the hardest. We must open our eyes, see the entire elephant in its full glory, and quit thinking that our piece is the only piece.

Jeff Offutt
offutt@gmu.edu
19 October 2008