Nothing ever becomes real till it
is experienced; even a proverb is no
proverb to you till your life has illustrated it.
— John Keats
 
Software Engineering Experimentation
Project Description
Spring 2012
 

Seventh GMU Workshop on Experimental Software Engineering

Program Chair: Jeff Offutt

Technical Program Committee: SEE students

The GMU Workshop on Experimental Software Engineering provides a forum for discussing current experimental studies in the field of software engineering. Papers are solicited for the studies listed in this CFP, as well as for other studies.

Accepted papers will not be published in any conference proceedings. Submitted papers must not have been published previously, but they may be submitted elsewhere in the future. All submitted papers will be accepted.

Full-Length Papers: Papers should be submitted 1.5-spaced or double-spaced in a font size no smaller than 11 points, fully justified. Papers must not exceed 25 double-spaced pages including references and figures, and will not be refereed by external reviewers. All papers should indicate what is interesting about the presented work. The first page should include an abstract of at most 150 words, a list of keywords, the author's name, affiliation, and contact information (email address and phone). The citations and references should be formatted in standard software engineering format, that is, with bracketed citations ("[1]") and citation keys that are either numeric or strings based on the authors' names ("[Basi91]").

Presentations: You will be allowed 25 minutes for your presentation, including 5 minutes for questions.

Submission Procedure: A first draft of each paper must be submitted before 16 April by posting on the Piazza bulletin board. Each paper will receive at least three reviews, one from the program chair and two from technical program committee members. Reviews will be returned on 23 April, and the final paper must be submitted electronically by 30 April. Final papers must be submitted in PDF format (not MS Word or LaTeX!). The final paper must be single-spaced and in a 10-point font.

Milestones and Dates
Topic selection: 30 January
Experimental design review: 20 February
Draft paper submitted: 16 April
Reviews due: 23 April
Final paper submitted: 30 April
Presentations: See schedule


Topics

Don't mind criticism --
   If it is untrue, disregard it,
   If it is unfair, don't let it irritate you,
   If it is ignorant, smile,
   If it is justified, learn from it.
     - Anonymous

SUGGESTED TOPICS LIST

Following is a list of suggested topics for your empirical study. You may choose any topic you wish, either from this list or something of your own creation. I specifically encourage you to consider carrying out an experiment related to your current research. Many of these suggestions are related to software testing. This emphatically does not imply a preference in the class, but just reflects the limits to my own creativity. That is, most of my ideas are about testing problems.

You will notice that most of these studies do not involve much, if any, programming, but some will involve a lot of program execution. Also, these studies can be done more easily with clever use of shell scripts. There can be a fair amount of overlap between these studies, and you may want to share programs, test data sets, or other artifacts. Trading experimental artifacts of this kind is greatly encouraged!
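As an illustration of the kind of driver script meant here, the following is a minimal sketch, written in Python rather than shell, that runs one program against a list of test inputs and records the exit code and output of each run. The function name `run_tests` and the tiny program under test are hypothetical; a real study would substitute its own program and test data.

```python
import subprocess
import sys


def run_tests(command, inputs, timeout=10):
    """Run `command` once per input string, feeding the input on stdin.

    Returns a list of (input, exit_code, stdout) tuples, one per run.
    """
    results = []
    for text in inputs:
        proc = subprocess.run(
            command,
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        results.append((text, proc.returncode, proc.stdout.strip()))
    return results


if __name__ == "__main__":
    # Hypothetical program under test: upper-cases whatever it reads.
    program = [sys.executable, "-c",
               "import sys; print(sys.stdin.read().upper())"]
    for row in run_tests(program, ["abc", "xyz"]):
        print(row)
```

Saving the returned tuples to a file makes it easy to share results, and the test inputs themselves, with classmates working on overlapping studies.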

Some of these studies could use a partner to carry out some of the work, so as to avoid bias from having one person conduct the entire experiment. I encourage you to help each other; please communicate among yourselves if you need help ... ask and offer.

These descriptions are concise overviews and most are fairly open-ended, by design, to encourage more creativity and divergent thinking. I will be happy to discuss any project in more depth if you need help refining the suggestion.

Empirical Studies Suggestions

  1. (New, 21 January) How are mutation tests different from human-designed tests?: While researchers have evaluated the quality of human-designed tests by measuring them against mutation, no one has asked whether human-designed tests tend to miss particular types of mutants. Unkilled mutants may reveal types of faults that humans tend to miss.
  2. Does code coverage map downwards?: Most test criteria are measured by instrumenting the program to count how many times the tests "hit" the element we need to cover. For example, edge coverage (branch coverage) is trivially measured by placing a counter on each branch in a program. Data flow, edge-pair and prime path coverage are only slightly more complicated. The definitions and most of our knowledge about these criteria are based on covering the source code; however, some tools (such as Emma and EclEmma) instrument bytecode. An important question, then, is whether achieving branch coverage (or any other coverage criterion) at the bytecode level is the same as achieving branch coverage on the source code.
  3. How good is CodeCover?: CodeCover (http://codecover.org/) is an open source tool that supports various types of code coverage. However, some have voiced concerns that the decisions made to measure coverage are non-standard and could lead to questionable results. A nice study would be to experimentally verify whether the measures match the theoretical definitions as given in my textbook.
  4. Covering the model versus covering the program: If we design and generate tests to cover a model of a program, for example, a finite state machine or UML diagram, how well will those tests cover the program on the same criterion? Note that this study could be done with multiple criteria.
  5. How much does ROR help MCDC?: In a paper just submitted, Improving Logic-Based Testing, Kaminski, Ammann and I showed how to add the ROR mutation operator to MCDC testing, resulting in a stronger test set. But this technique has a cost: one more test per clause in each predicate in the program. Empirical studies are needed to determine how much this technique improves fault detection.
  6. Java mutation experiments: One resource we have available is a mutation testing system for Java, muJava. Instructions for downloading, installing, and running muJava are available on the website. There are several small experiments you could use muJava to run.
  7. Comparing input space partitioning techniques: Dozens of studies comparing structural, data flow, and mutation test criteria have been published. But I have not seen any studies that compared input space partitioning criteria such as each choice, base choice, pair-wise, and multiple base choice.
  8. Web Modeling and Testing Evaluation: I recently published a paper that proposed a method for modeling the presentation layer of web applications. This model can be used to generate tests, among other things. If you have access to a reasonably sized web application, it would be very interesting to apply the modeling and test method in the paper to evaluate its effectiveness. The paper can be downloaded from my website.
  9. Software Engineering Factoids: We have a lot of truisms about software engineering. These are small facts, or "factoids," that "everybody knows" are true, yet the sources for these factoids are lost in the mists of time. Some are based on data from the 1970s, some are based on 30-year-old casual observations, and some were probably made up by speakers who wished for a fact to support some point. By now, "everybody" accepts these factoids as truth, yet they may no longer be true or may have never been true! A few example factoids are: I am sure that you can think of more. The goal of this project would be to verify one or more of the factoids. This would require three steps: (1) find the old sources for the factoid, who originated it, what the fact was based on, and where it was used; (2) verify whether the factoid is true for current systems; and (3) quantify the correct version of the factoid as best as you can from current data.
  10. Metrics Comparison: Researchers have suggested many ways to measure the complexity and/or quality of software. These software metrics are difficult to evaluate, particularly on an analytical basis. An interesting project would be to take two or more metrics, measure a number of software systems, and compare the measurements in an objective way. The difficult part of this study would be the evaluation method: How can we compare different software metrics? To come up with a sensible answer to this question, start with a deeper question: What do we want from our metrics?
  11. Frequency of Multi-Clause Predicates in Open-Source Software: Logic-based test criteria such as active clause coverage (ACC) are stronger than simpler criteria such as predicate coverage only when predicates have more than one clause. And ACC only has significant savings over all-combinations coverage when predicates have several clauses, possibly four or five. So an important question about the practicality and usefulness of ACC is how often predicates in real programs have several clauses. A study that counted clauses in real programs, for example, open-source programs, would help us determine how useful some of these techniques are. The number of clauses per predicate should also be compared with overall measures of size, such as lines of code, number of classes, etc.
  12. Frequency of Infeasible Subpaths in Testing: Many structural testing criteria exhibit what is called the feasible path problem, which says that some of the test requirements are infeasible in the sense that the semantics of the program imply that no test case satisfies the test requirements. Equivalent mutants, unreachable statements in path testing techniques, and infeasible DU-pairs in data flow testing are all instances of the feasible path problem. For example, in branch testing, one branch might be executed if (X == 0) and another if (X != 0); if the test requirements need both branches to be taken during the same execution, the requirement is infeasible. This study would determine, for a sample of programs, how many subpaths that are required to be executed by some test criterion are infeasible. A reference on the subject of the feasible path problem can be found on my web site: Automatically Detecting Equivalent Mutants and Infeasible Paths.
  13. Experiments in Coupling: My student, Aynur Abdurazik, recently completed her dissertation on Coupling-Based Analysis of Object-Oriented Software. In Chapter 10 she suggested several interesting areas for future research, some of them experimental. In particular, sections 10.2.1, Application of Coupling Model to Web Applications, 10.2.2, Coupling-based Fault Analysis, 10.2.3, Comprehensive Empirical Validation of Three Specific Problems, 10.2.6, Coupling-based Reverse Engineering, and 10.2.7, Coupling-based Component Ranking suggest some potentially useful experimental directions.
  14. Inversion of Control/Dependency Injection: The Inversion of Control and Dependency Injection design patterns are becoming a common part of frameworks. These patterns provide a means to develop loosely coupled applications. Both approaches encourage interface-based development and the use of POJOs (plain old Java objects). It would be interesting to investigate the effects of these patterns on unit and integration testing.
  15. Declarative Programming: Many Web frameworks attempt to alleviate the burden of tedious programming tasks by allowing developers to specify navigation and page composition logic declaratively in configuration files. It would be interesting to investigate the effects of this type of declarative programming on software testing.
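
As a possible starting point for suggestion 7, the sketch below builds each choice and base choice test sets from a small input space model and measures what fraction of the pair-wise requirements a given test set happens to cover. It is only a sketch under stated assumptions: the characteristics and blocks in `EXAMPLE` are hypothetical, the first block listed for each characteristic is assumed to be its base block, and pairing blocks by position in `each_choice` is just one of many valid ways to satisfy that criterion.

```python
from itertools import combinations, product

# Hypothetical input space model: characteristic -> list of blocks.
# The first block of each characteristic is treated as the base block.
EXAMPLE = {
    "size": ["empty", "one", "many"],
    "order": ["sorted", "unsorted"],
    "dups": ["none", "some"],
}


def each_choice(partitions):
    """Each choice: every block appears in at least one test."""
    width = max(len(blocks) for blocks in partitions.values())
    tests = []
    for i in range(width):
        # Characteristics with fewer blocks reuse their last block.
        tests.append({name: blocks[min(i, len(blocks) - 1)]
                      for name, blocks in partitions.items()})
    return tests


def base_choice(partitions):
    """Base choice: a base test, then vary one characteristic at a time."""
    base = {name: blocks[0] for name, blocks in partitions.items()}
    tests = [dict(base)]
    for name, blocks in partitions.items():
        for block in blocks[1:]:
            test = dict(base)
            test[name] = block
            tests.append(test)
    return tests


def pairwise_ratio(partitions, tests):
    """Fraction of pair-wise requirements covered by `tests`."""
    required = set()
    for a, b in combinations(partitions, 2):
        for va, vb in product(partitions[a], partitions[b]):
            required.add((a, va, b, vb))
    covered = set()
    for test in tests:
        for a, b in combinations(partitions, 2):
            covered.add((a, test[a], b, test[b]))
    return len(covered & required) / len(required)
```

For `EXAMPLE`, each choice yields 3 tests covering 8 of the 16 pair-wise requirements, and base choice yields 5 tests covering 11 of 16. An actual study would of course compare fault detection on real programs, not just requirement counts.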

© Jeff Offutt, 2012, all rights reserved. This document is made available for use by students in software engineering experimentation. Copying, distribution or other use of this document without express permission of the author is forbidden. You may create links to pages in this web site, but may not copy all or part of the text without permission of the author.