Published in volume 25, issue 2, March 2015

This issue contains three papers that invent new ideas and evaluate them empirically. The first paper, Directed test suite augmentation: An empirical investigation, by Xu, Kim, Kim, Cohen, and Rothermel, empirically investigates the effectiveness of strategies for augmenting test sets after changes to software. The authors compare two different test generation algorithms in two separate studies. (Recommended by Paul Ammann.) The second paper, Reducing execution profiles: Techniques and benefits, by Farjo, Assi, and Masri, presents results of analysis of execution profiles. The authors invented six ways to reduce the size of execution profiles and empirically measured the effect on the quality of analysis after reducing the profiles. (Recommended by T.Y. Chen.) The third paper, Automated metamorphic testing of variability analysis tools, by Segura, Durán, Sánchez, Le Berre, Lonca, and Ruiz-Cortés, invents a technique for solving the oracle problem when testing tools that analyze the variability of software. (Recommended by T.H. Tse.) Combined, these three papers have a whopping 14 co-authors, which leads in perfectly to the subject of this editorial: determining authorship.


Who Is An Author?

In the last issue, I discussed plagiarism: what constitutes plagiarism, different types of plagiarism, why people plagiarize, and how to avoid plagiarism [1]. In that editorial, I touched upon a related but more difficult issue: authorship. The author list is very clear with most papers. Other times it is less clear. In this editorial, I start with general principles, then talk about a number of specific examples, some controversial.

Most software engineers agree on a general principle:

Everyone who makes substantial contributions to the results is a co-author on papers that present those results.

Additionally, all co-authors should see the papers and have the opportunity to participate in the writing before submission.

Note that this definition of authorship determines who should be listed, and does not depend on who is listed. That is, being listed does not make one a co-author, and not being listed does not mean one is not a co-author. Not surprisingly, the two most common types of authorship problems are guest authors and ghost authors.

A guest author is someone who is listed as an author in the paper, but who did not make “substantial contributions.” For example, suppose a student graduates, moves to a different organization, and starts his or her own research program without collaborating with the former advisor. The advisor is not an author of papers that come out of that new research, so if the student puts the advisor’s name on the papers, the advisor is a guest author. In fact, guest authorship is a form of plagiarism, since the guest author is “taking someone else’s work or ideas and passing them off as one’s own” [2].

A ghost author is just the opposite: Someone who made substantial contributions but is not listed as an author. That person is, by definition, an author, and should be included. For example, scientists A, B, and C carry out a research project and develop some results. Then A and B have an argument with C, A and B write a paper, and because they are angry at C, leave C’s name off the paper. C is, by definition, an author and omitting C’s name is an ethical violation. Again, this is a form of plagiarism because A and B are passing C’s ideas off as their own. An exception is if a co-author explicitly declines being listed as a co-author.

Given those two examples, let’s go back to the definition of authorship. You should notice that the definition has a major problem: the phrase “substantial contribution” is ambiguous and subjective. This creates a very large gray area by opening up the question: “What is a substantial contribution?” I start with a general principle, and then discuss some specific examples.

To me, it is very important that authorship be discussed openly, objectively, and rationally. This conversation should happen before the paper is started. All parties who were connected with the project should be part of the conversation and have the opportunity to voice an opinion.

Beyond that, several questions can help determine who made a contribution substantial enough to be an author. Was the person “in the room” when the main ideas were being discussed? Would the paper have been rejected if that person had not been involved? Would the paper have been substantially different without that person? (Although this is a useful way to frame the issue, the question also contains the ambiguous word “substantial.”)

It is usually clear who made substantial contributions and controversy is rare. But some contributions are hard to categorize as substantial or not. The list below is not comprehensive, but covers some potentially gray areas from my experience where I believe that authorship is warranted, at least under certain circumstances.

I also want to discuss some contributions I would not normally consider substantial.

I have also seen some reasons for excluding someone from the author list that I strongly disagree with. One is if the student left the program or graduated. It does not matter where the student is now; if the student made a contribution at the beginning, the student is still an author. Another was when a colleague once said “she’s only an undergraduate.” I still think that was unfair; if a 6-year-old gives me an idea, that merits authorship even if she can barely write English. Not to mention that a publication can be a big deal to an undergraduate applying to graduate school.

One more principle I advocate is: when in doubt, be inclusive. I would much rather include someone whose contribution is marginally “substantial” than omit someone who really believed she contributed. Having one more author won’t hurt. A difficult situation is if A and B think C’s contribution is not substantial, but C thinks it is. In that situation, I recommend including C as a co-author. Again, it doesn’t hurt A and B to list another co-author, but it hurts C to be omitted. And beyond the concerns of the potential authors, mislabeling authorship also hurts the academic community.

The final point is about author order. Once we have decided who the authors are, how do we order them? This is a difficult social issue as much as a technical issue. The two most common strategies are order of contribution and alphabetical order (by last, or family, name). Sometimes the order of contribution is easy to determine, but not always. So a compromise is often to list the “primary” author first, and the rest in alphabetical order. My personal rule is that students go first on papers from their dissertation work. Beyond that, if relative contribution can be agreed on, we use that. If not, we go alphabetical. Again, it’s important that author order be discussed early and that all authors agree.

I wish that I could provide definitive answers for these questions. I cannot paint “perfect lines,” but what I have tried to do is offer thoughtful guidance for how to think about the line between substantial and non-substantial contributions. A useful question is “can a specific substantial contribution be articulated for each person listed as author?” I hope this editorial helps readers make good decisions about authorship.

[1] Jeff Offutt. Plagiarism Is For Losers (Editorial), Wiley's journal of Software Testing, Verification, and Reliability, 25(1), January 2015.

[2] Oxford Online Dictionary, http://www.oxforddictionaries.com/, last access January 2015.

Thanks to Paul Ammann and Rob Hierons for providing helpful comments on an early draft of this editorial.

Additional resources, added October 2017:

Jeff Offutt
31 January 2015