Editorial

Published in volume 25, issue 2, March 2015

This issue contains three papers that invent new ideas and evaluate them empirically. The first paper, Directed test suite augmentation: An empirical investigation, by Xu, Kim, Kim, Cohen, and Rothermel, empirically investigates the effectiveness of strategies for augmenting test sets after changes to software. They compared two different test generation algorithms in two separate studies. (Recommended by Paul Ammann.) The second paper, Reducing execution profiles: Techniques and benefits, by Farjo, Assi, and Masri, presents results of analysis of execution profiles. They invented six ways to reduce the size of execution profiles and empirically measured the effect on the quality of analysis after reducing the profiles. (Recommended by T.Y. Chen.) The third paper, Automated metamorphic testing of variability analysis tools, by Segura, Durán, Sánchez, Le Berre, Lonca, and Ruiz-Cortés, invents a technique for solving the oracle problem when testing tools that analyze the variability of software. (Recommended by T.H. Tse.) Combined, these three papers have a whopping 14 co-authors, which leads in perfectly to the subject of this editorial: determining authorship.

Who Is An Author?

In the last issue, I discussed plagiarism: what constitutes plagiarism, different types of plagiarism, why people plagiarize, and how to avoid plagiarism [1]. In that editorial, I touched upon a related but more difficult issue: authorship. The author list is very clear with most papers. Other times it is less clear. In this editorial, I start with general principles, then talk about a number of specific examples, some controversial.

Most software engineers agree on a general principle:

Everyone who makes substantial contributions to the results is a co-author on papers that present those results.

Additionally, all co-authors should see the papers and have the opportunity to participate in the writing before submission.

Note that this definition of authorship determines who should be listed, and does not depend on who is listed. That is, being listed does not make one a co-author, and not being listed does not mean one is not a co-author. Not surprisingly, the two most common types of authorship problems are guest authors and ghost authors.

A guest author is someone who is listed as an author in the paper, but who did not make “substantial contributions.” For example, suppose a student graduates, moves to a different organization, and starts his or her own research program without collaborating with the former advisor. The advisor is not an author of papers that come out of that new research, so if the student puts the advisor’s name on the papers, the advisor is a guest author. In fact, guest authorship is a form of plagiarism, since the guest author is “taking someone else’s work or ideas and passing them off as one’s own” [2].

A ghost author is just the opposite: Someone who made substantial contributions but is not listed as an author. That person is, by definition, an author, and should be included. For example, scientists A, B, and C carry out a research project and develop some results. Then A and B have an argument with C, A and B write a paper, and because they are angry at C, leave C’s name off the paper. C is, by definition, an author and omitting C’s name is an ethical violation. Again, this is a form of plagiarism because A and B are passing C’s ideas off as their own. An exception is if a co-author explicitly declines being listed as a co-author.

Given those two examples, let’s go back to the definition of authorship. You should notice that the definition has a major problem: the phrase “substantial contribution” is ambiguous and subjective. This creates a very large gray area by opening up the question: “What is a substantial contribution?” I start with a general principle, and then discuss some specific examples.

To me, it is very important that authorship must be discussed openly, objectively, and rationally. This conversation should happen before the paper is started. All parties who were connected with the project should be part of the conversation and have the opportunity to voice an opinion.

Beyond that, several questions can help determine who made enough of a substantial contribution to be an author. Was the person “in the room” when the main ideas were being discussed? Would the paper have been rejected if that person had not been involved? Would the paper have been substantially different without that person? (Although a useful way to frame the question, this question also has the ambiguous word “substantial.”)

It is usually clear who made substantial contributions, and controversy is rare. But some contributions are hard to categorize as substantial or not. The list below is not comprehensive, but covers some potentially gray areas from my experience where I believe that authorship is warranted, at least under certain circumstances.

Was the person “in the room?” If the answer is “yes,” that person is very likely to have made a substantial contribution. In a creative intellectual discussion, we often forget who said what, and it’s hard to determine whether a seemingly small, or even incorrect, observation helped someone else have a key insight. On the other hand, if someone was in the room but did not say anything, then the contribution was clearly not substantial. The co-authors of the paper should tell that person directly that they are writing a paper based on that discussion without that person’s input.
Did the person run the experiment? In my world, this constitutes “substantial contribution” and I would always expect that person to be a co-author. In most computing experiments, the value of the experiment depends largely on the abilities of the person running the experiment, who often must note interesting unexpected events and adjust accordingly. However, some scientists, particularly in traditional sciences like biology and physics, consider the person who ran the experiment to be a “technician” who was paid for a service and did not contribute intellectually to the project. Perhaps this is because little or no value can be added to these experiments in process. I am hesitant to call that wrong, but in my research, I definitely consider running the experiment to be an intellectual contribution.
Would the results have been different without this person? Even a small observation like pointing out an essential flaw in an early experimental design can have a large impact on the results. If someone offers that level of help, I would usually invite that person to co-author the paper. If he or she declines, I would at least put a nice word in the acknowledgments.
Did the person build the experimental infrastructure (the software, lab, etc.)? Again, many fields consider that a pay-for-service activity that does not justify authorship. And some infrastructure, particularly software, can be reused by hundreds of projects ... it’s hard to imagine the programmer being listed as a co-author on all those papers. However, if the infrastructure was built for a specific purpose, the engineer who builds it will often have, and implement, ideas that can dramatically improve the experimental study. Thus, I consider the programmer to be an author of papers that came out of that work.
Did the person provide editorial services? This is another gray area where the answer can go either way. Most people would agree that if someone reads the paper and makes a few presentational suggestions, that does not constitute authorship. But what if the original paper is impossible to understand because of language problems, or some aspect of the presentation is so bad that it could not be accepted? If someone steps in and does a major rewrite or reorganization of the paper, my opinion is that this can be considered a substantial contribution. Here it’s the amount of change that matters. Most importantly, both parties should discuss authorship openly and both should agree on who is an author.

I also want to discuss some contributions I would not normally consider substantial.

Experimental subjects. Acknowledging people who help in the experiment is a positive social convention, unless of course the participant wants or is guaranteed anonymity. However, most experimental subjects do not contribute intellectually to the project.
Relatively light grammar editing. This is either a for-pay service or a favor. But correcting my grammar seldom constitutes a substantial contribution. The dividing line may be who applied the changes ... if someone gives me comments that I apply to the paper, those comments are probably simple grammar or typos, and probably not authorship-level help. But if that person edits the paper directly, the edits may go beyond grammar and help frame the work. That probably is authorship.
Provided funding. This is somewhat controversial, but in my opinion, providing funding does not make one a co-author. If the paper contains a substantial amount of words or ideas that were in the proposal, then the PI might be a co-author. If the primary author was funded on a grant, but performed research that was not directly described in the proposal, my judgment is that the PI is not a co-author. I must add, however, that well known and respected scientists disagree and believe that it is appropriate always to include PIs as co-authors. This disagreement definitely puts this situation into the gray area.
Did work that was cut during revision. STVR sometimes gets revisions with one or more names dropped from the original author list. The explanation given is that the authors cut some material during revision, and those people were no longer authors of what was left in the paper. The only thing I insist on is a statement from the omitted authors that they agreed with the authorship change. At the same time, it is extremely difficult socially to tell a co-author that he or she is no longer a co-author. If the rest of the authors do not have the heart to tell someone his or her name is being cut, I could not criticize. However, strictly speaking, if your contribution is removed during revision, you are no longer an author.
Taught the class. If a class project leads to a publishable paper, that’s terrific for the student. It’s also quite possible that the instructor provided enough valuable intellectual guidance to have made a “substantial contribution.” Or not. Just having taught the class does not merit authorship; the contribution must come through direct interaction or input on that project.

I have also seen some reasons for excluding someone from the author list that I strongly disagree with. One is if the student left the program or graduated. It does not matter where the student is now; if the student made a contribution at the beginning, the student is still an author. Another was when a colleague once said “she’s only an undergraduate.” I still think that was unfair; if a 6-year-old gives me an idea, that merits authorship even if she can barely write English. Not to mention that a publication can be a big deal to an undergraduate applying to graduate school.

One more principle I advocate is when in doubt, be inclusive. I would much rather include someone whose contribution is marginally “substantial” than omit someone who really believed she contributed. Having one more author won’t hurt. A difficult situation is if A and B think C’s contribution is not substantial, but C thinks C’s contribution is. In that situation, I recommend including C as a co-author. Again, it doesn’t hurt A and B to list another co-author, but it hurts C to be omitted. And beyond the concerns of the potential authors, mislabeling authorship also hurts the academic community.

The final point is about author order. Once we have decided who the authors are, how do we order them? This is a difficult social issue as much as a technical issue. The two most common strategies are order of contribution and alphabetical order (by last, or family, name). Sometimes the order of contribution is easy to determine, but not always. So a compromise is often to list the “primary” author first, and the rest in alphabetical order. My personal rule is student first on papers from their dissertation work. Second, if relative contribution can be agreed on, we use that. If not, we go alphabetical. Again, it’s important that author order be discussed early and all authors agree.

I wish that I could provide definitive answers for these questions. I cannot paint “perfect lines,” but what I have tried to do is offer thoughtful guidance for how to think about the line between substantial and non-substantial contributions. A useful question is “can a specific substantial contribution be articulated for each person listed as author?” I hope this editorial provides enough thoughtful guidance to make good decisions about authorship.

[1] Jeff Offutt. Plagiarism Is For Losers (Editorial), Wiley's journal of Software Testing, Verification, and Reliability, 25(1), January 2015.

[2] Oxford Online Dictionary, http://www.oxforddictionaries.com/, last access January 2015.

Thanks to Paul Ammann and Rob Hierons for providing helpful comments on an early draft of this editorial.

Additional resources, added October 2017:

International Committee of Medical Journal Editors, Defining the Role of Authors and Contributors: http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html
Nature journals authorship policy: http://www.nature.com/authors/policies/authorship.html
Nature editorial on authorship policy: https://www.nature.com/nature/journal/v458/n7242/full/4581078a.html
The art of writing science, Kevin Plaxco: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009394/

Jeff Offutt
offutt@gmu.edu
31 January 2015