The journal impact factor

Published in volume 18, issue 1, March 2008

This issue contains three quite different papers. The first, Modular formal verification of specifications of concurrent systems, by Gradara, Santone, Vaglini and Villani, proposes a bottom-up approach to verifying modular systems: properties of the individual components are verified first, then emergent properties of the system as a whole. The approach is applied to a web service. The second paper, Simulated time for host-based testing with TTCN-3, by Blom, Deiss, Kontio, Rennoch and Sidorova, describes a method for testing real-time embedded software. When such software is tested in a development environment whose timing characteristics do not match those of the target environment, the authors propose using simulated time to test its real-time properties. The third paper, IPOG/IPOG-D: Efficient test generation for multi-way combinatorial testing, by Lei, Kacker, Kuhn, Okun and Lawrence, presents two new strategies for t-way combinatorial testing. The strategies have been implemented in a tool, FireEye, which is available on the first author's website. Software engineering researchers have always developed tools, but most of those tools have not been easily available to other researchers. Making tools publicly available is a positive trend that can multiply the impact of our research, which brings me to the main subject of this editorial ...

Many readers of STVR may not be familiar with the "impact factor," but it is currently the primary way that journals are evaluated. It is used, almost exclusively and worldwide, by publishers, universities, research organizations, and government agencies to judge the impact of scientific journals. It is used to make hiring decisions, determine promotion and tenure, allocate research funding, and even decide whether PhD students should be allowed to graduate. I first heard of this measure when a visitor told me that she could not graduate without a journal paper, and that the list of acceptable journals was determined solely by the impact factor.

Thomson Scientific's Journal Citation Reports is the recognized authority for evaluating journals [1]. It publishes statistics that are intended to objectively evaluate scientific journals and their influence on the global community of researchers. The statistics it computes are broad and multi-faceted. However, the research community seems to have settled on a single one of them, the impact factor, to judge journals. The primary advantage of the impact factor is that it is easy to compute. Unfortunately, it does not capture the true quality of research papers, their effectiveness, or the long-term impact of journals.

Journal Citation Reports defines the impact factor as the frequency with which the "average article" in a journal has been cited in a particular year [2]. It is calculated on a year-by-year basis. For year Y, the citations made during Y to papers that the journal published in years Y-1 and Y-2 are counted, and that count is divided by the number of papers the journal published in Y-1 and Y-2. The citations are taken from the Science Citation Index [3], also from Thomson Scientific.

The divisor is used to normalize the measure so it does not favor journals that publish more issues and papers. The impact factor is designed to evaluate a journal's importance relative to other journals in the same field.
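To make the arithmetic concrete, here is a minimal Python sketch of the two-year calculation. The function name `impact_factor` and all counts are hypothetical, not STVR's or any real journal's data, and the sketch ignores Thomson Scientific's own rules about which items count as citable papers. The two made-up journals differ in size by a factor of four but receive the same score, which is the normalization the divisor provides; the 500 citations to 2001 papers fall outside the window and are ignored.

```python
def impact_factor(citations_in_year_y: dict[int, int],
                  papers_published: dict[int, int],
                  year: int) -> float:
    """Two-year impact factor for `year` (Y).

    citations_in_year_y: publication year -> citations made during Y
        to the journal's papers from that publication year.
    papers_published: publication year -> number of papers published.
    Only papers from Y-1 and Y-2 count; everything else is ignored.
    """
    window = (year - 1, year - 2)
    cites = sum(citations_in_year_y.get(y, 0) for y in window)
    n_papers = sum(papers_published.get(y, 0) for y in window)
    if n_papers == 0:
        raise ValueError("no papers published in Y-1 or Y-2")
    return cites / n_papers


# Hypothetical journals; all numbers are invented for illustration.
small = impact_factor({2005: 30, 2004: 30, 2001: 500},
                      {2005: 20, 2004: 20}, year=2006)
large = impact_factor({2005: 120, 2004: 120},
                      {2005: 80, 2004: 80}, year=2006)
print(small, large)  # 1.5 1.5
```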

It is important to note that Thomson Scientific is a commercial firm that sells its evaluations to publishers and research institutions. Its data and measures are not published in an open environment and not easily available to most scientists. Ironically, its measures have also not been subject to a peer review process.

As editor-in-chief of STVR, I have a good news / bad news view of the impact factor. The good news is that STVR's impact factor for 2006 is 1.1, which is quite high relative to other software engineering journals; on the list I was shown, it places STVR third, behind only the IEEE and ACM Transactions. The bad news is that it is quite hard to accept this impact factor as meaningful for our field.

The biggest issue seems to be the 2-year window for measuring impact. While this window may be reasonable for some very fast-moving fields, it is less useful for fields with long submission-to-publication times, and for fields that expect substantial empirical validation. Consider a scenario where a paper in this issue of STVR gives you a new research idea. (True impact, right?) Assuming you start immediately and work hard, it may take 6 months to develop the idea into a useful theory and a prototype tool. Again with hard work, another 6 months are probably needed to refine the tool and collect empirical data. If you write the paper while doing the empirical work, it's possible to submit it within a year of the publication of the original STVR paper. If the new paper is well written and the reviewers are prompt and reasonable, it could be accepted pending minor revision within 3 months. Again with hard work and prompt reviewing, the revised paper could be accepted within another 3 months. If the journal has a very short backlog, the paper could appear 4 or 5 months later.

Thus, under the absolutely best-case scenario, the citation to the original paper could appear at the very end of the 2-year impact factor window! How often does the best case actually happen? If I wrote a paper in 1999 or 1992 that is still cited in 2008, isn't that true impact?
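The best-case timeline can be added up mechanically. This minimal Python sketch takes the stage durations straight from the scenario and, as the argument does, treats the window as 24 months from the original paper's publication (a simplification: the actual Journal Citation Reports window is defined by calendar years). The names `STAGES`, `WINDOW_MONTHS` and `elapsed_months` are illustrative only.

```python
# Best-case stages from the scenario, as (description, fastest, slowest) months.
STAGES = [
    ("develop the idea into a theory and prototype tool", 6, 6),
    ("refine the tool, collect data, submit the paper", 6, 6),
    ("first review: accepted pending minor revision", 3, 3),
    ("revise and receive final acceptance", 3, 3),
    ("wait for the paper to appear", 4, 5),
]

WINDOW_MONTHS = 24  # simplification: two years from the original paper


def elapsed_months(stages):
    """Return (fastest, slowest) total months across all stages."""
    fastest = sum(lo for _, lo, _ in stages)
    slowest = sum(hi for _, _, hi in stages)
    return fastest, slowest


fast, slow = elapsed_months(STAGES)
print(f"citation appears after {fast}-{slow} months, "
      f"{WINDOW_MONTHS - slow}-{WINDOW_MONTHS - fast} months before the window closes")
# citation appears after 22-23 months, 1-2 months before the window closes
```

Any single delay, such as a second round of review or a longer backlog, pushes the citation past the window.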

Software engineering researchers are certainly well acquainted with the difficulty of inventing new measurements. It is ironic that the impact factor, which is used to assess scientific journals, would certainly never survive the peer review process of those same journals. Most of us probably spend a lot of time measuring, and are well aware that measurement is very difficult. We assign students grades in classes (a process once described as "an inadequate report of an inaccurate judgment by a biased and variable judge of the extent to which a student has attained an undefined level of mastery of an unknown proportion of an indefinite amount of material"). We review papers for journals and conferences, and we evaluate our colleagues for tenure and promotion. This last is perhaps the most relevant. For most professors, the first step in evaluating a promotion case is to count the number of published papers. But we also recognize that this is flawed, so we assign people to read the papers and we solicit external letters in an attempt to assess the "impact" the scientist has made on his or her specific area. Measuring the impact of research is hard!

Yet it seems to me that a journal impact factor should satisfy certain broad guidelines. First, it should be independent of the number and length of the papers published in the journal. It should also favor fundamental advances in the field over incremental or short-term contributions. The impact factor should be relatively stable on a year-to-year basis; a 50% swing within one year is an indication that something is wrong. Most importantly, an impact factor should measure the long-term impact of a journal; papers that are still cited 10, 15 or even 50 years after they appear must have had some impact!
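One way to see why the window length matters is to parameterize it. The Python sketch below is a hypothetical generalization of the two-year calculation, not a measure that Thomson Scientific publishes; the function name `windowed_impact` and all citation counts are invented for illustration. For a made-up journal whose papers are cited mostly three or more years after they appear, the two-year score is low, while longer windows credit the later citations.

```python
def windowed_impact(citations_in_year_y: dict[int, int],
                    papers_published: dict[int, int],
                    year: int, window: int = 2) -> float:
    """Citations made during `year` to papers from the previous `window`
    years, divided by the number of papers published in those years.

    window=2 reproduces the two-year calculation; larger windows are a
    hypothetical generalization, not a Thomson Scientific measure.
    """
    years = range(year - window, year)  # e.g. 2006..2007 for year=2008, window=2
    cites = sum(citations_in_year_y.get(y, 0) for y in years)
    n_papers = sum(papers_published.get(y, 0) for y in years)
    if n_papers == 0:
        raise ValueError("no papers published in the window")
    return cites / n_papers


# Hypothetical journal: 20 papers per year, cited mostly several years later.
papers_by_year = {y: 20 for y in range(1998, 2008)}  # 1998..2007
cites_2008 = {2007: 2, 2006: 6, 2005: 10, 2004: 14, 2003: 16,
              2002: 16, 2001: 14, 2000: 12, 1999: 10, 1998: 8}

for w in (2, 5, 10):
    print(f"{w:>2}-year window: {windowed_impact(cites_2008, papers_by_year, 2008, w):.2f}")
# prints 0.20, 0.48 and 0.54 for the 2-, 5- and 10-year windows
```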

Interestingly, in his essay on the impact factor, Garfield [2], who created the measure, cautions against using the same calculation for every scientific field and argues for using the impact factor "discreetly." Clearly, we are not heeding his advice.

[1] Journal Citation Reports, http://scientific.thomson.com/products/jcr/.

[2] Eugene Garfield, The Thomson Scientific Impact Factor, http://scientific.thomson.com/free/essays/journalcitationreports/impactfactor/, originally published in the Current Contents print editions, June 20, 2004.

[3] Science Citation Index, http://scientific.thomson.com/products/sci/.

Jeff Offutt
20 December 2007