DORA and Research Assessment


There has been an ongoing discussion in the research community about how to assess the impact and quality of research outputs. Much of this is driven by the desire of funding agencies to fund the best research and of employers to employ the best researchers. These are good things that everyone would agree with. Yet, how do you measure good research?

That is a surprisingly difficult question to answer.

The naive approach to research assessment is to boil everything down to a number, a metric. But you cannot reduce quality or impact to a single number, unless it happens to be the number 42. Shortcuts to research assessment don’t work. Our current degenerative system of research assessment is an outstanding example of why assessments should not be done this way.

Last December, a forward-thinking group of editors and publishers met to discuss some of the problems with research assessment and came up with a declaration that recommends eighteen ways to improve the status quo. These are practical steps that we can take now to improve the situation, not some ivory-tower mumbo jumbo.

The logotype for the San Francisco Declaration on Research Assessment (DORA)

While this guidance is not a complete solution to the problem, it is a way to improve the current system while improved methods of research dissemination are explored. Quoting from the declaration, “Outputs other than research articles will grow in importance in assessing research effectiveness in the future, but the peer-reviewed research paper will remain a central research output that informs research assessment. Our recommendations therefore focus primarily on practices relating to research articles published in peer-reviewed journals but can and should be extended by recognizing additional products…as important research outputs.”

In general, the declaration recommends not relying solely on journal metrics as a measure of the quality of research articles, as a way to assess individual contributions, or as a basis for hiring, promotion, or funding decisions. This should be obvious, and the fact that it needs to be said shows how messed up our system really is.

We can get so caught up in metrics that we forget what it is that we are actually measuring. The two most common reasons to use metrics are to measure the impact and the quality of research. It is hard to measure impact. Measuring impact takes time, even decades. But we should be able to measure quality. Quality is an inherent part of the research process. Impact is not necessarily under the control of the researcher, but quality is. So, ideally, we should be basing our assessments and hiring, promotion, and funding decisions not upon impact, but upon research quality. Impact presupposes quality.

Research Assessment in one field

In the field of genealogy, an unusually large share of the research that is published is shoddy. On one hand are the ever-proliferating online family trees that have no published analytical backing (or, all too often, no analytical backing at all). On the other hand are high-quality peer-reviewed articles from journals such as the National Genealogical Society Quarterly. While the dichotomy between mediocre and quality research outputs in this field may seem like a disadvantage, it also implies that there is a great opportunity for education and a need for accurate and understandable assessments of research quality.

A family tree publishes so little of the underlying research that it is essentially impossible to assess the quality of that research. If we look at a journal article, however, we get a lot more to go on. Yet, is that enough? How do we measure the quality of research in a journal article? Do we measure it by how many footnotes there are, or the percentage of the page they take up?

In genealogical research, we would use the Genealogical Proof Standard (GPS) to measure research quality. The GPS outlines five criteria that, when used together, are a good measure of research quality. Other fields have similar research standards. In the end, you don’t get a quick and easy number, but you do get an understanding of whether the research is solid or lacking. Research must be assessed as a whole.

These are some of my initial thoughts on this, and I am still trying to think through the many issues involving the assessment of research quality. I would love to invite you to share your thoughts!

So how do you measure research?