BICAMERAL REVIEW
Five Papers by Forsdyke
A Systems Analyst asks about AIDS Research Funding (1989)
Bicameral Grant Review: An Alternative to Conventional Peer Review (1991)
How a Systems Analyst with AIDS would Reform Research Funding (1993a)
On giraffes and peer review (1993b)
Undergraduate Academic Record as a Predictor of Subsequent Success in a Research Career (1994)
A SYSTEMS ANALYST ASKS ABOUT AIDS RESEARCH FUNDING
The Lancet (1989) Dec. 9th. 2: 1382-1384.
By Donald R. Forsdyke (With copyright permission from The Lancet Ltd.)
What might a systems analyst (SA) with HIV infection want to know from the director of a medical research funding organization (D)?
SA - Thank you for agreeing to see me. I'm here because I'm seropositive for the AIDS virus. I want to do something about it.
D - Well, our organization doesn't canvass for funds directly, but if you are able to make a donation, that could help.
SA - I think I might be able to make a more distinctive contribution. My original training was as a design engineer. For the past 20 years I've been a systems analyst advising organizations, mainly in the private sector, how to make their operations more efficient.
D - If you could apply your skills to increase the efficiency of fund-raising for AIDS research...
SA - Only if I can be sure that it is really a shortage of funds that is limiting progress.
D - Well, there are many good ideas out there. We haven't enough funds to try them all out. So we have to be selective. If we had sufficient funds we could support more ideas and we might have an AIDS cure very soon!
SA - My doctor tells me that there is an unpredictable latent period before the onset of symptoms. I may only live a year, but the chances are that I will live five or ten years or even longer.
D - Quite correct. Medical researchers have already come up with at least one drug, AZT, which can prolong the life of AIDS patients.
SA - So I can take a long-term approach in analyzing AIDS from the systems viewpoint.
D - What information do you need from me?
SA - Well, tell me how the medical research system works. New initiatives in business and industry need, first and foremost, bright and well-informed people. In the right environment they will come up with ideas. Then funds have to be committed to test the ideas. People, ideas, and funds. Presumably these are also the key components in medical research problem-solving?
D - Certainly. Having obtained advanced degrees in the biomedical sciences, future independent researchers have to compete for one of the scarce positions in universities or research institutes. Individuals who successfully surmount all the hurdles must be both highly motivated and very bright.
SA - OK. Let's assume that the particular qualities selected for by the appointment processes are the qualities needed for creative medical research, and that the institutions to which the researchers are appointed have all the necessary facilities. What happens next?
D - The researchers must apply for research funds to one or more of the funding bodies, such as that which I head. If you like, think of the researcher as a business entrepreneur who has an idea and my institution as an investment company that can help get the project moving. A financier in an investment company cannot fund everyone who applies. A successful financier has to be very shrewd in deciding among the entrepreneurs who apply.
SA - So the research funding system is "capitalist" in philosophy to the extent that researchers must compete with each other for a limited quantity of funds. Even though the researchers have had to compete with their peers to gain their positions, they must compete yet again for the funds to test their ideas?
D - Yes. The spur of competition is probably a major factor motivating researchers. The vigour of the western capitalist economies, compared with that of the socialist-bloc countries, surely supports that?
SA - Hold on. Let's back up a bit. First, please tell me more about the funding organizations. These get their funds from the public either through taxes or as direct donations. Now, I'm interested in accountability.
If a finance company makes unwise decisions it loses money and may become a target for a takeover. The spur of competition, as you say, keeps the financiers on their toes just as much as those who apply to the financiers. What are the penalties to a research funding organization if it fails? Indeed, how is failure or success monitored at the organizational level?
D - The spectacular advance in biomedical research over the past few decades speaks for itself. With more funds the advance might have been even more spectacular. We do have periodic internal reviews of our operations and, of course, we are always seeking input and advice on how we might improve. But there simply are not enough funds for all the researchers. The organizations do not compete with each other. We try to coordinate our efforts to avoid overlap.
SA - So the capitalist model does not really apply to the funding organizations themselves?
D - We funding organizations are essentially monopolies. We enjoy this situation because, if you like, it is a sellers' market. We "sell" our funds to those researchers who, in our judgement, come up with the best research proposals.
SA - The idea of researchers competing for funds has an obvious appeal to someone with my background. But businesses and industries in the capitalist countries work primarily for themselves and their shareholders. Most medical researchers, as I understand it, are not trying to become financially rich. They are trying to obtain new knowledge which they donate freely to the nation and the world. Just as the funding bodies are trying, as you say, to coordinate their efforts, shouldn't the researchers be doing the same?
D - Of course, we encourage researchers to collaborate and communicate with each other. For example, we look very favourably upon researchers with skills in different areas who come together and apply for funding as a group.
SA - But you still have competition, be it between individual researchers or small groups of researchers. Clearly, as in business and industry, you cannot have free and open communication between groups in competition with each other. I've been wading through "Natural Obsessions: The Search for the Oncogene" by Natalie Angier (Houghton Mifflin, 1988). Much of it is quite above my head I'm afraid. Please bear with me while I read from the introduction by Lewis Thomas:
If what Thomas is saying is correct, there must be a trade-off between the spur of competition and, if you like, the spur of communication. Perhaps this is too simple an analysis. Tell me how the system works in practice.
D - Well, we ask researchers to submit written proposals. We allow only 20 pages. The proposals must contain a review of the published work, a hypothesis, and the experiments designed to test the hypothesis. They must spell out the implications of the new knowledge they expect to obtain and provide a detailed budget.
SA - Most successful financiers I have met place a considerable emphasis on track record. This is relatively objective. An entrepreneur who has come up with successful ideas in the past will usually get support for ideas which may, at face value, not seem very promising.
D - The applicants are indeed asked to describe their past performance. However, an applicant would penalize himself/herself if he/she used up too many pages describing past work and did not give sufficient information to permit evaluation of the proposed work.
SA - So here we have another difference with industry and business. The organizations funding medical research emphasize the evaluation of future "promise" of what might be done, rather than of past performance. Now how is this evaluation carried out? A financier might, in confidence, consult with one or more industry analysts. These would be people with specialist knowledge in the area of a proposal, who themselves have a track record for giving good advice and for not leaking ideas to potential competitors. They are paid handsomely for their advice. If they fail, they are consulted less frequently in future. There is a dollar penalty.
D - The medical research system is quite different. We have a system of peer review. Copies of each application are sent to three or more researchers with expert knowledge in the area of the application. These reviewers are placed on their honour not to disclose the contents of the application to others and to evaluate the proposal objectively even though they may be advocating support of research in competition with their own. A peer reviewer does not know who the other reviewers are. If his reviews are consistently out of line with those of other reviewers then he may not be consulted in future.
SA - If it became known that a financier were sending business proposals to competitors for review, he would soon find a decline in the number of proposals submitted. His business would suffer! The system you describe would seem to work only if operated by saints. Yet hardly a week goes by without some medical research scandal - fraud or plagiarism - being aired in the newspapers.
D - The system is not without drawbacks. There are so many applications to review. A class of professional reviewers does not exist. It is paradoxical that, while the best persons to review an application are those engaged in the same research, these same people have the most to gain from the privileged information they are given access to. Somehow the system works.
SA - But does it work as well as it could? A system where a competitor has only to sit back and wait for the latest crop of bright ideas to arrive on his/her desk seems wide open to abuse. There seem to be scarcely any penalties for inadequate advice. For my analysis to be complete I will need to know more about the methods both of selecting reviewers and of monitoring the quality of their advice.
However, to save time let's say that applications have been reviewed by the methods you describe. What happens next?
D - The applicants are given a numerical rating so that they can be rank-ordered. Of course no system of this sort is perfect: the skills of the reviewers are severely tried as they attempt to evaluate the relative merits of different projects. But the rank-ordering allows us to assign funds in a logical way. Those ranking highest get all the funds they need to complete the work in reasonable time. Funds are then allotted similarly to applicants with successively lower rankings until the funds run out. Then there is a cut-off. Those below the cut-off point get no funds. This means that many very good applicants do not get funded.
SA - And since they are not funded, presumably they will not be able to do the work and show whether the rating was wrong. Sharp cut-off points in evaluation-determined allocation systems tend to turn the evaluations into self-fulfilling prophecies. The funded succeed because they are funded. The unfunded fail because they are unfunded.
D - Do you have an alternative suggestion?
SA - The first thing an engineer wants to know when asked to design a new system is what the system is required to do and with what level of accuracy. If the system is error-prone, as you acknowledge the research funding system is, then this has to be taken into account in system design. From what you tell me, the most certain fact you have is that the person at the very top of the rating scale is likely to be better than the person at the very bottom of the scale. To give the person at the top everything he or she needs and the person at the bottom nothing seems appropriate in a competitive system. But as you move progressively down from the top of the scale and up from the bottom of the scale, your confidence that the rating system has properly discriminated between the competitors must be much less. In that circumstance a design engineer would probably come up with a sliding scale of fund allocation, rather than a sharp cut-off point.
D - How would the sliding scale operate?
SA - Well, first a decision would be made as to how many projects were of sufficient merit to justify support. This might eliminate the very lowest rated projects. Then the sliding scale of funding would be applied to the approved projects. Only those at the very top of the funding scale would get all the funds they needed to complete the work in a reasonable time. Those just below the top would get, say, 90% of what they needed, and so on down to the approved projects of "lowest merit" which might receive only 10%.
D - But what if a project just below the top were directed at a critical aspect of the AIDS problem? A cut-back to 90% funding would surely slow the rate of progress towards a cure. Would you want that?
SA - That is precisely the point. A design engineer would be trying to optimize the rate of progress in the face of uncertainty in the rating system. Maybe the project awarded only 10% funding will be found, with hindsight, to have made an important contribution to knowledge leading to an AIDS cure. The 10% of funding will at least allow the project to move ahead, albeit very slowly. In the absence of funding the proposed experiments might never be performed. The research team might be disbanded and its laboratory space allocated to others. The damage might be irreversible.
D - That is an inescapable fact of a competitive system. If you like, fund-withdrawal is a punishment.
The cut-off point is a guillotine. Fail to score above the cut-off point and it's "off with his head". The perception of the possibility of a loss of funds should be a spur.
SA - But the punishment should fit the crime. Is it appropriate that an applicant rated just below the cut-off point receive the same capital sentence as an applicant at the bottom of the rating scale? And is an applicant just above the cut-off point, knowing that his/her research life hangs on a thread, more or less likely to collaborate and communicate with others? A sliding scale would retain some element of competition, but would make that competition fairer. With a performance-evaluation approach, past performance would be assessed against the funds that had been received. A person who performed better than expected, having been given only 10% funding, might find himself/herself getting 20% funding in the next competition. In that way, over a period of years, individuals might move smoothly up and down the scale until they found a level appropriate to their abilities.
D - Your suggestion doesn't take into account the political realities. What you are proposing is that we adapt to, that we accept, the present low level of total system funding. A research team cut off from funds is visible and often vocal. In various ways it protests to government and the general public. Take this away and you would see total system funding shrink even more.
SA - The sliding scale would not dampen protest, it would probably increase it. Individuals with 90% funding, who might have received 100% under the present "guillotine" system, will join the ranks of the disenchanted. Most important of all, no longer under the shadow of the guillotine, researchers will feel more free to follow Lewis Thomas's imperatives and collaborate.
D - Individuals with 90% funding would probably direct their disenchantment not at the public and the politicians but at the funding organisations for having adopted a sliding scale in the first place.
SA - Yes, that would probably happen. But that is irrelevant to whether or not a sliding scale would produce a more efficient distribution of research funds. It's very easy for those who win in a competitive system to accept the syllogism: "We are excellent; the system judges us as excellent; therefore the system must be excellent."
One cannot expect pressure for reform to come from the top.
D - You came here to learn about the funding system. Your remarks indicate that you haven't been convinced by what I've told you?
SA - Systems for organising human beings must be based on the assumption that the decision-makers will not be saints. The system you've described seems to be "capitalist" in spirit but contains none of the constraints that make capitalist economic systems so vigorous, powerful, and yes -- Wall Street scandals notwithstanding -- honest!
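The analyst's two allocation rules are simple enough to state in code. The Python sketch below is only an illustration: the dialogue fixes the end-points of the sliding scale (100% at the top of the approved list, 10% at the bottom) but not the shape between them, so the linear interpolation, the function names and the example figures are assumptions.

def guillotine(requests, ranked_ids, budget):
    """Present system: fund applicants fully, best rating first,
    until the money runs out; everyone below the cut-off gets nothing."""
    awards = {app_id: 0.0 for app_id in ranked_ids}
    remaining = budget
    for app_id in ranked_ids:            # best rating first
        if requests[app_id] > remaining:
            break                        # the guillotine falls here
        awards[app_id] = requests[app_id]
        remaining -= requests[app_id]
    return awards

def sliding_scale(requests, approved_ids):
    """Analyst's proposal: every approved project is funded, at a fraction
    of its request falling linearly from 100% (top rank) to 10% (bottom)."""
    n = len(approved_ids)
    awards = {}
    for rank, app_id in enumerate(approved_ids):   # best rating first
        frac = 1.0 - 0.9 * rank / max(n - 1, 1)
        awards[app_id] = frac * requests[app_id]
    return awards

With four equally sized $100,000 requests and a $250,000 budget, guillotine() funds two applicants in full and two not at all, while sliding_scale() awards 100%, 70%, 40% and 10% of the requests. As written, sliding_scale() ignores the total budget; in practice the number of approved projects or the fractions would be scaled so that awards sum to the budget. The analyst's claim is that the second pattern wastes less of the information in an error-prone rating.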
BICAMERAL GRANT REVIEW: An Alternative to Conventional Peer Review
FASEB J. (1991) 5, 2312-2314. This gives a concise summary in a letter to the editor.
BICAMERAL GRANT REVIEW: HOW A SYSTEMS ANALYST WITH AIDS WOULD REFORM RESEARCH FUNDING
Accountability in Research (1993) 2, 237-241.
By Donald R. Forsdyke
(With copyright permission from Gordon & Breach, Publishers.)
A systems analyst (SA) with AIDS has applied his professional skills to determine whether available research funds are being spent optimally. After an initial briefing by the director (D) of a major funding organization (see above) and visits to various research laboratories, he now returns to suggest to the director a novel "bicameral" method of reviewing research proposals. The "retrospective" and "prospective" parts of research proposals should be separated and independently routed. Peer-review should be entirely retrospective and concerned with past performance relative to funds received. Prospective review, concerned solely with budget, should be performed in house by the funding bodies. The director is not entirely in agreement.
SA - Thank you for agreeing to see me again. When I first learned that I was seropositive for the AIDS virus, I was overwhelmed by a feeling of utter helplessness. However, I've worked for many years as a systems analyst advising organizations how to make their operations more efficient. It occurred to me that I might apply my professional skills to medical research organizations, such as that which you direct.
D - When we last met I told you how the research funding system works (Forsdyke, 1989a). You thought that competition between researchers for research funds might be delaying progress by impeding collaboration. Have your studies led you to modify that view?
SA - On the contrary, I agree with Lewis Thomas that the degree of competition may now be counterproductive (Angier, 1988). The system as we know it today was established in the late 1940s. By all accounts, it worked very well as long as sufficient funds chased the available talent (Apirion, 1979; Mandel and Vesell, 1989). Then in the late 60s financial cutbacks began to reveal serious structural problems which were not apparent at the outset (Osmond, 1983; Forsdyke, 1983a,b; 1989a,b,c; Lederberg, 1989).
D - I'd like to hear what problems you have identified. But first let's make sure we agree on basics. What do you understand to be the mission of the organization which I direct?
SA - Your mission is to advance medical knowledge. This will result in better methods of preventing, diagnosing and treating diseases such as AIDS. To this end you have a system for allocating funds to medical researchers.
D - It's not that simple. We have to have the funds before we can allocate them. How do we persuade the government to put funds into medical research rather than into other areas? How do we persuade individuals making charitable donations to choose medical charities rather than other charities? As part of our mission, we have to be concerned with public relations. This can affect our allocation of funds to researchers. For example, the recent discovery of a gene defective in cystic fibrosis patients represents an immense advance (Koshland, 1989). Many of us foresaw this decades ago and wanted to put more funds into basic research in molecular biology. Instead we had to pour funds into various "quick fix" approaches to satisfy the cystic fibrosis lobby.
SA - I accept that, to keep up the global level of funding, some funds have to be allocated in that way.
D - It is also very important for fund-raising that our system for allocating funds not only be sound, but be perceived as sound.
SA - It is that thought which may be muffling a lot of the dissent I hear in the laboratories which I have visited. It is apparent that the current peer review system is overburdened and is working very inefficiently. Yet researchers are reluctant to express their discontent publicly. They may even fear retaliation from the funding bodies if they were to do so.
D - Do you have any reforms in mind?
SA - We expect too much from the peer review process:
Only the first two of these are really essential. There are numerous ways in which researchers can and do get constructive criticism of their ideas for future research. Provision of such criticism by the funding bodies is redundant. Abandonment of this would allow a major restructuring of the peer review process (Forsdyke, 1991).
D - A frequent complaint from researchers is that we do not provide sufficient feedback so that unsuccessful applicants can improve their next proposals. Now you tell me this should be abandoned altogether!
SA - I would reform the peer review process by separating grant applications into two distinct parts, a "retrospective" part and a "prospective" part. These would be routed separately. The retrospective part would describe what had been achieved with the available funds. This part alone would be sent out for peer review. The reviewers would evaluate performance in terms of the funds received. This would be difficult, but it would be more objective and less error-prone than the "prospective" evaluation of an applicant's ideas for future research (Forsdyke, 1983b, 1989a).
D - Would you allow on-site visits to ensure that the research results reported had actually been obtained?
SA - Yes, there would be some random auditing both of results and expenditure. Knowledge of this possibility should ensure accurate reporting. Positive research results would score highly, but discriminating reviewers would also be looking at the logic of the overall approach and how the researcher had marshalled the available resources. A big problem, requiring a long-term approach, might produce no publishable results within a given funding period, yet might still score highly. An important part of this review procedure would be that reviewers would be evaluating the ratio of performance to funds received. There would be an incentive to be economical. Indeed, only the funds actually expended would be taken into account. These might be less than the funds awarded at the beginning of the funding period.
D - How would you deal with people who were just entering the system and did not have a research track record?
SA - Most future independent researchers will have gained some sort of track record during their apprenticeships. Their initial funding would be modest. Within a few years they would have an independent track record which could be evaluated.
D - Would it be good politics for the funding bodies to ask politicians and private donors to support research ideas which had not been independently evaluated?
SA - No. But from the point of view of the politics of fund-raising, the emphasis of the retrospective part of the grant application on accountability for past performance should be a plus. There is a middle ground between giving a researcher carte blanche on what research is done and scrupulously evaluating the cogency of that research. Here we come to the "prospective" part of the grant application. This would be routed to a new class of specialist financial officers within the funding organizations. These individuals would have professional expertise in evaluating research budgets. This "in house" part of the application would contain sufficient information on the proposed research to allow a financial officer to determine if the budget was realistic. Thus the granting body would know the research plan, even though it would not be evaluating that plan directly. Obviously, if some quite bizarre line of research was proposed, there would be the option of a veto.
D - Would the granting body require that the previous research plan be included in the retrospective part of the applicant's next grant application? Would the applicant be criticized if the results achieved did not match the plan?
SA - No. Peers would be concerned with evaluating the quality (relatively objective) and the value (much more subjective) of the results. It would probably be prudent for the applicant to describe the path, serendipitous or otherwise, which had led to those results. But that would be for the applicant to decide, given the need for conciseness.
D - How does all this relate to the sliding scale of fund allocation which you suggested when we last met (Forsdyke, 1989a)? You proposed that first a decision would be made as to how many projects were of sufficient merit to justify support. Then a sliding scale would be applied to the approved projects. Only those at the very top of the funding scale would get all the funds they needed to complete the work in a reasonable time. Those just below the top would get 90% of what they needed, and so on down to the approved projects of lowest merit which might receive only 10%.
SA - When the retrospective review by peers of past performance was completed, a rating would be available, just as under the present system. When the prospective in-house review of the proposed budget was completed, a budget figure would be available. After this "bicameral" review, all that would remain would be to rank the applicants by rating, decide on a rating cut-off point and then allocate funds on the sliding scale to those above that point. Obviously, the rate of progress and scope of a project which only received 10% funding would be severely compromised. But 10% funding is enormously different from 0% funding. With imagination and fortitude I believe many projects could limp along, even with 10% funding.
D - What sort of feedback would an applicant get?
SA - From the peer review he or she might get a critique of past strategy. For example, a researcher might be criticized for not having used a method capable of giving more definitive results or for not adequately justifying the introduction of an expensive new procedure. This might have been circumvented by collaboration with a neighbouring laboratory. The researcher's interpretation or evaluation of the results might be challenged by the reviewers. From the prospective budget review, a researcher might learn of less expensive ways of carrying out the research.
D - It would be expensive to recruit and train more specialist financial officers. Yet such people could play an important role in keeping down the prices of equipment and supplies. The medico-industrial establishment would not like it! But do you think that the financial officers would be able to detect those applications in which the budgets had been inflated in anticipation of receiving, through the sliding scale, less than the optimum budget?
SA - I think the task of financial officers would, in many respects, be far easier than the task of peer reviewers. The officers, after all, would be dealing with numbers. If there was any doubt they could demand to see past accounting records, seek justification for past expenditures and ask for a better justification of future expenditures. It could be difficult to pull the wool over their eyes.
D - You have more faith in the skill of your proposed budgetary gate-keepers than I have.
What would you do, for example, in the case of a researcher who, knowing he/she had achieved a great deal in the previous granting cycle on, say, a budget of $100,000, decides to propose a new, well-justified project which would cost $1,000,000? The peer review process you propose would give him a very high rating for his productivity relative to dollars received, so that he/she could expect to receive funding close to the 100% level.
SA - Two answers. First, the researcher would know that, down the line, he or she would have to justify the $1,000,000 expenditure in terms of results received. This would act as a restraining force providing pressure towards realistic budget-making. Second, I do not think we should impose the bicameral review system rigidly, to the exclusion of other approaches. Bicameral review should be applicable to most on-going projects which have a relatively stable level of expenditure...perhaps 90% of the total. Those researchers who propose a substantial departure from previous expenditures could submit their applications for conventional peer review as now practiced.
D - The present system may have its faults, but at least it is relatively simple and well understood. You are proposing a far more complicated two-level review process, bicameral review, and, on top of that, you now say that we will still maintain conventional peer review for special cases. Running such a system would be a bureaucratic nightmare!
SA - That is exactly how I have heard researchers describe the present system, a bureaucratic nightmare. Surely we can do better? Let us at least give bicameral review a try. It is remarkable that the funding bodies, dedicated to the pursuit of truth through experimentation, have themselves for so long neglected to experiment with different mechanisms of fund allocation. The optimum harnessing of the expertise, energy and enthusiasm of the nation's biomedical work force is critical for the conquest of AIDS and of the many other diseases that afflict humankind. My proposals for reform of the funding system could result not only in a better distribution of research funds, but could also influence in a positive manner the conduct of those engaged in research (Forsdyke, 1983a,b; 1989a). Let us experiment!

REFERENCES
Angier, N. (1988) Natural Obsessions: The Search for the Oncogene. Houghton Mifflin, Boston. pp. 1-4.
Apirion, D. (1979) Research funding and the peer review system. Fed. Proc. 38, 2649-2650.
Forsdyke, D.R. (1983a) Canadian medical research strategy for the eighties. I. Damage-limitation or superelitism as the basis for the distribution of research funds. Medical Hypotheses 11, 141-145.
Forsdyke, D.R. (1983b) Canadian medical research strategy for the eighties. II. Promise or performance as the basis for the distribution of research funds. Medical Hypotheses 11, 147-156.
Forsdyke, D.R. (1989a) A systems analyst asks about AIDS research funding. Lancet 2, 1382-1384.
Forsdyke, D.R. (1989b) Peer review policy. The Scientist 3, 16:13.
Forsdyke, D.R. (1989c) Sudden-death funding system. FASEB J. 3, 2221.
Forsdyke, D.R. (1991) Bicameral grant review: an alternative to conventional peer review. FASEB J. 5, 2312-2314.
Koshland, D.E. (1989) The cystic fibrosis gene story. Science 245, 1029.
Lederberg, J. (1989) Does scientific progress come from projects, or people? Current Contents, Life Sciences 32, 48:5-12.
Mandel, H.G. and Vesell, E.S. (1989) NIH funding. FASEB J. 3, 2322-2323.
Osmond, D. (1983) Malice's wonderland. Research funding and peer review. J. Neurobiol. 14, 95-112.
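The routing the analyst describes can be summarized in a short sketch. It is a minimal illustration, not an implementation: the class and function names are invented here, and the reviewers and officer_audit parameters stand in for the human judgements (peer rating of performance per dollar expended; in-house budget audit with a veto) that the dialogue assigns to each chamber.

from dataclasses import dataclass

@dataclass
class Application:
    applicant: str
    retrospective: str     # what was achieved, reported against funds expended
    prospective: str       # proposed work, in enough detail to judge the budget
    funds_expended: float  # only funds actually spent count in the review
    budget_request: float

def bicameral_review(app, reviewers, officer_audit):
    """Route the two parts independently: peers see only the retrospective
    part; an in-house financial officer sees only the prospective part."""
    # Chamber 1 (retrospective): average peer rating of results per dollar spent.
    rating = sum(review(app.retrospective, app.funds_expended)
                 for review in reviewers) / len(reviewers)
    # Chamber 2 (prospective): budget audit, with a veto for bizarre proposals.
    approved_budget, vetoed = officer_audit(app.prospective, app.budget_request)
    return rating, approved_budget, vetoed

The rating then feeds the rank-ordering and sliding scale sketched after the first dialogue, while the approved budget sets what 100% funding would mean for that project.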
ON GIRAFFES AND PEER REVIEW

FASEB Journal (1993) Vol. 7, 619-621.

By Donald R. Forsdyke (With copyright permission from the Editor, S. W. Jacobson)
SUMMARY
For several decades grant applications in the biomedical sciences have been assessed by peer review. However, the design of the peer review system was based on past precedent rather than on recognition that a novel approach was needed. Flaws in system design have been exposed by funding cut-backs. As a result the research community is being torn apart.
VACANT: one ecological niche. WANTED: an animal that can run like a horse, but can also nibble the most juicy leaves at the tops of trees. If you had to design such a beast from scratch, you would probably end up drawing a horse-like quadruped with a long neck. You would figure that the animal should be able to hear predators and alarm calls and you would equip it with well-hooded ears. Since it would receive alarm calls, it should also be able to send them. So you would equip it with a larynx. You would then pencil in a nerve running from the brain to the larynx, a distance of perhaps 20 cm.

When checking your design against the real world, you would find a great similarity to the giraffe. However, the nerve to the larynx is actually several meters in length! From the brain, it runs down the neck to the chest where it loops round a major blood vessel and then returns up the neck to the larynx. The reason for this strange peregrination is quite well understood. In the course of evolution, tissues began moving around taking their nerve and blood supplies with them. Some tissues migrated forward to form structures in the neck; adjacent tissue migrated into the chest. When this happened the "wires got crossed". A nerve got caught round a blood vessel. To solve the problem either the blood vessel had to loop up into the neck and then back to the chest, or the nerve had to loop down to the chest and then back to the neck. The giraffe has not gone the way of the dinosaurs because the length of its laryngeal nerve was not critical for its survival. But millions of equally outrageous evolutionary design flaws have resulted in early extinction for the species concerned. Design by evolution is often very inefficient. Design by evolution is always constrained by the past.

Sometimes, in human affairs, past intellectual baggage hinders our ability to forge novel approaches. Problems which require solution by revolution, rather than by evolution, are not seen as such. The bold line drawn from the brain to the larynx of your prototypic giraffe would be an example of "design by revolution". The origins of the modern peer review system are murky (1,2). It seems that no one ever sat down and tried to design the system from scratch. Rather, it evolved in a piece-meal fashion. Peer review has been with us for several decades. Yet, as currently practiced, it threatens the renaissance in the biological sciences that began with Darwin and Mendel and gained fresh impetus with the discovery of the structure of our genetic material in the 1950s (1). Although historians may one day tell us which committees and which individuals were responsible for introducing the various aspects of the peer review process (3), it is doubtful whether we will ever know and fully understand the factors, conscious or unconscious, which guided their deliberations. I here offer an explanation of how the peer review system arose in the hope that any insight provided may hasten reform.

The system as we know it today was clearly discernible in the late 1940s when the benefits to be derived from a large public investment in biomedical research became readily apparent. Briefly defined, the task was to devise a system for allocating public funds so as to harness optimally the energy, enthusiasm and expertise of a nation's biomedical workforce to the goal of attaining solutions to problems such as cancer, heart disease, etc.
The design of the system appears to have been evolutionary; it was based conceptually on other systems with which the designers were familiar and of which they approved. Prominent among these would have been the education system. We may assume that the designers had all been through the education system and that the system had been kind to them. One feature of the education system is that a limited resource, such as access to university, is rationed out based on one's ability to pass examinations. The designers were all very good at examinations. A teacher had taught them the dates of the Battle of Hastings and of the American War of Independence. Subsequently there was a test. The test was marked by the teacher who knew the correct dates. Then, there was a ranking of the students based on the marks they had received. A comforting feature of the test was that, when repeated with different sets of questions, the previous ranking was closely approximated. Thus it was perceived as objective and just. Personal attributes needed to fare well in the examination system, such as the possession of a good memory and the ability to work hard in an organized manner, are attributes required for many complex tasks in modern society. The examination system worked well in allocating rewards to those who could best benefit from the further educational opportunities needed to prepare them for such complex tasks. In gaining the approval of the education system, the designers had come to accept a variety of its premises, which included that:
So, in the late 1940s, there were a number of biomedical researchers who, by surmounting various academic obstacles, had won positions at universities and research institutes. It was very natural to think of asking them to write a "test" (grant application) stating what they wanted to do and why they wanted to do it. They had all been very good at writing tests, so did not demur. Then there was a stumbling block. Where was the teacher who, knowing the right answers, would mark the papers? Thus, peer review was born. The researchers would mark each other's papers. The loss of the authority figure (teacher) gave the process a democratic air, which may have made it easier to sell to the politicians. Another selling point was the notion that the researchers would be competing with each other. Perhaps the "spur of competition" would drive the biomedical research system as effectively as it appeared to drive the capitalist economic system (4). Thus the designers would have drawn heavily on analogies, not only with the educational system, but also with the political and economic systems.

And so the process began. The grant applications were written and duly marked. Funds were awarded to those who scored highly. For many years, as long as adequate funds chased the pool of talent, there were few complaints from the research community. Progress was hailed by system administrators as a sign that all was well. The fact that a train is moving ahead at 20 miles/hour sounds great if you do not know that trains are capable of much greater speeds. Since the same peer review system, with minor modifications, was adopted throughout the western world, there were no adequate controls to allow one to determine whether the system was better than any alternative.

Then in the early 1970s came the crunch. For the first time (at least in North America), there were insufficient funds to sustain all the talented researchers (5-7). The administrators, muttering among themselves about the invigorating effects of heightened competition, responded by elevating the cut-off point below which funds would not be given. Suddenly, a new selective gate had been imposed. Being good at research was no longer a guarantee of getting through. A new breed of scientist began to emerge, the grantsmen, people whose skills lay not so much in doing good science, but in tuning in to the perceptions of the peer group. (I am generalizing here. Fortunately a few precious individuals, we all know who they are, escape such facile classification.)

The new selective gate also influenced the choice of the peers who would act as gate-keepers for the rest. There had always been a tendency to choose the "best", as defined by being successful at doing research (and hence getting funded), to act as peer-reviewers. The grantsmen, by definition, were now the best, and these came to dominate the peer review process. So grantsmen were being judged by grantsmen, and their expertise lay, not in being creative scientists, but in being able to tune in to the perceptions of other grantsmen.

In response to mounting unrest, in the mid-1970s the US National Institutes of Health launched a national enquiry into the peer review system under the chairmanship of Ruth Kirschstein. Much was said by all interested constituencies. Of course the grantsmen were delighted with the system: "We are excellent; the system judges us as excellent; therefore the system must be excellent." In time a multivolume report appeared (8). But the resulting changes were largely cosmetic.
The administrators shrugged. Sure, like democracy it's a terrible system, but it's the best we have. The reasons why no change was forthcoming are not hard to discern. By choosing to use all four limbs for locomotion the ancestors of the giraffe had foreclosed the options of handling tools or climbing trees. Likewise, three decades of nurturing the development of procedures and forms (with such evocative titles as PHS398 and MRCC11) had generated an entrenched bureaucracy. Maintaining public confidence, and hence the flow of public funds, was seen as critical. The virtues of peer review were loudly proclaimed. The words "excellence" and "peer review" were repeated together so often that mention of one came to imply the other. To admit the possibility that the peer review process was flawed might suggest to government the possibility of replacing it with an alternative of its own design, which might be far worse.

And so through the 80s and 90s, as cut-backs deepened, the administrators responded by raising the cut-off point higher and higher. At competition after competition the guillotine came down. Our universities and research institutes were awash with academic blood (9)! Reports of cases of scientific plagiarism and fraud increased. The peer review system was described by Joshua Lederberg as having become "viscous beyond imagination" (10) and by Phillip Sharp as having taken on a "mask of madness" (11). Lewis Thomas bewailed the fact that the increased competition was decreasing collaboration and communication between researchers (12). The administrators wrung their hands and mumbled that things would be just perfect if there were just more money. The public and the politicians responded as best they could, but the new dollars went straight into the pockets of the grantsmen. The administrators tried to improve collaboration by trumpeting new forms of competition to encourage researchers to collaborate. The grantsmen moved in. Grant applications arrived festooned with appendices containing letters from prospective collaborators (other grantsmen) all eulogizing the qualities of the applicant and swearing eternal collaboration.

And so through the 90s and the turn of the century. The incidence of cancer increases. An AIDS pandemic spreads relentlessly into new sectors of the population. The halls and corridors of our hospitals and mental institutions echo with the cries of the unfortunate losers in genetic roulette. This is a deadly serious business!

Recognition of Error-Proneness

The problem, as I see it, is to break out of the mould created by the evolutionary mind-set of the system designers. One should consider that what we are really trying to do with peer review is to predict the future. Which of a set of researchers is most likely to make a contribution which, with hindsight, will be recognized by future generations as having been the most logical at this point in the development of biomedical knowledge? One should then arrive at the conclusion that the task is either impossible or, at least, highly error-prone. Daniel Osmond has pointed out that in a valid competition, be it for research funds or anything else, there must be appropriate conditions, such as a starting line and a goal.
He concludes that: "those who conduct competitions must be more humble and realistic about the validity of what they do" (9). Similarly, an analysis by Stephen Cole and his colleagues of how National Science Foundation proposals fared when re-reviewed by different sets of reviewers concluded that the fate of an application depends about as much on the "luck of the reviewer draw" as on the qualities of the proposal itself (13).
The peer review process is also error-prone because the creative thinking which one is trying to assess tends to become less communicable as it becomes more creative. The less obvious an idea is, the more difficult it is to communicate. Something which is readily perceived by a group of peers may sometimes be the result of a brilliant insight, but more often it will represent a more modest advance which will readily be assimilated into existing knowledge. Peer review is like a race where the real leaders are invisible to the judges. Stories of the fallibility of peer review abound (14). David Prescott has recently related how sceptical reviewers were of his claim in the early 1970s to have discovered a novel form of DNA. This led to outright rejection of his grant application (15). Most immunologists are now familiar with the "two signal" concept and the role of "positive selection" in the education of lymphocytes. Yet it would have been professional suicide to have proposed experiments to test these ideas when they were introduced in the 60s and 70s (16,17).

Another error in conception is the notion that it is valid to draw a parallel between the creativity of an entrepreneur in the world of finance and that of a biomedical researcher. The case against this has been argued elsewhere (4; see above). If an evaluation process is error-prone it does not follow that evaluation is impossible. It simply means that one has to design the system taking error-proneness into account. This is what the designers of the peer-review system failed to do. The two golden rules of decision-making in uncertain environments are:
A design based on these principles, named bicameral review, has been presented elsewhere (18,19). Grant applications are divided into a major retrospective part and a minor prospective part, which are routed separately. The retrospective part (track record) is subjected to peer review. The prospective part (proposed work) is subjected to in-house review by the agency, solely with respect to budget justification. Funding is allocated on a sliding scale. Although bicameral review is much less revolutionary than the bold stroke from the brain to the larynx of our prototypic giraffe, it does offer an alternative to a status quo which is becoming increasingly unacceptable.

REFERENCES
1. Chubin, D. F. and Hackett, E. J. (1990) Peerless Science: Peer Review and US Science Policy. State University of New York Press, Albany.
2. Harden, V. A. (1986) Inventing the NIH. Johns Hopkins University Press, Baltimore.
3. Strickland, S. P. (1988) An interview with Kenneth Endicott. FASEB J. 2, 2439-2444.
4. Forsdyke, D. R. (1989) A systems analyst asks about AIDS research funding. Lancet 2, 1382-1384.
5. Apirion, D. (1979) Research funding and the peer review system. Federation Proc. 38, 2649-2650.
6. Mandel, H. G. (1983) Funding more NIH grants. Science 221, 338-340.
7. Forsdyke, D. R. (1983) Canadian medical research strategy for the 80s. I. Damage-limitation or superelitism? Medical Hypotheses 11, 141-145.
8. Kirschstein, R. L. et al. (1976) Grants Peer Review: Report to the Director, NIH. Phase I. NIH, Washington.
9. Osmond, D. (1983) Malice's wonderland. Research funding and peer review. J. Neurobiol. 14, 95-112.
10. Lederberg, J. (1989) Does scientific progress come from projects or people? Current Contents, Life Sciences 32, No. 48, 5-12.
11. Sharp, P. A. (1990) The crisis in funding: a time for decision. Cell 62, 839-840.
12. Angier, N. (1988) Introduction. In Natural Obsessions: The Search for the Oncogene, pp. 1-4. Houghton Mifflin, Boston.
13. Cole, S., Cole, J. and Simon, G. (1981) Chance and consensus in peer review. Science 214, 881-886.
14. Garfield, E. (1987) Refereeing and peer review. Part 3. How the peer review of research grant proposals works and what scientists say about it. Current Contents, Life Sciences 30, No. 4, 3-8.
15. Prescott, D. M. (1992) Cutting, splicing, reordering and elimination of DNA sequences in hypotrichous ciliates. Bioessays 14, 317-324.
16. Forsdyke, D. R. (1968) The liquid scintillation counter as an analogy for the distinction between self and not-self in immunological systems. Lancet 1, 281-283.
17. Forsdyke, D. R. (1975) Further implications of a theory of immunity. J. Theoret. Biol. 52, 187-198.
18. Forsdyke, D. R. (1991) Bicameral grant review: an alternative to conventional peer review. FASEB J. 5, 2312-2314.
19. Forsdyke, D. R. (1993) Bicameral grant review: how a systems analyst with AIDS would reform research funding. Accountability in Research 2, 237-241.
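The claim that a sharp cut-off applied to an error-prone rating must misclassify can be explored with a toy simulation. Everything here is an assumption made for illustration (uniform "true merit", Gaussian reviewer noise, 100 applicants, a cut-off at rank 30); the point is only that when rating noise is comparable to the real differences between applicants, many of the truly best fall under the guillotine.

import random

def miscut_fraction(n=100, noise=0.3, cutoff=30, trials=1000, seed=0):
    """Toy model: each applicant has a true merit drawn from [0, 1]; reviewers
    observe merit plus Gaussian noise. Returns the average fraction of the
    truly top `cutoff` applicants who nevertheless fall below the funding
    cut-off because of rating noise."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(trials):
        merit = [rng.random() for _ in range(n)]
        observed = [m + rng.gauss(0, noise) for m in merit]
        best = set(sorted(range(n), key=lambda i: -merit[i])[:cutoff])
        funded = set(sorted(range(n), key=lambda i: -observed[i])[:cutoff])
        lost += len(best - funded)
    return lost / (trials * cutoff)

print(miscut_fraction())            # substantial with these assumed numbers
print(miscut_fraction(noise=0.0))   # 0.0: a noiseless rating cuts perfectly

Under a sliding scale the same rating errors still misrank applicants, but the penalty for a near-miss becomes a reduced award rather than zero, which is the kind of error tolerance that designing for error-proneness suggests.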
ACCOUNTABILITY IN RESEARCH (1994) 3, 269-274.
(With copyright permission from Gordon & Breach Publishers.)
A Theoretical Basis for Accepting Undergraduate Academic Record as a Predictor of Subsequent Success in a Research Career.
Implications for the Validity of Peer Review
By Donald R. Forsdyke
Introduction
Three assumptions
Different evaluation systems generate different rank orders
Assumption that peer-review predicts research ability
Peer review as a predictor of worldly success
God sets the research "examination paper", but God does not evaluate
An experiment to decide
Three Panglossian replies
Nothing but a pack of cards?
Summary (English)
Summary (French)
End note
Introduction
An examination allows a teacher to rank the members of a class according to the mark they obtain. Teachers generally observe that rank-orders follow a consistent pattern. Those who do well in one examination in a given subject usually do well in later examinations in the same subject. This gives confidence that the human qualities being measured are relatively invariant. The reader is asked to accept this relatively uncontroversial assumption and two more, which I introduce in the form of a hypothetical experiment.

A teacher divides a class into two groups on a random basis. Members of one group are told that their answers to examination questions will be evaluated by the teacher. Members of the other group are told that their answers will be evaluated by other class members. The teacher then teaches the course and sets the examination. (In a repeat experiment the two groups would be switched round.) I propose that the rank order derived from teacher evaluation will usually differ significantly from the rank order derived from evaluation by other class members. This is the second assumption. It follows that, either quantitatively or qualitatively, the human qualities being measured are not the same in the two cases. Some decades later the teacher compares the two methods of evaluation as predictors of subsequent worldly "success". It is found that the teacher's rating is a much less reliable predictor than the rating of fellow students. This is the third assumption. I am not sufficiently familiar with the education literature to know whether anything like this experiment has ever been carried out. For present purposes the reader is asked to agree that the proposed outcomes of the teacher's experiment are not improbable.

Different evaluation systems generate different rank orders

The results of O'Brecht and his colleagues are quite predictable based on assumptions 1 and 2. Different evaluation systems, applied either contemporaneously or at different times, tend to generate different rank orders among a given group of individuals. Thus, there will be a discordance between the results of one system of evaluation (teacher evaluation) applied at an early time point, and the results of a different system of evaluation (peer review) applied at a later time point. If evaluation by fellow students (peer review) could have been carried out at the earlier time-point, it is equally predictable that there would now be a good correlation with current peer review (Fig. 1).
Those who succeed when evaluated by peer review are funded. They are then more able to pursue productive research careers than those who are not successful when evaluated by peer review. Thus there is a discordance between early teacher evaluation success (which does not correlate with later peer review success) and subsequent research success (1, 2). A peer review judgement is like a self-fulfilling prophecy. This has been referred to as the "Matthew effect" (3). Certainly the possession of adequate research funds ("means") does not guarantee research success ("ends"), but a lack of research funds tends to destroy morale, break up research teams and waste researchers' time writing more grant applications (4). For their purposes, O'Brecht and his colleagues define research success in terms of the funds and resources a researcher comes to command, and the number of his/her research publications.

Assumption that peer-review predicts research ability

The real issue is whether peer-review is a better predictor of research ability than teacher assessment. Because teacher assessment is not feasible at later time-points, it does not follow that we should discard the results obtained at an earlier time-point. Recent dramatic advances in biomedical knowledge have been hailed as supporting the essential soundness of the peer review process. Indeed, the words "peer review" and "excellence" have been used together so frequently that one has come to imply the other (5). In such circumstances, it is easy to fall into the trap, as O'Brecht and his colleagues may have done, of believing that success, defined by peer-review evaluation, is equivalent to the optimization of the rate of research progress. From this assumption they draw the general conclusion that research agencies should place less weight on undergraduate academic performance, relative to other indices of merit (1, 2). If the assumption is wrong, then the research agencies have a very serious problem. This would be compounded by placing less weight on undergraduate academic performance. Certainly the assumption would seem to be "politically correct", both at the grass-roots and at higher levels. Those who were not at the top of the class, a category which includes many of us, are prone to spurn the "nerds", "egg heads" and "teacher's pets" who did so well. Yes, they may have been good at "regurgitating" the facts that the teacher had given them, but were they really creative? Peer review, as currently practiced, has been in operation for several decades, and success at the peer-review gate has been a major factor dictating success at research. This, in turn, has been a major factor in determining the award of tenure and hence the characteristics of the current professoriat. Evidence that peer review was flawed could topple the entire academic house of cards (6).

Peer review as a predictor of worldly success

What human qualities are being evaluated by the teacher? What human qualities are being evaluated by fellow students? Why should evaluation by one's fellow students (peers) provide a more accurate predictor of worldly success (assumption 3 above)? Fallible and biased as he/she may be, the teacher is the nearest one can find to a God-like entity who knows the "truth" about the subject of the examination. The task of satisfying the teacher usually requires that one knows the subject better than one knows the teacher.
On the other hand, the task of satisfying one's fellow students requires that one knows them better than, or at least as well as, one knows the subject. Success requires that one tunes in to the average perception of one's fellow students of the "truth" of the subject. In short, one has to be political. Out in the "real world", there are few, if any, all-wise teachers who know the "truth". For success in many endeavours, knowledge of the "truth" is not of major relevance. Thus a strength in those human qualities which enable one to tune in to the perceptions of other human beings is a major asset. Student evaluation of each other could well provide a more reliable predictor of worldly success than teacher evaluation.

God sets the research "examination paper", but God does not evaluate

We want our medical researchers to discover new truths about biomedical processes in order to optimize the rate of growth in our understanding of diseases and of their prevention and treatment. This is the major goal of the agencies which fund medical research. When a medical researcher discovers a new "truth", how can we evaluate it? We are concerned with the truth, not the politics, of some tragic human disease such as leukaemia, heart disease or AIDS. We need an all-wise teacher who can carry out the evaluation for us. Alas, we are no longer at school. Introducing a metaphor may help. We can argue that "God", when creating living organisms, created the "examination paper" which future students (medical researchers) would have to tackle. And to mark the paper, God-like intelligence is needed. However, apparently, God is not around to help. A researcher's ideas and research results are evaluated by his/her peers. The purpose of peer review should be to approach as close as possible to the sort of evaluation an all-wise teacher, or a God, would make. Because peers are not all-wise, peer review does not achieve this goal.

An experiment to decide

If we had to design an experiment to determine how close peer review comes to the goal, we might end up with a protocol remarkably similar to that used by O'Brecht and his colleagues (1). We might start with the assumption that the process of discovering, with the help of our teacher, knowledge which will help us pass an examination requires human qualities not so different from those required for the later process of discovering professionally, without the help of a teacher, new knowledge which will advance the understanding of disease. Such qualities include motivation, judgement, industry, persistence, the ability to arrange information in a network of associations in the mind ("memory"), and the ability to manipulate that information to solve problems ("intelligence"). It seems likely a priori that there would be a good correlation between undergraduate academic achievement and subsequent research performance. Failure to find this, as reported by O'Brecht and colleagues (1, 2), could then be interpreted as showing that peer review is a poor way of selecting researchers of high ability. It would follow that, exhilarating as some of the new advances in medical research are, actually the medical research enterprise is advancing at a rate much less than the theoretical optimum (4, 5, 7).
Thus, the results of the study of O'Brecht and his colleagues (1, 2) are susceptible to two mutually exclusive explanations. Either early teacher evaluation provides a more reliable predictor of research potential than late peer review evaluation, or late peer review evaluation provides a more reliable predictor of research potential than early teacher evaluation. The assumptions supporting the latter explanation appear flawed. The assumptions supporting the former appear reasonable.

Nothing but a pack of cards?

The academic superstructures of our universities and research institutes, built up through decades of peer review, may be "nothing but a pack of cards" (6). In this circumstance, placing less weight on undergraduate academic record would compound the problem.

Summary (English)

A major goal of a research granting agency is to select those researchers who will best advance its mission. Noting a poor correlation between undergraduate academic record and subsequent success in medical research, agency officials have argued that less reliance should be placed on academic record. However, from the same data, a contrary conclusion can be drawn. Consider the existence of two evaluation systems, teacher review and peer review. We assume that at the time of undergraduate education, teacher review of individuals on repeated occasions would generate a consistent rank order. Similarly, peer review of the same group of individuals would generate a consistent rank order, which would usually be different from the teacher-generated rank order. At a later point in time, teacher review is not feasible. Grants are then awarded based on peer review. Thus, individuals who score highly under peer review prosper. Since rank orders obtained on different occasions using different evaluation systems may not be well correlated, it is not surprising that there is a poor correlation between early success (assessed by teacher review) and later success (assessed by peer review). However, this disregards the real issue. Which evaluation system is the best predictor of the ability to advance agency goals? Because teacher review is not feasible at the later time-point, it does not follow that we should discard the results obtained earlier. Rather than questioning the reliability of early teacher review, agency officials should be questioning the reliability of later peer review.
Summary (French)

Un but majeur d'un organisme subventionnaire est de sélectionner les chercheurs qui avanceront le mieux sa mission. Ayant observé une corrélation pauvre entre les notes de premier cycle et la productivité subséquente en recherche, les responsables d'un organisme subventionnaire ont raisonné qu'on devrait accorder moins de poids aux notes de premier cycle. Toutefois, on peut tirer une conclusion contraire de la même information. Supposons deux systèmes d'évaluation, A (évaluation par professeur) et B (examen par les pairs). On suppose ici que pendant les études de premier cycle, l'application de A à un groupe d'individus de temps en temps permettrait de les ranger sensiblement de la même façon d'un essai à l'autre. De la même façon, l'application de B au même groupe d'individus engendrerait une série de rangs, qui seraient en général différents de ceux engendrés par A. Or, plus tard, A n'est pas faisable. Dans ce cas, les subventions sont attribuées selon B. Ainsi, les individus qui obtiennent de bons résultats selon B sont avantagés. Je propose que, puisque les rangs obtenus aux occasions différentes selon des systèmes d'évaluation différents ne seraient pas bien corrélés, il n'est pas surprenant qu'il n'y ait qu'une corrélation pauvre entre le succès précoce selon A et le succès tardif selon B. Toutefois, cela passe à côté de la vraie question. Quel système d'évaluation donne les meilleures prédictions de la capacité d'avancer les buts de l'organisme? Même si A n'est pas faisable plus tard, il ne s'ensuit pas que nous devrions abandonner les résultats obtenus plus tôt. Au lieu d'interroger la fiabilité de A appliqué de manière précoce, les responsables des organismes subventionnaires devraient s'interroger sur la fiabilité de B appliqué plus tard.

REFERENCES
Forsdyke, D. R. (1983). Canadian medical research strategy for the 80s. Medical Hypotheses 11, (2) 141-156.
Forsdyke, D. R. (1989). A systems analyst asks about AIDS research funding. Lancet 2, (8676) 1382-1384.
Forsdyke, D. R. (1991). Bicameral grant review: an alternative to conventional peer review. FASEB J. 5, 2312-2314.
Forsdyke, D. R. (1993). Bicameral grant review: how a systems analyst with AIDS would reform research funding. Accountability in Research 2, 237-241.
Forsdyke, D. R. (1993). On giraffes and peer review. FASEB J. 7, 619-621.
Forsdyke, D. R. (1993). Predicting adult success. The Scientist.
Merton, R. K. (1973). The Sociology of Science. University of Chicago Press, Chicago.
O'Brecht, M., Pihl, R. O., and Bois, P. (1989). Criteria for granting training awards to graduate students. Research in Higher Education 30, (6) 647-664.
O'Brecht, M., and Pihl, R. O. (1991). Granting agency criteria for awarding graduate research scholarships. Canadian J. Higher Education 21, (3) 47-58. [THE LATTER TWO JOURNALS DECLINED TO PUBLISH THIS CRITIQUE OF THE RESEARCH WHICH THEY HAD PUBLISHED]
Osmond, D. A. (1983). Malice's wonderland. Research funding and peer review. J. Neurobiology 14, (2) 95-112.
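The statistical heart of the argument, that two internally consistent evaluation systems can rank the same people quite differently, and that their mutual discordance says nothing about which one predicts research ability, is easy to demonstrate. A minimal sketch with invented marks for ten hypothetical researchers (the helper assumes untied scores):

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank
    vectors (assumes no tied scores, so both rank vectors are permutations
    of 0..n-1 and have equal variance)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    mean = (len(xs) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

# Invented marks for the same ten people under the two systems.
teacher_review = [88, 75, 92, 60, 81, 70, 95, 66, 78, 85]
peer_review    = [70, 88, 65, 80, 75, 90, 60, 85, 72, 68]

print(spearman(teacher_review, teacher_review))  # 1.0: the system is self-consistent
print(spearman(teacher_review, peer_review))     # negative: the systems are discordant

Self-consistency within each system (the first print) is what the first assumption grants; the discordance between systems (the second) reproduces the O'Brecht observation without telling us which system actually tracks research ability.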