The burden of triage should not fall unfairly upon the ESI/NI

November 1, 2012

I have heard a very dispiriting rumour floating about and I raise it on the blog to see if any of my Readers have seen similar things happening.

Once upon a time, the NIH came to the realization that the peer review process for grant applications had a bias against the less-established, newer, younger, etc., Principal Investigator. That is, their proposals did not score as well and were not being funded at the same rate as those applications on which more senior and established investigators were the PI.

Someone clearly came to the conclusion, which I share, that this difference was not due to any meaningful difference in the chance that the ensuing science would be valuable and productive. So the NIH set about a number of steps to redress the situation.

One of the pathetic bandaid solutions they came up with was to ensure that the burden of triage did not fall disproportionately upon the younger PIs' applications.

As you are aware, approximately half of applications do not get discussed at the meeting. This is based on preliminary scores issued by the three assigned reviewers, generated prior to the actual meeting date. Without being fully considered by the entire committee during discussion, an application cannot be “rescued” from various sources of unfair or bad review. Although we all recognize that a full rescue from nearly-triaged to a fundable score is rare, it is at least possible. And since the NIH is really just looking at aggregate scores when it comes to the bias-against-noobPI-apps stuff, movement in the positive direction is still a desired goal.

What someone in the halls of the Center for Scientific Review at the NIH realized was that if noobPI apps were generally scoring worse than those of established PIs, then they were more likely to be triaged. So if the general triage line was 50% of applications, then perhaps 75% of Early Stage Investigator or New Investigator apps were being triaged.

The solution was to put down an edict to the SROs that “the burden of triage should not fall unfairly upon the ESI/NI applications”. Meaning that when the triage lines were originally drawn based on the preliminary scores, the SRO had to specifically review the population of ESI/NI apps and make sure that an equal proportion of them were being discussed, let’s say 50% for convenience. This meant that sometimes ESI apps were being dragged up for discussion with preliminary scores that were worse than scores of several apps from established PIs which were being triaged/not discussed.
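The stratified rule described above can be sketched in a few lines. Everything here is hypothetical illustration (application names, scores, and the 50% fraction), not any actual CSR procedure; the only assumption carried over is the NIH convention that lower preliminary scores are better:

```python
# A minimal sketch of the stratified triage rule: draw the "discuss" line
# within each PI group separately, rather than across the pooled list.
# All applications and scores below are made up.

def select_for_discussion(apps, fraction=0.5):
    """apps: list of (app_id, is_esi_ni, preliminary_score) tuples."""
    discussed = []
    for group_flag in (True, False):  # ESI/NI pool first, then everyone else
        group = sorted(
            (a for a in apps if a[1] == group_flag),
            key=lambda a: a[2],  # best (lowest) score first
        )
        keep = int(len(group) * fraction + 0.5)  # round half up
        discussed.extend(a[0] for a in group[:keep])
    return set(discussed)

apps = [
    ("esi-1", True, 3.0), ("esi-2", True, 4.5), ("esi-3", True, 5.0),
    ("est-1", False, 2.0), ("est-2", False, 2.5), ("est-3", False, 4.0),
]
print(select_for_discussion(apps))
# "esi-2" (score 4.5) is discussed while "est-3" (score 4.0) is triaged:
# the inversion described in the paragraph above.
```

Note how the within-group cut produces exactly the situation the post describes: an ESI app with a worse preliminary score gets dragged up for discussion while a better-scoring established-PI app is triaged.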

You will anticipate my skepticism. At the time, and this was years ago by now, I thought it was a ridiculous and useless move. Because once the preliminary scores were in that range, they were very unlikely to move. And it did nothing to address the presumed bias that put those ESI/NI scores so much lower, on average, than they should have been. It was a silly dodge to keep the aggregate numbers up without doing anything about the fundamental outcome: fundable or not-fundable?

HOWEVER.

The rumour I have heard is that some SROs have been interpreting this rule to mean that only 50% of ESI/NI apps should be discussed. A critical distinction from “at least”, which was my prior understanding of the policy and certainly how it was used in any study sections I participated in. Under such an interpretation of the policy, there would potentially be established-investigator applications being discussed which had preliminary scores worse than some ESI applications that were not being discussed.


This is absolutely correct. The current procedure (in the study sections I’ve been in recently – I had assumed it was policy) is that the ESI/NI grants are reviewed first, compete only with each other, and are 50% triaged, and then the non-ESI/NI [i.e. everyone else] are reviewed second, compete only with each other, and are 50% triaged.

I still say no one should EVER be triaged. It completely undercuts the purpose of study section, which is to evaluate the science and not to make funding decisions. (Yet a triaged grant cannot be funded, by definition, since it has no score. Aren’t we always told that we’re “just evaluating the science” and “not making funding decisions”?) Additionally, triage makes the “distribution of scores” less measurable (since there’s a set of unscored proposals sitting out). Most importantly, it makes it even harder to get a sense of realistic score validity.

When I was on an NRSA study section years ago, we went two days, and didn’t triage anyone. It wasn’t that bad, and the study section felt more fair.

This all being said, the much worse situation is that only three people read each proposal, and one bad score is enough to triage a proposal. Three is too small a sample.

I have ESI status, and after talking to the SRO of the study section I usually submit to, this is exactly how they were doing it. It in theory gives ESI/NI investigators a better chance of being discussed/funded, but looking at RePORTER, most of the new R01 grants in that study section are still going to the BSD types.

God, yeh. I feel exactly the same fucking way. So depressing as a youngster trying to make it up the ladder. It is hard to have any hope whatsoever. And “The Admin” just keep playing hardball and are kicking people out left, right and center. It is not like there is any good news on the horizon either: interim 2013 paylines are starting to look utterly ridiculous (6th percentile for established PIs/10th percentile for new/ESI PIs for R01s at NIAID).

Obviously almost never… that’s the problem they’re pretending to fix, because trying to address the overt bias against ESIs by reviewers would be hard/pointless.

“one bad score is enough to triage a proposal”

Yeah, my NRSA got triaged because one reviewer thought my subfield (~1,500 papers/year and ~600 NIH research grants) literally should not exist, or at least should not be funded by the NIH. Literally: irrelevant to the mission of the NIH. S/he then proceeded to tank scores in every category for this reason, without any reference to the criteria for each category or to anything in the proposal.

Amongst other things, one major reason for doing triplicates (or more) in an experiment or assay is to allow one to identify outliers more easily and confidently. Why not discard the score of the most extreme reviewer during NIH grants reviews?

They do this a lot in sports where highly subjective scoring systems are in place (think gymnastics, diving etc).

Why not discard the score of the most extreme reviewer during NIH grants reviews?

In my experience, when there is a 2/1 split of unusual distance in the preliminary scores, these applications get the greatest focus from the panel. Disparate scores have an alerting function. I think this was probably the most common reason for someone calling a grant off the triage pile that I ever saw.

That would’ve been nice. Did I mention that although I’m at a reputable research university he gave a shitty environment score? Because he claimed my PI was “inexperienced”? Maybe that should have been a red flag? The other reviews weren’t fawning, but were not, I think, triage-worthy. Anyway, I was in the wrong section, probably the wrong institute. Agh. I haven’t thought about this or been mad about it in a long time. Pointless, given that grant review is what it is.

I’ve come down with a case of academic nihilism. Grants, papers, job applications… we delude ourselves that we can game these processes, they delude themselves that they have a rational process and enough information to make decisions that are better than chance at meeting their goals.

I think it would not be unreasonable to score all the ESI/NI proposals. Those people need the feedback from the full review, and they don’t need the depressing hit of being not scored.

However, I can’t agree with Qaz’s idea of scoring everyone “…we went two days, and didn’t triage anyone. It wasn’t that bad, and the study section felt more fair.”
When I was recently on a panel every application that got a score likely to get funded was done in the first 2 hours. The rest of that day and half of the next we were arguing about applications with no hope. And since they can’t give us cookies and coffee anymore, we were all sagging and ornery. It felt miserable and very pointless.

…or you could just ask more reviewers to preliminarily review each grant (perhaps 5), discard the two outliers, and keep the triage system the same for everyone. I take DM’s point that when there is significant disagreement there is more detailed discussion, but we all know that study sections can be dominated by a few, so I’m not sure how much discussion really changes anything. Actually, I remember seeing data on this but I can’t be arsed to find it now.
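The scheme proposed here is just a trimmed mean, the same device used in gymnastics and diving judging mentioned below. A quick sketch with made-up scores (NIH criterion scores run from 1, best, to 9, worst):

```python
# Sketch of the proposal: collect five preliminary scores, drop the single
# highest and single lowest, and average the rest. Scores here are
# hypothetical, chosen to show one hostile outlier.

def trimmed_mean(scores):
    """Mean of the scores after discarding one highest and one lowest."""
    if len(scores) < 3:
        raise ValueError("need at least 3 scores to trim both extremes")
    middle = sorted(scores)[1:-1]
    return sum(middle) / len(middle)

# One hostile outlier (the 9) no longer sinks an otherwise solid application:
raw = [2, 2, 3, 3, 9]
print(round(trimmed_mean(raw), 2))   # trimmed mean: 2.67
print(sum(raw) / len(raw))           # raw mean: 3.8
```

With three reviewers there is no room to trim, which is the commenter's point about needing five: a single extreme score moves a three-reviewer average far more than it moves a trimmed five-reviewer one.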

“And since they can’t give us cookies and coffee anymore, we were all sagging and ornery. It felt miserable and very pointless.”

Well, yes, I would never do it without cookies and coffee. In fact, I’ve been debating whether to suggest a reviewers’ strike for the cookies and coffee. Maybe DM can start it?

I would ask Joe, however, what the point of review is. Why do we write reviews and send them back? Why don’t we just give scores? I think the point of review is to help improve the science. (This is the same reason we review papers. If all we were doing was “gate-keeping” then we could just tell the editor whether we felt it should be published or not.) In my opinion, the point of review is two-fold. First, to guide us to give good reviews. (There’s a very nice discussion of why guided questions help subjective scores be more reliable in Kahneman’s Thinking, Fast and Slow.) But second, to guide the other scientists to do better science. [Yes, I have seen several cases where the reviews have changed how I do my own science, both in terms of scholarship and actual experimental design.] As such, part of being a reviewer is not just to discuss “fundable grants” but also to provide feedback to help improve both the science and the proposal.

Those people need the feedback from the full review, and they don’t need the depressing hit of being not scored.

With the exception of the final impact score, it’s my understanding that everyone gets the same amount of information in their summary statements, regardless of whether or not their grant was triaged. My triaged K99’s SS had a full set of reviews, just as much as my funded R21.

Also, I’d disagree with the comment re: being depressed about not being scored. Knowing I’m in the bottom half is enough–if the grant was that bad, I really don’t need to know that I was actually in the bottom 10%.

“Why do we write reviews and send them back? Why don’t we just give scores?”
I think we review grants to help the NIH identify the best projects to fund, both now (we score those well) and in the future (we score those pretty well and make suggestions for changing them). I think the ‘educating the applicant’ function is secondary. It is helpful for NIs/ESIs, but I don’t think a lot is gained from the discussion of applications that are nowhere near the cut-off. These applicants do get reviews; they just don’t get a score or a summary of the discussion (since the discussion doesn’t happen).

“Well, yes, I would never do it without cookies and coffee.” This is some kind of bureaucracy problem. They pay me $800 to come to the meeting, $500 to fly me there, hundreds more for my hotel room, but they are not allowed to buy coffee for the group to keep us awake from 8am to 6pm? This is the height of idiocy.

Dave, during ARRA or thereabouts they piloted journal style review where the initial reviewers sent comments to the actual study section members who served as a sort of AE board. How would you like that approach?

qaz, you are very seriously doing it wrong. Study sections are supposed to advise on the merits of the application in front of them. Your paternalistic approach is wrong and is at the root of the queuing problem and the Noonan problem. Knock it off!!!

qaz isn’t entirely wrong, a reviewer needs to justify why she/he scores an application and needs to do that in a constructive manner. Peer review is also peer review of reviewers and that’s a major advantage of face-to-face meetings where poor/unprepared/inconsistent reviewers can be called out. The biggest threat to peer review isn’t paternalism but crappy reviewers. The agencies are struggling to get people to review in the first place and this inevitably means some reviewers are recruited who are incompetent.

Of course, the primary purpose is to assess merit of each application but this has to be done in a constructive manner (unless the application is so obviously flawed that it should not have been submitted in the first place).

On triage, I think it’s a reasonable approach IF it allows more time to focus on the applications that have a statistical probability of funding. In a practical sense, panels (and chairs) assess the “load” and will devote less time if there is no triage. This could be countered by reviewers keeping the discussion of both outstanding and poor grants short. There is no need to kill a grant by pointing out the 20th flaw, nor is there much point in prolonging discussion of a clearly excellent application. This is the role of an effective chair. Panels do not have an obligation to spend equal time on applications.

As an aside, there is another form of triage that NIH is using. I participated in a program grant meeting (a teleconference with a limited time window) and we were told to discuss in order from lowest (best) average preliminary score to worst. After ~40% had been discussed and scored, the rest went unscored due to insufficient funds available. There was discomfort, but over the phone it simply wasn’t possible to argue effectively.

DM writes: “qaz, you are very seriously doing it wrong. Study sections are supposed to advise on the merits of the application in front of them. Your paternalistic approach is wrong and is at the root of the queuing problem and the Noonan problem. Knock it off!!!”

Not at all. The grants that I review are reviewed in terms of the quality in front of me. I can absolutely guarantee that the grants I review are no more likely to end up in a queue than others. (Yes, I compare my scores and reviews to the others on the study sections I am on to calibrate myself. I’m very comfortable that I am not causing queuing.) Remember that saying a grant is “not ready” and providing suggestions is not the same as queuing. Queuing implies that the grant is ready, but needs to wait its turn. That’s a very different phenomenon that I do not condone or tolerate.

Remember that we are supposed to review the grant in front of us. If the question is “Should we give this person money to do some science?” then I don’t need the grant, I just need the person’s CV. But the question is “Is this proposed project a good one? Is it logically flawed? Do they have the right controls? Is this the best way to get impact? Is this ready to be funded?” As such, I review the grant, give it a score, and make suggestions. If the score is good enough, the grant will get funded. If not, then maybe my suggestions will help them improve it to get into the fundable range next time.

This has nothing to do with paternalism. It has to do with peer review. I am not so arrogant as to believe that because I am ready to submit a grant (or a paper) on a topic, I must be the only one with good ideas. I know that my peers have insights, background, and ideas that I don’t and I welcome their input. (Many of my most impactful papers have changed dramatically because of comments and insights by reviewers. I only hope that my reviews are as helpful to my colleagues.)

Paternalism assumes that I’m putting myself above my colleagues. Peer review assumes that we can all judge each other. Providing suggestions assumes that we all come at this with different toolboxes, and that other ideas can be helpful.

I can absolutely guarantee that the grants I review are no more likely to end up in a queue than others.

I can absolutely guarantee you are wrong if you are doing what you seem to be saying you are doing.

But the question is “Is this proposed project a good one? Is it logically flawed? Do they have the right controls? Is this the best way to get impact? Is this ready to be funded?”

Of course you aren’t supposed to be answering the 5th one there 🙂

But your “logically flawed” and “right controls”, and to some extent “get impact” are fraught with exactly the problem I see with your approach. Done *sparingly* you have a point.

But I continue to argue that the vast majority of reviewer complaints that predict impact are trying to predict an unknowable future based on that reviewer’s beliefs, when empirical conduct of the work is the only way to know. So the best way is to give them the award and let them get to bloody work.

Controls? please. If they are really the right controls the PI will think them up and if not, the manuscript reviewers will insist. And if not and the work gets into good journals anyway, perhaps you were wrong. So insisting that the exact perfect control that *you* dreamed up has to be in the proposal document is very often a huge driver of the queuing. or, the excuse anyway.

“logically flawed”. hmmmmm. yeah this is delving into the kind of arguments one gets into with reviewers over the Discussion section. So here you are, being able to prevent someone from doing the work *at all* until they write a pre-Discussion that fits your “logic”. Or perhaps you wish to push a project in a direction that you favor but they’ve proposed to push it in another direction. Your *preference* is a bad influence and contributes to the inherent conservatism of science. You are venturing towards the type of “these experiments have to be done in exactly this way with exactly these controls under these MF buffer conditions” bullshittio that is absolutely the biggest large-scale problem at the NIH.

The benign version of the NIH pushing for prioritizing “impact” and “significance” is PRECISELY to get study sections to stop with this crap. And to focus their minds on the underlying bones of the proposal. Is it a good idea? Can one reasonably expect that this highly trained and/or accomplished scientist can work this shit out as she goes along? (if your answer is no to the latter, you should be dissing the Investigator score because you hate all of her papers. If you like those papers okay, then stfu about any lack in the present proposal).

Qaz’s response is a large part of why they got rid of A2s. Specifically, it is not the job of the SS to guide the research by iteratively modifying the science. SSs should not make suggestions as to what might make a project better; it’s the PI’s job to figure that out based on the critiques. The SS should not be writing people’s proposals for them. The removal of A2s got rid of rebuttals along the lines of “I did what you asked, now gimme the money.”

Good ideas with potential impact are a dime a dozen. Turning a good idea into a real scientific result is difficult, time-consuming, often expensive, and (most-importantly) fundamentally dependent on knowing what the alternative hypotheses and controls are.

Yes, I agree completely that the ultimate goal is “Can one reasonably expect that this highly trained and/or accomplished scientist can work this shit out as she goes along?” But at that point, all we need is the name of the scientist (or the CV if you want to prevent the old-boy/girl network that we actually live with) and the one sentence (or one-paragraph) title. In fact, if we were going to take this to its logical conclusion, then what we should be doing is giving each PI enough money to do what they think is the next cool project, we shouldn’t be reviewing grants at all, and funding should be completely dependent on whether the PI has enough money to fill out their lab and continue doing reasonable work. (This is similar to what I have argued we should do – each PI gets one R01-level steady funding – once a PI has “entered into the system”, they get an “easy renewal” as long as they’ve been doing good work with the last cycle. Yes, there are issues about how one enters the system, what happens when a renewal fails, how to handle PIs who want larger labs, etc. But that’s for a different place and time.) My point here is that this is NOT what we have been instructed to do as reviewers on study section. Your idea of “trusting the PI to figure this [stuff] out throughout the project” is what creates the advantage of BigNameFamousScientists over all those ESI/NI kids who don’t have track records yet.

We can all say that what we really want is to give each trusted PI the money to do the f—ing work, but that is explicitly NOT what we have been instructed to do on study section. In EVERY study section I have been on, we have been explicitly instructed to review the grant in front of us, to determine if it is viable, if it is well-designed, and if it is going to have high impact.

And DM, the time for controls is BEFORE submitting to the journal – in fact, the time for controls is DURING the experimental design so that you have appropriate controls during the experiment. Controls done afterwards in response to journal reviewers is trying to play catch-up for a poorly designed experiment. [I will note that people complain bitterly about being told to do additional experiments as controls for papers as well.]

I stand by my statement. I score the grant in front of me. And I provide suggestions to improve the score next time (if there has to be a next time).

And Dave, identifying the controls shouldn’t take 1-2 pages. But a proposal without the necessary controls has been dead in any study section review I’ve ever seen.

Your idea of “trusting the PI to figure this [stuff] out throughout the project” is what creates the advantage of BigNameFamousScientists over all those ESI/NI kids who don’t have track records yet.

The record would show that I was as enthusiastic, perhaps even more so, about this line of argument with respect to noob investigators as I was about established investigators when I was in the trenches.

time for controls is BEFORE submitting to the journal – in fact, the time for controls is DURING the experimental design so that you have appropriate controls during the experiment.

depends on the experiment of course. but sure, so expand your thinking to include the punishment/learning factor that makes them do better for the next paper. Again, we apparently disagree about the relative smarts of the average PI and the degree to which a random reviewer is going to edumacate them into doing it “right”. Like they never would have thought of it.

among other concerns, the grant application is a static document, often written 4 years before a critical experiment. stuff *changes* in that time. more information accumulates. so the marginal impact of a reviewer “catching” some frigging buffer issue (or more likely the failure to specify some frigging buffer issue in the available 12 pages) and sending it back for another round of revision is pretty low.
