I was very upset and frustrated by the current review system ...
My last R21 grant got a score of 182. After revision, the proposal got "unscored" ... I didn't think that the scientific review officer did a good job.
You spent a lot of time preparing a proposal but who cares....
We really need to revise the review system...

Many applicants believe that their revised application somehow deserves an even better score than the previous version of the application and that if this does not happen this is evidence of a broken review system. That thinking is totally incorrect.

The NIH considers grants for funding on a round-by-round basis. Three times per year they will allocate a portion of their yearly budget for funding new proposals. To decide which ones to fund each round the ICs need to know the relative merit of the proposals that have been submitted for that round of funding.
They do not need to know where the proposal ranks among proposals that they funded during the 80s. They do not need to know where the proposal ranks among proposals that will be considered next round.
They do not need to know how meritorious the proposal is on an abstract scale of grant quality.
What they need to know is whether this proposal is better or worse than the other ones in the pool that have been submitted for the current round. Where does it rank? This is why the percentile is so much more important than the priority score when it comes to Program considerations.
Peer reviewers who serve on NIH/CSR study sections are asked to perform the relative ranking of proposals. In fact, they are more than just "asked" to do so. There are a number of boilerplate instructions on the topic. As the BM observed at MWE&G:

the review rules are made quite clear on this. Revised apps are NOT to be benchmarked against prior scores. All apps are to be compared primarily within round

My experience is that SROs will jump all over it if the discussion trends towards "We can't really assign a worse score than the previous version" or anything else that smacks of score-anchoring to prior rounds. (Of course, reviewers do this; they just aren't supposed to.)
So there are a number of quite reasonable scenarios for why your score could get worse, even if your application has been "objectively" improved through revision. First and foremost, because there may simply be more applications that are better this round. It doesn't take many to go from scored to triaged, btw. My panel sees maybe 40-60 R01 apps, so suppose that 25 are discussed. Obviously, for any threshold, someone has to be the worst one discussed, but just think about, say, 5 applications shifting one way or the other relative to yours. Not so hard to imagine, is it, that from round to round the pool of competing proposals should easily shift by a mere 5?
Second, of course, because there is variance in the within-round review process depending on the specific reviewers assigned to your proposal and the pools of applications that have been assigned to those reviewers. People are loath to believe that their good score was a result of the other proposals in a given reviewer's pile happening to be unusually bad that round. It happens. So let us think about doing a resampling process on your application. Take three different sets of reviewers and four or five different plausible sets of grant assignments for each of those reviewers, and I guarantee you would see substantial variation in the outcome for a given app. And not variation from a 180 to a 183, neither. So how does the applicant know that the score they received is not simply on the extreme good side of the distribution of likely scores they could have received? They do not.
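The resampling thought experiment above can be sketched in a few lines of code. This is a toy simulation with invented parameters (the noise magnitudes, the number of reviewers, and the "pile effect" are all assumptions for illustration, not real NIH data), but it shows how the same application with a fixed underlying merit can draw scores across a wide range:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def review_outcome(true_merit, n_reviewers=3, reviewer_sd=0.6, pile_sd=0.4):
    """One plausible review outcome for a single application.

    Each reviewer adds personal noise, and all reviewers share a 'pile
    effect': how strong the rest of their assigned stack happens to be
    that round. All magnitudes here are invented for illustration.
    """
    pile_shift = random.gauss(0, pile_sd)  # shared luck of the assignment pool
    scores = [true_merit + pile_shift + random.gauss(0, reviewer_sd)
              for _ in range(n_reviewers)]
    return sum(scores) / len(scores)

# Resample one application with fixed "objective" merit many times
# (old-style priority scores ran 100-500, i.e. roughly 1.0-5.0 here).
outcomes = sorted(review_outcome(true_merit=2.0) for _ in range(1000))
print(f"5th percentile of draws:  {outcomes[50]:.2f}")
print(f"95th percentile of draws: {outcomes[950]:.2f}")
```

Even with modest noise assumptions, the gap between a lucky draw and an unlucky draw for the identical application is far larger than a 180-to-183 wiggle.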
I beg of you, applicants who have not sat on study section yet: think about what you are up against and get realistic about the process. You are not special. You are not smarter than everyone else. You do not write better grants. You are not more deserving. People are not out to get you specifically.
It is going to be helpful for your mental health and blood pressure if you stop viewing every disappointing grant score as a personal attack and evidence that the system is irretrievably broken.

Logically, if it is possible that last round you just got especially generous reviewers, it is also possible that this round you just got especially idiotically evil reviewers. Which would, to the victim of said reviewers, certainly feel like a broken system.
I recently decided that being a sane academic scientist requires a certain level of self-delusion. Not about data, but about grants. You have to simultaneously believe that 1) Your funded grants are funded because you are so awesome! and 2) Your unfunded grants were unfunded because of reviewer ineptitude and/or absurdly low funding levels and/or anything concrete that will be changed next time around.
More practically though, I was wondering... how much do scores tend to vary from batch to batch?

The NIH considers grants for funding on a round-by-round basis. Three times per year they will allocate a portion of their yearly budget for funding new proposals. To decide which ones to fund each round the ICs need to know the relative merit of the proposals that have been submitted for that round of funding.

This is completely wrong. ICs try to equalize things over the course of the fiscal year, being somewhat conservative in the beginning of the year with the payline, and then going back and retroactively digging deeper if necessary as the year proceeds. This is so that they are not funding shittier grants in the early rounds than ones that don't get funded later, such as if they have less money left at the end than they expected. Also, priority scores are percentiled against those grants reviewed in the current round *and* several rounds prior (I think two or three).
You are correct, however, with your broader point that just because you were in the 15%ile two years ago doesn't mean that being in the 20%ile today means that your grant "got worse".

This is completely wrong. ICs try to equalize things over the course of the fiscal year, being somewhat conservative in the beginning of the year with the payline, and then going back and retroactively digging deeper if necessary as the year proceeds. This is so that they are not funding shittier grants in the early rounds than ones that don't get funded later, such as if they have less money left at the end than they expected.
No, I am completely right! True, this is what the ICs actually do when it comes to making their decisions. But they most emphatically do not want reviewers messing with this behavior. They want reviewers to do things straight up, round by round, consistent with my overall point here.

priority scores are percentiled against those grants reviewed in the current round *and* several rounds prior (I think two or three).
Current and two prior, making for a complete rolling year. Did I say something else? Perhaps I gave an inaccurate impression. Again, the point is that even if the CSR uses this smoothing procedure to decrease variance, they are not asking reviewers to do this for them! They are asking reviewers to concentrate on the relative ranks within round and within one's own pile. This has been my experience. Has anyone else been instructed differently? Sure, there's that little reference sheet for what scores are supposed to mean subjectively, but that's crap. (Or perhaps just my SROs have been out there on this.)
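For concreteness, the rolling-year percentiling can be sketched as follows. The scores and the formula here are invented simplifications (CSR's actual percentile base and calculation differ in detail); the point is only that a score is ranked against the current round *and* the two prior rounds:

```python
def percentile(score, base_scores):
    """Percent of applications in the base with a score as good or better
    (lower priority score = better). Simplified vs CSR's actual formula."""
    better_or_equal = sum(1 for s in base_scores if s <= score)
    return 100.0 * better_or_equal / len(base_scores)

# Hypothetical priority scores from three rounds of one study section.
round_1 = [150, 170, 190, 210, 230, 250]
round_2 = [140, 160, 180, 200, 220, 240]
round_3 = [155, 175, 182, 205, 225, 245]   # current round

# Percentiled against the current round *and* the two prior rounds,
# i.e. a complete rolling year:
base = round_1 + round_2 + round_3
print(f"{percentile(182, base):.1f}")
```

Note what this implies: the same priority score can land at a different percentile depending on how the prior two rounds happened to score, which is exactly why reviewers are told to rank within round and leave the smoothing to the percentiling machinery.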

DM - Thank you for the continued conversations on study sections. I am very intrigued by your descriptions because they simply do not match my experiences on study sections. (Which are, admittedly, much less extensive than yours.)
In study sections, I have definitely heard "This got a score of XYZ last time, why is it getting worse?" Sometimes, I've heard justification, sometimes not. No one has complained or balked at this line of reasoning.
Also, before each study section, we've gotten the previous distributions of scores handed out. Usually with some frustrated comment about score compression - "they can't all be 1.6's!"
While I realize that the goal may be relative scoring, it can't be relative to the current round alone, since percentile scores are based on a comparison with the last two study section meetings as well. This means that there has to be some consistency in scoring between meetings (at least across a 1-2 year timescale).
The differences between the study sections we're seeing make me REALLY nervous about a lack of consistency between study sections. Since institutes work across study sections rather than matching up one-to-one, I wonder if this would produce a bias towards certain study sections (or worse, certain fields, perhaps even within study section).

In study sections, I have definitely heard "This got a score of XYZ last time, why is it getting worse?" Sometimes, I've heard justification, sometimes not. No one has complained or balked at this line of reasoning.
I hear this too. The difference is that the SRO tends to chime in. Of course, the SRO commenting does not necessarily do anything about reviewer behavior.
Remember that the original motivating comment was that if the score changed in the bad direction, this meant that the SRO was doing a bad job. I'm just pointing out that this may happen precisely because the SRO is doing a good job. And that the expectation of score anchoring is perhaps actually the evidence of an inattentive SRO (not definitive evidence because all the SRO can do is gently suggest).

I have actually had the SRA state that a grant on the cusp of funding in the previous submission could not be triaged in the current study section. Basically, I think the SRA was thinking more about the summary statement and how to write it if a grant was now triaged than actually caring about the process.
While I have been mostly happy with the study sections I have served on, there have been a few horrible examples that justify the "system is broke" mantra. Again, by and large, I think the process has worked, but the 5% of cases with an inappropriate statement/critique/analysis is concerning.

Remember that the original motivating comment was that if the score changed in the bad direction, this meant that the SRO was doing a bad job. I'm just pointing out that this may happen precisely because the SRO is doing a good job.
I think that I would have confidence in this reasoning if the application had been scored both times and the score got worse. The scored application reflects the judgment of the entire panel. In contrast, the unscored application represents the judgment of 3 or (maybe) 4 people. I sympathize with the original commenter in this case, since the judgment of the panel was replaced with the judgment of a limited number of reviewers (who may or may not have been part of the original panel, and may or may not have had a better than average stack).

My experience is that SROs will jump all over it if the discussion trends towards "We can't really assign a worse score than the previous version" or anything else that smacks of score-anchoring to prior rounds.
In this case the SRO was essentially irrelevant since the application was unscored. If the grant was worth discussing previously, was close-ish to funding (scoring is sufficient evidence of this), and was improved, it should merit discussion on revision. Even if the score may end up being worse (possibly significantly worse).