soj wrote:The point of my long post is that while the test questions may be getting easier or harder, as far as the "objective difficulty" (as judged by LSAC) goes, any variance in difficulty is eliminated by a corresponding adjustment in the raw-to-scaled conversion scale. Of course, individuals might find certain question types tougher than other question types, and if those tougher question types are getting more common, those individuals might find recent tests more difficult, a pattern that conversion scales (which cater to all test takers, not individuals) may not account for.

Your post was perfect and just what this thread needed, nicely done!

To answer OP, I think LR has gotten harder personally. Old PTs had fewer curveball questions and were more predictable/to the point. RC has also gotten a bit harder.

Though I admit all this, I still think it's almost a negligible difference... almost.

So much stupidity and fail here. I no longer have any doubt as to why you rode the WLs for so long despite your numbers. Adcomms obviously paid attention to your PS.

Read the post below yours so you can (possibly) finally acquire a basic understanding of how nonsensical and wrong everything you've written in this thread is.

Why are you so mad?

Equating is using the scores of previous testers (and their relative scores on experimental sections) to determine future scaled scores, or "curves." Not as elegant as soj's response, but still essentially correct.

Whereas less prepared people used to score 170s, the test had to be made harder (not significantly, but somewhat noticeably) so that there wouldn't be an uneven distribution of scores along the percentiles. Also, some of the difference in difficulty between old and new tests is due to people's strengths and weaknesses. It's not as though every 170er would have scored a 172 10 years ago or something.

JamMasterJ wrote:Whereas less prepared people used to score 170s, the test had to be made harder (not significantly, but somewhat noticeably) so that there wouldn't be an uneven distribution of scores along the percentiles.

This isn't right. LSAC wouldn't make a 170 harder to get to adjust for people getting smarter. If they did, they would be giving people an incentive to take the LSAT earlier. More importantly, a 170 in 2000 wouldn't be comparable to a 170 in 2010, which is against the whole point of equating.

Don't think of it as a curve. Think of it as making every test equivalent to each other so that schools can just look at the scaled score and not have to worry about comparing different exams.

Even if June takers are smarter than February takers, it's not harder to score high in June than in February.

Similarly, even if 2012 takers are smarter than 2002 takers, it's not harder to score high in 2012 than in 2002.

1. The LSAT is "curved" in the sense that on harder tests, you can get more questions wrong and still get the same scaled score.

True.

LSAC goes through a bunch of steps to make sure you're not penalized for getting a tougher test or rewarded for getting an easier test. First, LSAC pretests questions using the experimental sections. Ambiguous or poorly worded questions are thrown out. Questions that seem to reward poor reasoning (i.e., questions that low scorers tend to get right more often than high scorers do) are thrown out. Then they combine the remaining questions so that the test has a good mix of hard, medium, and easy questions. But some tests inevitably end up being tougher than others, and so LSAC comes up with a conversion method that takes difficulty into account. If a test has too many easy questions, then it takes more credited responses to get a 140 on that test than on a test with fewer easy questions. If a test has too many hard questions, then it doesn't take as many credited responses to get a 170 on that test as it does on a test with fewer hard questions.
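The adjustment described above can be sketched as two conversion tables, one for an easier form and one for a harder form. All cutoffs below are invented for illustration; they are not LSAC's actual scales:

```python
# Toy illustration of equating: two hypothetical conversion scales.
# On the harder test, fewer credited responses are needed for the
# same scaled score. All numbers are made up for illustration.

def scaled_score(raw, scale):
    """Return the scaled score for a raw score, given a list of
    (minimum_raw, scaled) cutoffs sorted from highest to lowest."""
    for min_raw, scaled in scale:
        if raw >= min_raw:
            return scaled
    return 120  # floor of the scale

# (minimum raw score, scaled score) cutoffs, highest first
easier_test = [(99, 180), (91, 175), (87, 170), (80, 165), (72, 160)]
harder_test = [(97, 180), (89, 175), (85, 170), (77, 165), (68, 160)]

raw = 86
print(scaled_score(raw, easier_test))  # 165: misses the 170 cutoff of 87
print(scaled_score(raw, harder_test))  # 170: the harder form forgives more
```

The same raw performance converts to different scaled scores precisely because the two forms differ in difficulty; that is the whole point of the conversion scale.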

2. The LSAT is "curved" in the sense that every test is meant to have the same distribution of scores. In other words, if all the smart people took the June test, then the June test will have a tougher curve, and it'll be harder to score higher in June than in other administrations.

False.

The LSAT standardizes scaled scores, not percentile scores. In theory, a person will always get the same score that reflects that person's reasoning abilities. If it weren't for outside factors such as test-day condition, relative strength in a certain question type over others, and random luck, someone with a "170" reasoning ability will get a 170 no matter when that person takes the LSAT. Your score has nothing to do with where you stand relative to others taking the same test. You're not competing against others sitting in your testing room. If all the smart people take the test in June, it may be more difficult to get a certain percentile score in June, but no more difficult to get a certain scaled score. If test-takers are getting smarter or more prepared, then more people will achieve high scores. But from the perspective of a single test-taker, no single scaled score will be harder to obtain. The fact that people are better prepared for the LSAT these days is reflected in the fact that a 170 used to be a 98th percentile score but is now a 97th percentile score. But people who scored a 170 in 2011 are on average no better or worse reasoners than people who scored a 170 in 2001.

3. Whether a question is labelled easy, medium, or difficult depends on how well previous test-takers did when that question was used in an experimental section, so the fact that people are getting smarter and more prepared is setting the bar higher.

False.

Everything before "so the fact that" is true. Yes, how well previous test-takers did on individual questions in experimental sections partially determines whether LSAC considers a question easy or difficult, but keep in mind that LSAC has been equating the exams precisely to deal with this problem. Conversion scales are set so that a given scaled score on any exam is equivalent to the same scaled score in the previous generation of exams, which is equivalent to the same score in the generation of exams before them, and so on. As a result, a given scaled score on any exam is equivalent to the same scaled score on all the other exams. If there are more high scorers in a certain administration, that just means more high scorers to use as data points when determining the scale for the next administrations. It does not "raise the standard."
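The chain described above, where each form's scale is anchored to how known scorers performed on its questions while those questions were experimental, can be sketched with purely hypothetical data (the forms, counts, and cutoffs are invented):

```python
# Toy sketch of chained equating: each new form's raw cutoff for a 170
# is derived from how prior 170 scorers performed on that form's
# question pool when it appeared in experimental sections.
# All names and numbers are invented for illustration.

# Average wrong answers that prior 170 scorers got on each form's
# question pool while it was experimental:
avg_wrong_by_170_scorers = {"Form A": 10, "Form B": 12, "Form C": 14}

# The cutoff inherits that average: a harder pool allows more misses,
# so a 170 means the same thing on every form in the chain.
cutoffs = {form: -wrong for form, wrong in avg_wrong_by_170_scorers.items()}
print(cutoffs)  # {'Form A': -10, 'Form B': -12, 'Form C': -14}
```

Because each form is linked to the scorers of the previous generation rather than to a fixed percentile, more high scorers just means more data points, not a higher bar.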

I may be failing to comprehend something here, but please, humor me. I see your point that the LSAT is designed to equate in a way that allows the preparedness of LSAT takers as a group to change while maintaining consistency in what level of reasoning ability earns what scaled score, but I fail to see how this is possible.

I don't doubt that they've attempted this, but if my understanding of how the tests are created and calibrated is correct then I don't see how the LSAC could accomplish that.

The LSAC's source of feedback on question difficulty is test takers, as a group. So, if test takers as a group have gradually gotten better prepared but have stayed the same in reasoning ability (which is what we all suspect?) then test takers as a group have, slowly and gradually, made it appear that questions which formerly required X level of reasoning ability are marginally less difficult, leading test makers to include more difficult questions (unwittingly).

Essentially, what I don't understand is how test makers could separate reasoning ability from preparedness when the changes are gradual and pervasive. Where does the independent verification of question difficulty occur that allows test makers to be certain that they aren't gradually increasing difficulty (inadvertently) as test takers get better prepared?

It's a really good question and I'm sure there's some level of imprecision involved when you're using test-takers as the baseline and the baseline itself is getting better at the test.

But keep in mind that when we say people are getting smarter or better prepared, that just means a greater percentage of people are scoring higher. There are still people scoring at every level from 120 to 147 to 169 to 177. That's what I meant when I said smarter test-takers simply means more high scorers to use as data points. In theory, it doesn't raise the standard.

Going back to the example of the 170--a 170 used to be the 98th percentile but now it's the 97th percentile. I attribute this to the fact that people are getting smarter or better prepared. But there are still plenty of people scoring in the 160s to use as the baseline for future 160s scorers. LSAC did not make the test more difficult in response to people getting smarter. If they did, perhaps they would have adjusted the conversion scale to ensure that 170 remains the 98th percentile, which didn't happen.

Roughly speaking, it's not that 98th percentile scorers in the past set the standard for what qualifies as 98th percentile scores in the future. That would effectively be a curve. It's that 170 scorers in the past set the standard for what qualifies as 170 in the future.

JamMasterJ wrote:Whereas less prepared people used to score 170s, the test had to be made harder (not significantly, but somewhat noticeably) so that there wouldn't be an uneven distribution of scores along the percentiles.

This isn't right. LSAC wouldn't make a 170 harder to get to adjust for people getting smarter. If they did, they would be giving people an incentive to take the LSAT earlier. More importantly, a 170 in 2000 wouldn't be comparable to a 170 in 2010, which is against the whole point of equating.

Don't think of it as a curve. Think of it as making every test equivalent to each other so that schools can just look at the scaled score and not have to worry about comparing different exams.

Even if June takers are smarter than February takers, it's not harder to score high in June than in February.

Similarly, even if 2012 takers are smarter than 2002 takers, it's not harder to score high in 2012 than in 2002.

I meant that a June 2010 test is not like a June 1995 test. I can't back it with any evidence, but I've heard that there has been a significant increase in studying for the LSAT since that time. I'm not positive, but I don't think there's been an increase in scores.

The test might evolve, but equating still ensures that a 170 in 1995 is, on average, just as good at the things the LSAT is meant to test as a 170 in 2010. A 170 might get diluted because more people earn it, but it still indicates the same level of aptitude.

In terms of people preparing a lot more, yes, it's happening, and that might be why a 170 is no longer the 98th percentile. But don't overestimate this effect. Not everyone studies; not everyone who studies studies effectively. In the broad scheme of things, the distribution of thousands and thousands of test-takers is quite stable.

soj wrote:The test might evolve, but equating still ensures that a 170 in 1995 is, on average, just as good at the things the LSAT is meant to test as a 170 in 2010. A 170 might get diluted because more people earn it, but it still indicates the same level of aptitude.

In terms of people preparing a lot more, yes, it's happening, and that might be why a 170 is no longer the 98th percentile. But don't overestimate this effect. Not everyone studies; not everyone who studies studies effectively. In the broad scheme of things, the distribution of thousands and thousands of test-takers is quite stable.

I didn't realize that the percentile had shifted downward. That's where my misunderstanding lay.

Say they look at everyone with a 170 who took 4 experimental sections. The averaged-out number of wrong questions on the 4 experimental sections among 170 scorers is -12. The curve is set at -12 to achieve a score of 170.

Say that in this first group, 2% of test-takers achieved a 170.

The next test rolls around, using those 4 experimental sections. For whatever reason, 3% of test-takers get -12 or better. They all still get 170s, but now 170 is only the 97th percentile.

If LSAC is wedded to perfectly equating scores, there is no way to stop this trend (assuming each successive population has slightly more people achieve higher scores--this seems to have been the case for the past decade. Personally I attribute most of that to retaking--people seem to automatically get better each time they take the test).
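The illustration above can be simulated with made-up numbers (nothing here reflects real LSAC data): the raw cutoff stays fixed by equating, so the percentile is whatever falls out of the population.

```python
# Toy simulation of the -12 illustration: the raw cutoff for a 170 is
# fixed by equating, so as more test-takers reach it, the scaled score
# stays put while its percentile drifts down. All numbers invented.

def percentile_of_170(num_takers, num_at_or_above_170):
    """Percent of test-takers scoring below 170."""
    return 100 * (num_takers - num_at_or_above_170) / num_takers

cutoff = -12  # wrong answers allowed for a 170, set by equating

# First administration: 2% of 100,000 takers reach -12 or better
print(percentile_of_170(100_000, 2_000))   # 98.0

# Next administration: same -12 cutoff, but 3% reach it
print(percentile_of_170(100_000, 3_000))   # 97.0
```

The cutoff never moves in response to the population getting better; only the percentile attached to the 170 changes, which matches the observed drift from 98th to 97th.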

So... it seems the integrity of that 170 has largely been preserved through the years, even if more people are getting that score now... I think. Makes no sense to me, though; I was kicking ass on '90s tests and sputtering on more recent ones.

I seem to be in the minority, but I find the recent LR to be so much easier than the older tests. I feel that the recent LR stimuli are not as obscure and are much easier to read, making it easier to see the answer. LG has definitely gotten a lot easier. RC definitely seems a tad bit harder, but I feel that the changes in LR & LG make the newer tests much easier.

I would describe the newer LG and LR as more straightforward. I haven't seen a curveball LG, and LR questions feel much more focused on assumption questions and inference questions. RC is harder for me, though, because it seems to have more inference questions, as well as being more detail-oriented and technical in its language. It's just a pain in the ass.

Say they look at everyone with a 170 who took 4 experimental sections. The averaged-out number of wrong questions on the 4 experimental sections among 170 scorers is -12. The curve is set at -12 to achieve a score of 170.

Say that in this first group, 2% of test-takers achieved a 170.

The next test rolls around, using those 4 experimental sections. For whatever reason, 3% of test-takers get -12 or better. They all still get 170s, but now 170 is only the 97th percentile.

If LSAC is wedded to perfectly equating scores, there is no way to stop this trend (assuming each successive population has slightly more people achieve higher scores--this seems to have been the case for the past decade. Personally I attribute most of that to retaking--people seem to automatically get better each time they take the test).

So... it seems the integrity of that 170 has largely been preserved through the years, even if more people are getting that score now... I think. Makes no sense to me, though; I was kicking ass on '90s tests and sputtering on more recent ones.

That's a really good simplified illustration of equating.

I think in general we have to be very careful about inferring trends in difficulty. Most of us can speak only for ourselves, and what we found difficult is not necessarily what test takers in general found difficult.

I'm going to chime in on this one. I understand the concept of equating, but also understand what the OP was asking. He didn't give a score, or a percentile. The question was about individual question difficulty, not what the questions equate to. This takes that all off the table. Test takers absolutely have started prepping better since 1990, so in order for equating to work, yes, the questions may have gotten harder, or at least more specialized.

In addition, the effectiveness of equating is greatly overestimated once we start moving further into the past. The goal of equating the LSAT is likely to keep scores consistent within a 1-5 year window, not to say that you should get the same score on PT1 as you would on PT65.

Many people (myself included) consider newer tests to be easier, because we base our studying primarily off of them. To me, old LR is a mess of unrefined logic. The new LR might be objectively harder to get used to, but it's also FAR more predictable, to the point that one can usually predict 70% of answers before looking at the choices. The logic used to get there is much more refined and pattern-based. LG has gotten more uniform too, falling into more concrete categories. RC has gotten harder; this, in my opinion, is to counteract the effect of increased test prep. If LSAC realized that many students had taken courses that effectively taught LR and LG, students unable to afford classes would be at a distinct disadvantage. To account for this, they adjusted the RC section's difficulty, knowing that reading skills aren't as teachable as the other sections. Of course, this is all speculation.

I do think your individual strengths play into how well you may do on certain tests.

I remember taking a PT (June 2007 I think), and thinking, 'wow, that wasn't so bad, I think I did pretty good!' only to find out that the curve for 170 was -8. I got well below my PT average.

Later I took another PT (December 2011--had this ridiculous RC passage about a Japanese sculptor, and 2 very hard games--colored windows and building floors), and I thought I basically crapped myself. Then I saw the 170 curve was -15, and I'd actually gotten well above my average.

I do think it's pretty remarkable how much the different tests can vary. One benefit of taking every PT you can find is you learn how to handle yourself in different situations--I ended up doing okay on an LG section where I couldn't even understand one of the games and only half understood another, because I'd learned sometimes it pays to just guess, then abandon ship and move on when you have time left.

ahnhub wrote:I do think your individual strengths plays into how well you may do on certain tests.

Later I took another PT (December 2011--had this ridiculous RC passage about a Japanese sculptor, and 2 very hard games--colored windows and building floors), and I thought I basically crapped myself. Then I saw the 170 curve was -15, and I'd actually gotten well above my average.

I actually took the LSAT last December, and you're thinking of another test. Also, I don't think there's been a -15 ever. But your point still stands.

ahnhub wrote:I do think your individual strengths plays into how well you may do on certain tests.

Later I took another PT (December 2011--had this ridiculous RC passage about a Japanese sculptor, and 2 very hard games--colored windows and building floors), and I thought I basically crapped myself. Then I saw the 170 curve was -15, and I'd actually gotten well above my average.

I actually took the LSAT last December, and you're thinking of another test. Also, I don't think there's been a -15 ever. But your point still stands.

Oh yeah, it was December 2010. The curve was -15 though. It felt like it should have been -30.