Summary: when designing an exam, can we prove that moderately difficult questions with a mid-range passing score (50%) measure students' ability better than easy questions with a high passing score (95%) or hard questions with a low passing score (5%)? All three tests are meant to be equally hard to pass; the conjecture is that chance has a smaller influence on the result with the 50% passing score than with the other two.

For the purpose of this question, let's consider a foreign language certification test in which students answer a set of multiple-choice questions (4 choices per question, with exactly 1 correct answer). The student's score is the percentage of questions they answered correctly. The student either passes the test (if their score is above the passing score) or fails (if it is below). The score is used only to determine whether the student passed or failed and is not relevant beyond that. (In this respect, the test differs from TOEFL-type tests, where the score itself is used to measure the student's ability and there is no specific pass/fail threshold.)

The passing difficulty (ability required of the test taker to pass the test) is a given: it must be consistent with the passing difficulty of the test in previous years for the certification to have any meaning.

However, the test organizers can vary the passing score, provided they vary the difficulty of the questions at the same time: if they increase the difficulty of the questions, they lower the passing score, and vice versa. This keeps the passing difficulty the same (i.e. on average, the pass/fail result for a student of a given ability will not change if the passing score and the question difficulty are adjusted together).

Now the question is: given all of the above, what passing score should the test organizers choose in order to minimize the influence of chance on the test result?

Intuitively, if they set a very high passing score (e.g. 95%, with very easy questions) or a very low one (e.g. 5%, with very hard questions), it seems that chance will have a much greater influence than if they choose 50% (with moderately difficult questions).

Indeed, by answering randomly, a student with no ability at all would get a 25% score on average (because the test is multiple choice), so most students would pass the test with very hard questions and a 5% passing score even without understanding anything. On the other hand, if the passing score is 95% with easy questions, there are bound to be some questions (say 10% of them) for which a given test taker does not know the answer and picks one at random, which again means that some amount of luck is required to reach the 95% passing score.
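
To put rough numbers on these two scenarios, here is a minimal sketch using the binomial distribution. It assumes a 100-question test (the length is not specified above) and treats a score exactly equal to the passing score as a pass; the 90-known/10-guessed split is the "say 10%" example from the previous paragraph. The names `prob_at_least`, `guesser_pass` and `lucky_pass` are made up for this illustration.

```python
from math import ceil, comb

def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    k_min = max(k_min, 0)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

N_QUESTIONS = 100   # assumed test length (not given in the question)
P_GUESS = 0.25      # 4 choices per question, exactly 1 correct

def min_correct(pass_score, n):
    """Smallest number of correct answers that reaches the passing score."""
    return ceil(pass_score * n - 1e-9)  # small epsilon guards against float noise

# A student with no ability who guesses every answer.
guesser_pass = {
    score: prob_at_least(min_correct(score, N_QUESTIONS), N_QUESTIONS, P_GUESS)
    for score in (0.05, 0.50, 0.95)
}

# Easy test, 95% passing score: the student knows 90 answers and guesses the
# other 10, so at least 5 of those 10 guesses must be correct.
known = 90
needed = min_correct(0.95, N_QUESTIONS) - known
lucky_pass = prob_at_least(needed, N_QUESTIONS - known, P_GUESS)

for score, p in guesser_pass.items():
    print(f"passing score {score:.0%}: a pure guesser passes with probability {p:.3g}")
print(f"knows {known}, guesses {N_QUESTIONS - known}: passes the 95% test "
      f"with probability {lucky_pass:.3g}")
```

Changing `N_QUESTIONS` or `P_GUESS` shows how sensitive these figures are to the test length and to the number of answer choices.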

Intuition points to a passing score somewhere in the middle, but is it possible to determine this mathematically? Does it depend on the probability of getting a question right by answering randomly (25% in this example)?
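
One possible way to set the problem up numerically, offered only as a sketch and not as an answer: assume each question is answered correctly with probability c + (1 - c) * logistic(ability - difficulty), where c is the guessing probability. This response model is an assumption of mine, as are the ability scale, the 100-question length, the gap `delta` between a "should fail" and a "should pass" student, and all function names. For each candidate passing score, the question difficulty is chosen so that a student exactly at the required ability has that passing score as their expected score, which is one way to keep the passing difficulty fixed.

```python
from math import ceil, comb, exp, log

def p_correct(theta, b, c=0.25):
    """Assumed model: guessing floor c plus a logistic curve in (ability - difficulty)."""
    return c + (1 - c) / (1 + exp(-(theta - b)))

def difficulty_for(pass_score, theta0=0.0, c=0.25):
    """Difficulty at which a student of ability theta0 has expected score pass_score.

    Only defined for c < pass_score < 1 under the assumed model.
    """
    x = (pass_score - c) / (1 - c)
    return theta0 - log(x / (1 - x))

def prob_pass(theta, pass_score, n=100, theta0=0.0, c=0.25):
    """Probability that a student of ability theta reaches pass_score on n questions."""
    p = p_correct(theta, difficulty_for(pass_score, theta0, c), c)
    k_min = ceil(pass_score * n - 1e-9)  # a score equal to the threshold counts as a pass
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def misclassification(pass_score, n=100, delta=0.5, c=0.25):
    """False-pass rate at ability -delta plus false-fail rate at ability +delta."""
    false_pass = prob_pass(-delta, pass_score, n, c=c)
    false_fail = 1 - prob_pass(+delta, pass_score, n, c=c)
    return false_pass + false_fail

for s in (0.30, 0.50, 0.625, 0.80, 0.95):
    print(f"passing score {s:.1%}: summed error rate {misclassification(s):.3f}")
```

Under this particular model no question difficulty can give a passing score at or below c, since even a student with no ability expects a score of c; that is why the 5% case is absent from the loop. Re-running with a different `c` is a direct way to probe whether the best passing score depends on the guessing probability.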