Automatically grading programming homework: Echoes of Proust

I’d love to see this new system from MIT compared to Lewis Johnson’s Proust. Proust also found semantic bugs in students’ code. Lewis (with Elliot Soloway and Jim Spohrer) collected hundreds of bugs that students made while working on the Rainfall Problem, then looked for those bugs in students’ programs. Proust caught about 85% of students’ semantic errors. The remaining 15% covered so many different bugs that it wasn’t worthwhile to encode the semantic check rules — each rule would only fire once, ever. My guess is that Proust, which knew what problem the students were working on, would do better than the MIT homework checker, which can only encode general mistakes.

The new system does depend on a catalogue of the types of errors that student programmers tend to make. One such error is to begin counting from zero on one pass through a series of data items and from one on another; another is to drop the equality condition from a comparison — writing “if a is greater than b” when the intended test is “if a is greater than or equal to b, do x.”
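As a concrete illustration of those two error classes (my own minimal sketch, not examples from the MIT catalogue), here is what each bug looks like in Python:

```python
# Two classic student bugs, sketched for illustration.

def average(values):
    # Intended: average all the items, counting from index 0.
    total = 0
    for i in range(1, len(values)):   # BUG: counting starts at 1, skipping values[0]
        total += values[i]
    return total / len(values)

def passes(score, cutoff):
    # Intended: "if score is greater than OR EQUAL to cutoff, do x."
    return score > cutoff             # BUG: strict '>' drops the equality case
```

A student whose score exactly equals the cutoff would be failed by `passes`, and `average([10, 10, 10])` comes out below 10 because the first item is skipped — exactly the kind of systematic, recognizable mistake a bug catalogue can encode.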

The first step for the researchers’ automated-grading algorithm is to identify all the spots in a student’s program where any of the common errors might have occurred. At each of those spots, the possible error establishes a range of variations in the program’s output: one output if counting begins at zero, for instance, another if it begins at one. Every possible combination of variations represents a different candidate for the corrected version of the student’s program.
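The combinatorial search described above can be sketched in a few lines. This is my own schematic of the idea, not the researchers’ implementation: each suspect spot (“site”) offers a small set of variants, and the cross product of the choices enumerates the candidate corrected programs.

```python
from itertools import product

# Hypothetical error sites in a student's program, each with its
# set of plausible variants (names and variants are illustrative).
sites = [
    ("loop_start", [0, 1]),        # counting begins at zero or at one
    ("comparison", [">", ">="]),   # strict vs. inclusive comparison
]

site_names = [name for name, _ in sites]
variant_sets = [variants for _, variants in sites]

# Every combination of variant choices is one candidate correction.
candidates = [dict(zip(site_names, choice)) for choice in product(*variant_sets)]
# 2 sites x 2 variants each -> 4 candidate programs to check
```

Each candidate would then be run (or symbolically compared) against the reference behavior; the candidate that matches tells you which specific errors the student made.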

I prefer the other extreme — in-person grading, in which students explain their code, and we find the errors and let them fix them. It is also a good opportunity to talk about performance and coding style. Still, using Proust as a first pass to find the errors could be helpful.

An oral presentation of an individual’s work is great, but not always possible. People need to communicate in whatever formal language best fits the solution. This is true in math, science, engineering, business, art, and computer science. Writing clearly and concisely is an important skill for presenting and maintaining your knowledge so that everyone can benefit from your work, not just those who have the chance to listen to you.

I would also like to see anything that MIT releases about their automatic grading system, especially if it describes its performance and the technology used. Proust’s automatic grading caught 85% of the semantic errors because the domain (the specific program) was specified. In a short six-month consulting position at ETS, we replicated this with automated grading of the APCS free-response questions. This was when the APCS test was in Pascal. The program prompt was known, so an expert system (ES) was trained for that problem; it would query an Abstract Syntax Tree (AST) dynamically constructed from the student’s submitted program. We graded responses as good or bad. The grading was reliable, consistent, and outperformed humans. It never saw the light of day.
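The ETS system targeted Pascal, but the same kind of AST query is easy to demonstrate today. Here is a rough modern analogue (my sketch, not the ETS system): parse a submission with Python’s standard `ast` module and ask a structural question a grading rule might pose, such as “does the solution contain a loop that accumulates into a running total?”

```python
import ast

# A toy student submission to query (illustrative only).
student_code = """
total = 0
for x in data:
    total = total + x
"""

def has_accumulating_loop(tree):
    """Return True if some for-loop contains an assignment whose
    right-hand side is a binary operation (a crude 'accumulator' test)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.For):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Assign) and isinstance(inner.value, ast.BinOp):
                    return True
    return False

tree = ast.parse(student_code)
```

A rule base for one known problem prompt would be a collection of such structural checks, each tied to a point value or a known bug — which is why fixing the domain, as with Proust and the APCS grader, makes the approach tractable.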