Abstract:

The main objective of this thesis is to improve the automated assessment of programming assignments from the perspective of assessment tool developers.

We have developed visual feedback on the functionality of students' programs and explored methods for controlling its level of detail. We have found that visual feedback does not require major changes to existing assessment platforms: most modern platforms are web-based, so visualizations can be described in JavaScript and HTML and embedded in the textual feedback. Our preliminary results on the effectiveness of automatic visual feedback indicate that students perform equally well with visual and textual feedback. However, visual feedback based on automatically extracted object graphs can take less time to prepare than good-quality textual feedback.

We have also developed programming assignments that are easier to port from one server environment to another because assessment is performed on the client side. This not only makes it easier to use the same assignments in different server environments but also removes the need to sandbox the execution of students' programs. The approach will likely become more important as interactive study materials become more popular. Client-side assessment is better suited to self-study material than to grading because assessment results sent by a client are often too easy to falsify.

Testing is an important part of programming, and automated assessment should also cover students' self-written tests. We have analyzed how students behave when they are rewarded for structural test coverage (e.g., line coverage) and found that this can lead students to write tests with good coverage but poor ability to detect faulty programs. Mutation analysis, in which a large number of (faulty) programs are automatically derived from the program under test, turns out to be an effective way to detect tests that would otherwise fool our assessment systems. Applying mutation analysis directly to grading is problematic, however, because some of the derived programs are equivalent to the original, and some assignments or solution strategies generate more equivalent mutants than others.
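
The difference between structural coverage and mutation analysis can be illustrated with a minimal sketch. It is not the tooling used in this thesis: the function `max_of`, the three hand-written mutants, and the helper `mutation_score` are hypothetical names introduced only for this example, and real mutation tools derive the mutants automatically. The sketch shows a test that executes every line yet detects no mutant, a stricter test that detects the faulty mutants, and an equivalent mutant that no test can detect.

```python
from typing import Callable, List

IntFunc = Callable[[int, int], int]


def max_of(a: int, b: int) -> int:
    """Program under test: return the larger of two integers."""
    if a > b:
        return a
    return b


# Hand-written mutants (assumption: a real tool would generate these).
def mutant_flipped(a: int, b: int) -> int:
    if a < b:  # '>' replaced by '<'
        return a
    return b


def mutant_wrong_return(a: int, b: int) -> int:
    if a > b:
        return b  # 'return a' replaced by 'return b'
    return b


def mutant_equivalent(a: int, b: int) -> int:
    if a >= b:  # '>' replaced by '>='; behaves like max_of for all inputs
        return a
    return b


MUTANTS: List[IntFunc] = [mutant_flipped, mutant_wrong_return, mutant_equivalent]


def weak_test(f: IntFunc) -> bool:
    """Executes both branches (full line coverage) but checks only the type."""
    results = [f(2, 1), f(1, 2)]
    return all(isinstance(r, int) for r in results)


def strong_test(f: IntFunc) -> bool:
    """Executes the same branches and checks the returned values."""
    return f(2, 1) == 2 and f(1, 2) == 2 and f(3, 3) == 3


def mutation_score(test: Callable[[IntFunc], bool], mutants: List[IntFunc]) -> float:
    """Fraction of mutants on which the test fails (is 'killed')."""
    killed = sum(1 for m in mutants if not test(m))
    return killed / len(mutants)


if __name__ == "__main__":
    print("weak test score:  ", mutation_score(weak_test, MUTANTS))
    print("strong test score:", mutation_score(strong_test, MUTANTS))
```

Both tests pass on `max_of` and cover all of its lines, but only the stricter one distinguishes the faulty mutants from the original. Even that test cannot reach a full score here, because `mutant_equivalent` behaves identically to `max_of`; this is the difficulty with using the raw score for grading.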