The Pathway Active Learning Environment synthetic tutoring system can collect large numbers of students' short responses to open-ended questions. Analyzing these responses may provide insight into the utility of the system, as well as information about student understanding of physics. The free-response nature of our data lends itself to qualitative analysis; however, large data sets benefit from automated analysis. Natural language processing and data mining approaches, such as clustering, have attracted interest across a variety of fields as ways to automate the analysis of qualitative data. Several challenges must be overcome, however: content-specific vocabulary, an abundance of search features (some of which are irrelevant), and inherent limitations on computers' ability to match meaning. In this paper, we discuss an analysis protocol for training computer models for automated data analysis. We present a preliminary analysis of two sample questions, which demonstrates a baseline of success.