While the viva voce (oral) examination has always been used in content-based educational assessment (Latham 1877: 132), the assessment of second language (L2) speaking in performance tests is relatively recent. The impetus for the growth in testing speaking during the 19th and 20th centuries was twofold. Firstly, in educational settings the development of rating scales was driven by the need to improve achievement in public schools, and to communicate that improvement to the outside world. Chadwick (1864, see timeline) implies that the rating scales first devised in the 1830s served two purposes: providing information to the classroom teacher on learner progress for formative use, and generating data for school accountability. From the earliest days, such data was used by parents to select schools for their children in order to "maximize the benefit of their investment" (Chadwick 1858). Secondly, in military settings it was imperative to be able to predict which soldiers were able to undertake tasks in the field without risk to themselves or other personnel (Kaulfers 1944, see timeline). Many of the key developments in speaking test design and rating scales are linked to military needs.