As part of its nonprofit mission, ETS conducts and disseminates the results of research to advance quality and equity in education and assessment for the benefit of ETS’s constituents and the field. To obtain a PDF or a print copy of a report, please visit: http://www.ets.org/research/contact.html

Abstract

This paper presents a socio-cognitive framework for connecting writing pedagogy and writing assessment with modern social and cognitive theories of writing. It focuses on providing a general framework that highlights the connections between writing competency and other literacy skills; identifies key connections between literacy instruction, writing assessment, and activity and genre theories; and presents a specific proposal about how writing assessment can be organized to promote best practices in writing instruction.

Key words: writing, assessment, CBAL, cognitive, competency model, evidence-centered design, learning progressions, reading, literacy

Acknowledgments

The project reported in this paper reflects the work of many people at ETS. The larger project of which this is a part was initiated under Randy Bennett’s leadership and reflects his vision for an integrated assessment system. Nora Odendahl played a major role in the original conceptualization and development, and key features of the design reflect her insights. Mary Fowles has been an equal partner in the work at every stage, and the assessment designs reported by her reflect her leadership and the work of many test developers at ETS, including Douglas Baldwin, Peter Cooper, Betsy Keller, and Hilary Persky. Other contributors to the work include Russell Almond, Marjorie Biddle, Michael Ecker, Catherine Grimes, Irene Kostin, Rene Lawless, Tenaha O’Reilly, Thomas Quinlan, Margaret Redman, John Sabatini, Margaret Vezzu, Chris Volpe, and Michael Wagner.

Table of Contents

1. Writing as a Complex Cognitive Skill

1.3. The Role of Reflective Strategies and Genres: Modeling Activity Systems in Instruction and Skill Development

More than anything else, this paper is about connections:

- Connections between writing and reading
- Connections between writing and critical thinking
- Connections between writing and its social context
- Connections between how writing is tested and how writing is taught

The context is an ongoing effort at ETS to develop a new approach to K–12 writing assessment in which these connections are not only respected but also deeply embedded into the very design of the assessment. Writing is not an isolated skill. It builds upon a broad foundation of prerequisite literacy skills, both supports and requires the development of critical thinking skills, and requires the writer to solve a complicated array of rhetorical, conceptual, and linguistic problems.

None of these themes are new in and of themselves. To point out a few of the more salient discussions, Shanahan (2006) examined complex interconnections and interdependencies among reading, writing, and oral language. Applebee (1984) reviewed older literatures connecting writing to the development of critical thought, while Hillocks (1987, 1995, 2002, 2003b) emphasized the importance of inquiry in writing, noting that students need above all to learn strategies that will enable them to think about the subject matter of their writing (Hillocks, 2003a). And the literature on the social aspects of writing is even more extensive, so that the comments that follow can do little more than indicate major themes.

In recent years a number of themes have been emphasized. Literacy is a complex, varied, highly nuanced class of social practices in which school literacy has a privileged but specialized position in our society. Students who may do poorly on literacy tasks in a school setting may yet display considerable sophistication on related skills embedded in well-defined, socially significant practices (Hull & Schultz, 2001).
Reading and writing are not monolithic entities but complex skill sets deployed in historically contingent contexts; that is, the choices of forms and genres available to the author, and the modes of communication and interaction with which they are associated, have evolved and are evolving under the influence of social and technological factors (Bazerman & Rogers, 2008; Bolter, 2001; Foster & Purves, 2001; Heath, 1991; Holland, 2008; Murray, 2009; Street, 2003; Venezky, 1991). Education in reading and writing should be viewed not simply as the inculcation of a skill set, but as socialization into literate communities, and therefore as learning how to participate in a specific set of concrete and socially valued practices (Barab & Duffy, 1998; Barton & Hamilton, 1998; Barton, Hamilton, & Ivanic, 2000; Carter, 2007; Englert, Mariage, & Dunsmore, 2006; Lave & Wenger, 1991; Marsh & Millard, 2000; Reder, 1994; Resnick, 1991). There is broad consensus that writing skill is most effectively acquired in a context that makes writing meaningful, both in relation to its content and to the social context within which writing takes place (Alvermann, 2002; Graham & Perin, 2007; Langer, 2001).

Criticisms of particular methods of writing assessment often revolve around the contrast between the testing situation and the situation in which writers ordinarily write.
For instance, in a timed impromptu essay examination, the writer may have

- no control over the topic, and often little knowledge of or interest in it;
- no access to any source of information about the topic;
- little time to think deeply about the topic; and
- considerable incentive to focus on surface form (since the scoring rubric may penalize grammatical mistakes or favor those students who produce the standard five-paragraph essay).

And yet this list of flaws (from the writer’s point of view) can readily be transformed into a list of virtues (from a test administrator’s point of view), such as fairness, uniformity of testing conditions, objectivity and consistency of scoring, and efficiency. In short, progress in writing assessment requires us to reconcile the twin virtues of validity and cost, which are often in tension, and which may lead to fundamentally different solutions, with fundamentally different implications for instruction.

Assessment constitutes a social context in its own right. It holds a central place in our educational institutions and has a powerful impact upon instruction, not always for the better. What teachers teach is strongly influenced by what is on the test and even by seemingly minor details of test format. Frederiksen (1984) discussed a variety of ways in which the format of a test and the implicit link between instruction and assessment can have unintended consequences. As Frederiksen put it:

   The “real test bias” in my title has to do with the influence of tests on teaching and learning. Efficient tests tend to drive out less efficient tests, leaving many important abilities untested—and untaught. An important task for educators and psychologists is to develop instruments that will better reflect the whole domain of educational goals and to find ways to use them in improving the educational process. (p. 201)

Responses to this issue have gradually led toward broader use of performance-based assessments in writing.
As Yancey (1999) noted, the general trend from the 1950s to the 1970s was to assess writing indirectly with multiple-choice tests, with direct writing assessment and then portfolio-based assessment gradually entering the picture (Elliott, 2005; White, 2004). A landmark of direct writing assessment, Ed White’s Teaching and Assessing Writing (1985) established holistic direct writing assessment as the norm; and White (2005) demonstrates a continuing focus on developing effective methods of writing assessment—in this case, methods of portfolio assessment that connect portfolio contents to curricular goals via student reflective writing. Yet considerable room exists for improvement, particularly if connections are taken into account—connections that make it almost impossible to assess writing meaningfully if it is viewed merely as an isolated skill.

In 1984, Norman Frederiksen made the following observation:

   Over the past 25 years or so, cognitive psychologists have been investigating the mental processes that are involved in such tasks as reading, writing, solving puzzles, playing chess, and solving mathematical problems. The result is a theory of information processing that has important implications for teaching… Some of the cognitive processes that have been identified have to do with the development of internal representations of problems, the organization of information in long-term memory for efficient retrieval, the acquisition of pattern cognition and automatic-processing skills, use of strategic and heuristic procedures in problem solving, and how to compensate for the limited capacity of working memory. Such skills are not explicitly taught in schools today, but we are at a point where cognitive psychology can make substantial contributions to the improvement of instruction in such areas. (1984, p. 200)

Frederiksen postulated that this class of skills can most readily be tested with situational tests (that is, with tests that simulate the typical conditions under which such skills are used) and suggested the following:

   Perhaps an adventuresome consortium of schools, cognitive scientists, and testing agencies could carry out demonstration projects to test the feasibility of systematically using tests to influence the behaviors of teachers and learners and to provide the large amount of practice needed to make the skills automatic. (p. 200)

The past 25 years have seen further progress in modeling the cognitive foundations of reading, writing, and other intellectual skills, and even greater progress in building socially as well as cognitively sophisticated models of instruction. But thus far, nothing like Frederiksen’s vision has been realized, not least because it requires synthesis and coordination across several disciplines, and the solution of a wide range of practical and technical problems.

The nature of the problem can be measured in part by the kinds of difficulties encountered by the performance assessment and authentic assessment movements (Haertel, 1999; Hamilton, 2005): It can be very difficult to make an assessment more closely resemble real-life performance, or bring it more closely into alignment with best practices in instruction and curriculum, while meeting all of the other constraints intrinsic to summative assessment situations, including the powerful constraints of cost and the way that testing is budgeted in particular institutional settings. Instruction and curriculum are variable, as is practical performance outside a school setting, and both are dependent on context in ways that can make performances difficult to assess and compare. It is not easy to devise an assessment system that delivers good measurement, models the kinds of tasks teachers should prepare students to perform, and supports instruction.
However, Bennett and Gitomer (2009) sketched out one possible strategy for dealing with these issues, involving coordinated development of summative assessments, classroom assessments, and professional support materials. Bennett and Gitomer set as their goal an integrated assessment that did more than fulfill a simple accountability function. They advocated a form of assessment intended simultaneously to document student achievement (assessment of learning), support instructional planning (assessment for learning), and engage students and teachers in worthwhile educational experiences during the testing experience (assessment as learning). They argued that these goals could be achieved by leveraging advances in cognitive science, psychometrics, and technology to build much richer assessment experiences.

In 2009, the National Academy of Education issued a white paper on standards, assessments, and accountability that endorsed a similar set of goals. The academy recommended a series of summative assessment reforms in which modified test designs are based upon a strong cognitive foundation and coordinated systematically with support systems for classroom teachers (including professional development and support systems, parallel formative assessments, and other supports for classroom instruction).

The research reported in this paper applied Bennett and Gitomer’s (2009) ideas to writing assessment in primary and secondary grades. It focused on three aspects of the overall vision:

- Understanding the cognitive basis for effective writing instruction
- Designing formative and summative writing assessments that meet Bennett and Gitomer’s goal: assessment designs that use richer, more meaningful tasks, provide effective support for instruction, and constitute valuable learning experiences in their own right
- Conceptualizing an approach to essay scoring that maintains a strong rhetorical focus while using automated methods to assess key component skills

These three topics will define the three main sections of this paper. Section 1 will document a cognitive framework for writing assessment. Section 2 will describe pilot assessment designs that instantiate this framework. Section 3 will sketch an innovative approach to essay scoring intended to make effective use of automated essay scoring techniques without substituting automated scores for human judgment about content and critical thinking.

A key conceptual element of the analysis to be presented derives from activity theory (Engeström, Miettinen, & Punamäki, 1999), which treats interactions among people in a social environment as the fundamental unit of analysis.
Particular institutions, the tools and skills that enable people to participate in those systems, and the social conventions that govern interaction are all part of activity systems in which people act to accomplish goals that emerge from and are partially defined by the roles and situations in which they are participating. Activity theory leads directly to a constructivist view of learning, in which learning a skill emerges naturally from participating in the activities for which the skill is intended (Hung & Chen, 2002; Jonassen & Rohrer-Murphy, 1999). The fundamental goal of the research outlined in this paper is to help redefine writing assessment so that it more directly supports learning and helps to engage novice writers in appropriate communities and practices. The availability of online, computerized assessment and instructional tools presents an important opportunity to achieve this goal.

1. Writing as a Complex Cognitive Skill

1.1. Connections and Disconnections Among Writing, Reading, and Critical Thinking

Classical cognitive models of writing may disagree in points of detail, but they agree on several common themes. One theme is that expert writing clearly involves at least the following elements:

- A set of expressive skills that enable fluent text production. In Hayes and Flower (1980) this was identified as the translating process. In Hayes (1996) it was text production. In Bereiter and Scardamalia (1987) it was the knowledge-telling process.
- A set of receptive skills that support self-monitoring and revision. In Hayes and Flower (1980) this was called the reading process. In Hayes (1996) it was text interpretation. In Bereiter and Scardamalia (1987) it was largely kept in the background except in Chapter 9, which argued for significant parallels between reading and writing processes, and Chapter 11, which presupposed self-reading as part of the feedback loop necessary to revision.
- A set of reflective skills that support strategic planning and evaluation.
  In Hayes and Flower (1980) reflective skills were distributed among the planning, monitoring, and editing processes. In Hayes (1996) these elements were unified into a single category labeled reflection. In Bereiter and Scardamalia (1987) the knowledge-transforming model was intended to capture strategic, reflective thought. It differed from the Hayes and Flower model by postulating distinct rhetorical and conceptual problem spaces and subjecting both to problem analysis and goal-setting processes.

Normally, given the nature of literacy as an integrated process of communication, one would expect to find parallel expressive, receptive, and reflective skills across tasks with similar domains in play. These are different modes of thought, but they invoke the same mental representations. A reader may start with letters on the page and end up with ideas. A writer may start with ideas and end up with letters on the page. A thinker may deal simultaneously with letters and words, sentences, paragraphs, documents, ideas, and rhetorical goals.

Classical models of writing also distinguish several forms of representation that play critical roles in the cognition of composition:

- Social and rhetorical elements are among the most complex aspects of writing skill, requiring the writer to be consciously aware of and able explicitly to model personal interactions (specifically rhetorical transactions between author and audience) and to respond strategically to social and institutional expectations. While this aspect of writing is somewhat backgrounded in Hayes and Flower (1980), it is foregrounded in Bereiter and Scardamalia (1987) in the form of the rhetorical problem space, and it is a major theme in sociocultural accounts of the writing process, as discussed above.
- Conceptual elements (representations of knowledge and reasoning) are also critical in the classical cognitive models of writing. Bereiter and Scardamalia (1987) represented this aspect of writing skill as the conceptual problem space.
  By definition, the process of planning and evaluating writing must address its content, and as Hillocks (1987) and Graham and Perin (2007) indicated, few things are more necessary to the writer than to have effective strategies for dealing with the subject matter that they wish to address.
- Textual elements (representations of document structure) also play a key role in all models of writing. From Hayes and Flower (1980) onward, document planning is largely a matter of deciding how to produce a coherent, well-structured text.
- Verbal elements (linguistic representations of sentences and the propositions they encode) are the essential targets of text production in every model of writing. While control of verbal elements is as much a part of oral language as writing, writing depends first and foremost upon fluency of verbal production (McCutchen, 2000).
- Lexical/orthographic elements (representations of how verbal units are instantiated in specific media such as written text) obviously also play a role in writing, though they are not in focus in the major cognitive accounts discussed above. See Berninger (2005).

Therefore, it is appropriate to conceptualize skills relevant to writing by modes of thought (receptive, expressive, or reflective) and by types of cognitive representation (social, conceptual, textual, verbal, or orthographic). Figure 1 presents a visualization of writing skills that embodies this understanding. It is possible to interpret Figure 1 as a list of competencies or skills, viewed in an entirely cognitive mode, but a richer interpretation is also available. Figure 1 can be viewed as a kind of cross-section of cognitive processes likely to be taking place in close coordination during any act of writing. It can also be viewed as an inventory of the types of activities in which literate individuals commonly engage, and thus as part of the definition of activity systems relevant to writing.
The advantages of viewing Figure 1 in these ways will be explained later.

Note that Figure 1 presents these skills by providing a single action verb, such as inquire, structure, or phrase, which is intended to name the activity in question (and to indicate indirectly what skills are therefore critical). Each layer of the model—social, conceptual, textual, verbal, and lexical/orthographic—covers a range of phenomena, including the elements listed in Table 1, which helps to clarify the kinds of tasks and thought processes to which each mode of representation applies.

Figure 1. Modes of thought and modes of representation in the literacy processes.

Table 1
Activity/Skill Categories Relevant to the Writing Process

Level of representation — Range of activities and skills

Social
  Intentionality (genre, role, purpose)
  Perspective (point of view, bias, voice)
  Affect (stance, evaluation, tone)

Conceptual
  Exploration (review, reflection, description)
  Explication (generalization, definition, analysis)
  Modeling (synthesis, application, hypothesis-formation, experimentation)
  Judgment (evaluation, justification, criticism)

Textual
  Document structure (organization, rearrangement)
  Cohesion (relevance, focus/emphasis, given/new, transitions, textual inference)
  Development (topics, elaboration)

Verbal
  Vocabulary (word familiarity, word choice, paraphrase)
  Sentence structure (sentence complexity, sentence variety, sentence combining)
  Ambiguity/figures of speech (creative word use, semantic flexibility, clarification)

Lexical/orthographic
  Grammar & usage (standard English)
  Spelling & mechanics (conventional written form)
  Word-formation (inflection, derivation, word families)
  Code-switching (register, dialect)

The major headings in Table 1 can briefly be defined as follows:

- Social Skills
  - Empathize—The ability to interpret documents or other forms of communication in a rich, socially perceptive fashion that takes into account the motivations, perspectives, and attitudes of author, intended audience, and individuals referenced in the text.
    This heading involves forms of inference based upon social skills and the ability to model human interaction.
  - Engage—The ability to communicate with an audience in a disciplined and effective way, focusing on achieving a particular purpose, and maintaining a voice and tone appropriate to that purpose.
  - Collaborate—The ability to think reflectively while working collaboratively in the full range of social practices common to highly literate communities (such as critical interpretation of text, presentation of research results, and reasoned argumentation) with full sensitivity to the social, cognitive, and emotional transactions that such social practices may entail, including choice of register and genre to suit the social situation and rhetorical purpose, choice of stance, and sensitivity to multiple perspectives.
- Conceptual Skills
  - Infer—The ability to subject a document or a set of documents to close reading, in which the reader goes beyond literal meaning to engage the ideas presented and integrate them deeply with prior knowledge. This involves the kinds of inference typically referred to as bridging inference and more active forms of text interpretation requiring close attention to conceptual content.
  - Inquire—The ability to develop ideas in an organized and systematic way such that they can be presented clearly and convincingly to someone who does not already understand or believe them.
  - Rethink—The ability to evaluate, critique, and modify one’s own or another’s ideas using evidence and logical reasoning.
- Textual Skills
  - Integrate—The ability to read a document and build a mental model of its content and structure. This definition is intended to include what current reading theories refer to as the construction of the text base.
    What reading theories refer to as the situation model requires mobilization of conceptual and social inferencing, which can go well beyond information directly available in the text.
  - Structure—The ability to produce a written document that follows an outline or some other well-structured textual pattern.
  - Plan/Revise—The ability to conceive a document structure that does not exist and plan that structure to serve a rhetorical purpose, or conversely, upon determining the structure of an existing document, to evaluate how well it organizes and presents its content, and rework it accordingly.
- Verbal Skills
  - Understand—The ability to understand texts written in standard English; that is, the ability to extract literal meaning from a sequence of sentences. This element (in combination with the ability to handle complex document and textual structures) is critical in constructing a literal understanding of a document (or text base), though success at understanding phrases and sentences does not guarantee an adequate understanding of a complex text.
  - Phrase—The ability to express oneself in standard English; that is, the ability to find the right words and phrasings to convey one’s intended meaning.
  - Edit—The ability to identify places in a text where word choice and phrasing do not convey the intended meaning clearly and accurately, and then to come up with alternative phrasings that work better in context.
- Orthographic Skills
  - Read—The ability to take printed matter and read it either aloud or silently; that is, the ability to convert characters on the page into mental representations of words and sentences.
  - Inscribe—The ability to take words and sentences and convert them into printed matter; that is, the cognitive and motor abilities needed to produce words and sentences in written form.
  - Proofread—The ability to examine printed materials, identify nonstandard patterns and errors, and modify them so that they conform to the norms of standard English grammar and orthography.

Cognitive models also highlight aspects of writing skill that depend upon more general features of cognition (Bransford, Brown, & Cocking, 1999). The role of short-term memory and long-term memory, for instance, can hardly be neglected (Kellogg, 1996, 1999, 2001). And yet accounts of reading and writing processes emphasize trade-offs between automated and strategic processes (McCutchen, 1988, 1996, 2006). Skilled writers combine efficient receptive and expressive skills with appropriate and effective reflective strategies.

1.2. Connections and Parallelisms Among Writing, Reading, and Critical Thinking Skills

One advantage of the kind of analysis presented above is that it highlights the extent to which complex verbal skills draw upon the same underlying capacities. Figure 1 can be read simultaneously as a specification of skills that underlie writing and as a broad inventory of literacy skills. One set of arrows, followed out from the center, from orthographic to social, closely tracks skills that would be highlighted in a model of reading competency: the abilities to decode written text, apply basic verbal skills, build up a literal interpretation of the document, and then create a situation model reflecting a conceptual model of document content and a rhetorical understanding of the writer’s purpose. Another set of arrows, followed inward from social to orthographic, closely tracks skills that are highlighted in writing assessment: the abilities to assess the rhetorical situation, understand the concepts to be communicated, plan a document that will communicate particular concepts and achieve particular rhetorical purposes, convert that plan into phrases and sentences, and express them in written form.
The third set of arrows, followed either inward or outward, deals in the outer layers with skills normally highlighted in accounts of critical thinking, and in the inner layers with revision, editing, and proofreading, textual skills closely associated with the critical evaluation of texts.

It would be possible simply to equate reading with receptive skills, writing with expressive skills, and critical thinking with reflective skills, but that would be an oversimplification. For instance, reading skill is often taken to include all the activities that support effective comprehension, which may include writing notes, asking reflective questions, and participating in a range of other activities that are not reading activities in and of themselves but which are being used to support reading. In the same way, writing skill includes a whole range of skills that involve reading and critical thinking, particularly during revision. And it is fairly clear that skilled critical thinkers (at least in a literate society) will deploy a variety of reading and writing activities in support of reasoning.

In other words, reading, writing, and critical thinking appear to be mutually supporting and highly entangled. Every skill noted in Figure 1 matters for writing. But the same skills appear to matter for reading, too, with a different emphasis. The skills that are most important for reading play a supporting role in writing competency; conversely, skills that are critical for writing play supporting roles in enhancing reading comprehension.

Reading, writing, and critical thinking can thus reasonably be viewed as different but complementary activity types that share a common underlying skill set. They have complementary purposes (such as comprehension, explanation, and negotiation of common ground) but combine in specific ways to define the practices of a literate community.
In activity theory terms, the literacy skill set—that is, the elements listed in Figure 1—can be viewed as activities that function as Vygotskian tools for members of a literate community of practice. Novice writers may have to learn some of the skills in the toolkit, but above all they have to learn how to coordinate them in the ways that enable them to create effective written texts. The difference between reading, writing, and critical thinking is defined by the final goal of the activity, but in the course of accomplishing that goal, a writer may call upon skills drawn from any of the categories in Figure 1 and may combine them in strategic ways.

Aligning reading and writing with critical thinking: The Paul-Elder framework. The observations made thus far suggest that it should be possible, in general, to align specific critical thinking skills with specific reading and writing skills. This hypothesis appears to be correct. The relationship can most readily be expounded by taking one popular model of critical thinking—the Paul-Elder model (Paul & Elder, 2005)—and showing how it lines up with the skills outlined in Table 1. While the Paul-Elder model is not the only model of critical thinking (Ennis, 1987; King & Kitchener, 1994; Kuhn, 1999), it is widely accepted and provides a useful standard of comparison, since it was designed as an explication of critical thinking appropriate to support instruction.

The Paul-Elder model distinguishes several elements of thought and provides a list of several partially corresponding standards for evaluating the quality of thought. The elements of thought comprise the following (see Elder & Paul, 2007):

- Purpose—“all reasoning has a purpose.” Effective critical thinking aims to accomplish clear, meaningful, and realistic purposes.
  The corresponding standard is relevance (“relating to the matter at hand”).
- Question at Issue—“all reasoning is an attempt to figure something out, to settle some question, to solve some problem.” Effective critical thinking identifies the question at issue, clarifies its meaning, and explores its ramifications. The corresponding standard is also relevance.
- Point of View—“all reasoning is done from some point of view.” Effective critical thinking is aware of its own point of view, considers alternative points of view, and avoids egocentrism and bias. The corresponding standard is breadth (“encompassing multiple viewpoints”).
- Assumptions—“all reasoning is based on assumptions.” Effective critical thinking is aware of its own assumptions, recognizes their consequences, and is willing to question them. The corresponding standard is fairness (“justifiable, not self-serving or one-sided”).
- Concepts—“all reasoning is expressed through, and shaped by, concepts and ideas.” Effective critical thinking defines its concepts fully. The relevant standards are clarity (“understandable, the meaning can be grasped”), precision (“exact to the necessary level of detail”), and depth (“containing complexities and multiple interrelationships”).
- Information—“all reasoning is based on data, information, and evidence.” Effective critical thinking bases its conclusions on accurate information that fully justifies the conclusions drawn. The corresponding standard is accuracy: whether the information is “free from errors or distortions; true.”
- Interpretation and Inference—“all reasoning contains inferences or interpretations by which we draw conclusions and give meaning to data.” Effective critical thinking is aware of the difference between inferences and direct evidence and is open to alternative interpretations.
The relevant standard is logic (“the parts make sensetogether, no contradictions”) Implications and Consequences—“all reasoning leads somewhere or hasimplications and consequences.” Effective critical thinking explores and takesresponsibility for the consequences of its own conclusions. The relevant standard issignificance (“focusing on the important, not trivial”).These can be set in approximate parallel with elements in our own model, as shown in Table 2,though the two models are not identical. One difference worth noting in passing is that the Paul-Elder model does not distinguish between concepts and their expression; thus, three standards15Table 2Mapping Between Skills Mentioned in Table 1 and the Paul-Elder Critical Thinking Model

Level     Skill                Paul-Elder element         Paul-Elder standard
Textual   Document structure   [Expression of concepts]   Depth—containing complexities and multiple interrelationships
Textual   Cohesion             [Expression of concepts]   Clarity—understandable; the meaning can be grasped
Textual   Development          [Expression of concepts]   Precision—exact to the necessary level of detail

apply to concepts, though they also map more or less transparently onto three distinct aspects of the textual level in our framework. However, the most important difference is that the Paul-Elder model does not draw a distinction between the social and conceptual elements, a difference that connects rather strongly to its emphasis on critical thought rather than literacy construed more broadly.

This parallel display helps clarify the idea that reading, writing, and critical thinking are distinct activity systems founded upon common underlying skills. One can have critical thinking without reading or writing (for there is no requirement that reflective thought be expressed in written form). Writing can take place without deep reflection, for there is no guarantee that the thoughts expressed in a written text will be significant, relevant, fair, clear, precise, complex, accurate, or logical. Yet the whole point of skilled writing is to mobilize all of the resources available to the writer to achieve meaningful goals. The expert writer knows when to apply reflective thinking to writing tasks, just as the expert thinker knows when to use writing as a tool for reflection. The skills are not the same, but they mobilize similar underlying abilities.

These points can be elaborated a bit further by considering how Table 2 brings the Paul-Elder model into alignment with Figure 1 and Table 1. The parallels are not exact, but they are highly suggestive. Table 1 isolates three major elements that play a crucial role in social understandings of communication: intentionality, perspective, and affect. Table 2 illustrates how certain aspects of the Paul-Elder model are essentially parallel.
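The element-to-standard pairing described above can be restated as a simple lookup table. The sketch below is purely illustrative (the dictionary and function names are not part of the Paul-Elder framework or of this report); it merely encodes the correspondences listed in the elements of thought:

```python
# Illustrative encoding (not part of the original framework) of the
# Paul-Elder pairing between elements of thought and evaluative standards.
PAUL_ELDER_STANDARDS = {
    "Purpose": ["relevance"],
    "Question at Issue": ["relevance"],
    "Point of View": ["breadth"],
    "Assumptions": ["fairness"],
    "Concepts": ["clarity", "precision", "depth"],
    "Information": ["accuracy"],
    "Interpretation and Inference": ["logic"],
    "Implications and Consequences": ["significance"],
}

def standards_for(element):
    """Return the standard(s) paired with a given element of thought."""
    return PAUL_ELDER_STANDARDS[element]
```

Note that the mapping is many-to-one in places (both purpose and the question at issue pair with relevance), while concepts pair with three standards at once—precisely the asymmetry that Table 2 resolves by splitting the expression of concepts across three textual skills.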
Let us examine these aspects one piece at a time, starting with the social model, then proceeding to the conceptual and textual models.

Social aspects. The literacy model presented in this paper selects intentionality, perspective, and affect as broad subject headings capturing some of the distinctive elements of socially focused thought. The Paul-Elder model does not make the same distinction, so Table 2 identifies elements of that model that correspond to ours without equating the two models. It seems unexceptional to claim that the concepts of purpose and the question at issue, along with the standard of relevance, are all related to intentionality, or to claim that point of view and breadth address issues of perspective. Table 2 sets up a parallel between the affective elements in our model and two Paul-Elder elements: assumptions and fairness. This is more questionable, since assumptions are to a significant extent related to point of view. Still, it seems reasonable to place assumptions parallel to affect, since the affective element of the social model includes commitments and stances toward ideas, which is what usually biases people not to notice their own assumptions or to treat the perspectives of others dismissively.

Conceptual aspects. Table 1 outlines four general types of activities (exploration, explication, modeling, and judgment) that constitute major types of conceptual thought. These general types map onto much more specific types of activities, and families of strategies that go with them, which are outlined in more depth in an appendix at the end of this paper (Table 5). The parallel to elements of the Paul-Elder model is not exact, but it is informative. The Paul-Elder model distinguishes concepts, information, interpretation/inference, and implications/consequences, and sets forth standards for intellectual quality focusing on clarity, precision, depth, logic, accuracy, and significance.
Practically speaking, it is impossible to perform any sort of thinking activity without being concerned with all of these elements, but it is reasonable to postulate that:

• exploration activities are first and foremost concerned with identifying, understanding, and/or explaining concepts clearly, precisely, and in depth;

• explication activities are first and foremost concerned with making explicit the inferences and interpretations necessary to understand a subject or a text, and thus with bringing out the underlying logic of the conceptual system being addressed;

• modeling activities are first and foremost concerned with providing an accurate model that captures all of the important facts about the subject being modeled; and

• judgment activities are first and foremost concerned with evaluating ideas in terms of their significance, implications, and consequences—though of course, evaluation implies critical attention to all aspects of conceptual structure.

These parallels highlight the presence of similar conceptual elements without necessarily organizing them in precisely the same way. In effect, the literacy model outlined in this paper identifies a range of activities in which conceptual thinking is prominent, while the Paul-Elder model seeks to identify aspects of conceptual thinking that help define its structure; the two organizations share important elements but are by no means identical.

Textual aspects. The models in Figure 1 and Table 1 help highlight distinctions that are not so clear in the Paul-Elder model, and thus cannot clearly be explicated in Table 2. Do the Paul-Elder standards of clarity, precision, and depth represent standards for thought, or do they refer instead to the manner in which thoughts are verbally expressed?
It is not entirely clear that this is a meaningful distinction, but at first blush it would seem that standards of clarity, precision, and depth apply much more directly to the textual presentation of a system of ideas than to unexpressed, purely mental conceptions that have not yet been put into a form that can be communicated to other people. It is hard to evaluate whether a system of ideas has depth unless the complexities and interrelationships it addresses have been laid out explicitly in textual form. An inextricable connection exists between precision of content and precision of phrasing, and between clarity of thought and the ability to express it coherently. Table 2 expresses these parallels and connections by linking these standards both to the conceptual and to the textual models.

1.3. The Role of Reflective Strategies and Genres: Modeling Activity Systems in Instruction and Skill Development

Research on the acquisition of complex skills—including writing, reading, and critical thinking—emphasizes the importance of strategy instruction (Block & Parris, 2008; De La Paz & Graham, 2002; Graham & Harris, 2000; Graham & Harris, 2005; Graham, Harris, & Troia, 2000; MacArthur, Graham, & Fitzgerald, 2006; Pressley, 1990; Pressley, Harris, Alexander, & Winne, 2006; Souvignier & Mokhlesgerami, 2005; van Gelder, 2005; van Gelder, Bissett, & Cumming, 2004).

The typical path to mastery begins with explicit instruction in conscious strategies that support the learner in the early stages of skill acquisition. Over time, the new skill becomes routine and aspects of it are automatized, though the learner retains the capacity to fall back on conscious strategies under conditions that stress or overwhelm automated capacities.

Given the arguments that this review has presented thus far, it would be reasonable to expect deep parallelisms among the kinds of strategies that support reading, writing, and critical thinking.
An examination of the literature suggests that this is indeed the case.

Strategy families as modes of thought. An obvious connection can be drawn between strategy instruction and the classification of educational objectives proposed by Bloom (1956) and presented in revised form by Anderson and Krathwohl (Anderson et al., 2001). Strategies to support comprehension, composition, and critical thinking range from simple memory-based methods to complex forms of synthesis and evaluation. In terms of the high-level model in Table 1, such strategies are ways to rethink what one already knows by clarifying what one does not fully understand, synthesizing and hypothesizing new ideas, and criticizing old ones. These kinds of strategies tend to fall into a relatively small range of families. Space is not available here to elaborate on these families, though Table 5 in the appendix presents a taxonomy of conceptual strategies that appear (often in slightly different guises) sometimes as reading strategies, sometimes as writing strategies, and sometimes as more general conceptual, critical thinking, or inquiry strategies. By way of illustration, two such families will be considered. The first is a family of strategies that includes freewriting (a writing strategy) and its close cousin, self-explanation (a reading strategy); the second is outlining, which can be deployed strategically either as a tool to support planning (a writing strategy) or to improve global text comprehension (a reading strategy).

Freewriting vs. self-explanation. Freewriting is a common strategy recommended when writers are beginning to develop their ideas. The technique requires the writer to forget about strategic control and planning and just put words to the page, letting one idea lead to another, giving the writer every chance to express himself or herself without worrying (yet) how those ideas will fit into a rhetorical plan (Elbow, 1987).
After freewriting has taken place, the text produced can be subjected to analysis, which may help the writer identify what is really significant and important and what really needs to be said (Elbow, 1994; Yi, 2007).

Self-explanation is a strategy recommended when readers need to deepen their understanding of a text. Readers write down what they understand the text to mean, concerned only with expressing their current understanding rather than with how closely the self-explanation tracks every detail of the text. Afterward, the reader can compare the original text to the self-explanation and perhaps discover aspects of the text that are not yet fully understood (Chi, 2000; Chi, Bassok, Lewis, Reimann, & Glaser, 1989; McNamara & Magliano, 2009). The parallelism between the two techniques is worth noting. Both involve the use of expressive skills to force a clarification of ideas, and both involve a temporary suppression of evaluation in order to facilitate the process. Under the proper circumstances, both techniques can enable reflection and thus support critical thinking.

Outlining for comprehension vs. outlining as text planning. Outlining is the use of a graphic organizer or other explicit hierarchical structure to represent how a document is organized. Creating an outline is often recommended as a strategy to support reading comprehension (Jiang & Grabe, 2007). While a skilled reader may be able to organize document content implicitly, without recourse to outlining, the reflective act of creating an outline forces readers to identify main ideas and supporting details specifically and requires them to encode explicitly how different parts of the outline are related. A graphic organizer reduces the load on short-term memory by offloading some of the organizational effort into a visual encoding.
Of course, outlining has the same advantages when recruited as a planning tool, which makes it one of the few planning strategies known to have a powerful positive effect on writing quality (Kellogg, 1988). Both forms of outlining instantiate a general class of strategies for reflective thought—the use of visual hierarchies to encode relevance and significance relations.

Genres of writing as purpose-driven activities. The general framework proposed in this review treats writing as essentially purpose-driven. Writing is part of an activity system and is distinguished from other, closely related activities by its goal (producing a written text) and by the strategies it deploys to mobilize literacy skills to achieve that goal. Once writing is conceived of in this way, it extends logically to cover the concept of genre. A specific genre of writing is focused on achieving a particular type of goal. For instance, an argumentative essay is focused upon the goal of establishing the truth of a claim. Achieving this goal logically requires the writer of an argumentative essay to accomplish certain things, such as elaborating subclaims, providing supporting evidence, rebutting counterarguments, or exploring logical consequences. Some of the tasks that need to be accomplished will be similar from one genre to another, while others, such as those listed above for argumentation, form a constellation of tasks strongly linked to genre-specific goals. Genres typically adopt conventional patterns, including conventional patterns of organization and conventional stylistic features.
If genres are viewed as conventionalized activities within a larger activity system, these conventions reflect strategies for solving genre-specific problems whose usefulness has led to repetition and ultimately to conventionalization.

There is nothing particularly surprising about any of the conclusions noted thus far—similar observations have been made by a variety of genre theorists (Bazerman, 2004; Russell, 1997)—but they lead to an important conclusion for our purposes: learning to write consists in large part of three things:

• Learning key strategies

• Learning how to assemble those strategies in meaningful ways to accomplish specific goals as part of purposeful activities

• Turning the resulting assemblies (i.e., complex activity plans) into routine, efficient procedures for handling ordinary problems

A corollary is that writers are likely to be ill-served if they learn strategies piecemeal, without understanding how to connect them to meaningful purposes—and that they will be equally ill-served if they are taught narrow routines for achieving specific writing goals without ever learning how general-purpose strategies cohere with specific writing tasks in meaningful contexts.

Another way of making the same point is to consider how conceptual strategies map onto the genre categories that students need to have mastered by the time they reach college. Various studies of the kinds of writing required at the collegiate level have been conducted (Biber, 1980; Bruce, 2005; Gardner & Powell, 2006; Hale et al., 1996; Martin & Rose, 2006; Nesi & Gardner, 2006; Rosenfield, Courtney, & Fowles, 2004), as have genre analyses of the types of reading and writing required in primary and secondary school (Kirsch & Jungeblut, 2002; Martin & Rose, 2006).
If this information is collated to produce a reasonably complete list of genres that support academic work, and to determine which strategies are most central to each, it rapidly becomes clear that students need to master a wide range of conceptual strategies—and develop complex procedures supporting complex activities in a variety of genres—to achieve collegiate levels of performance. Historical analysis depends critically upon one kind of strategy—reconciling multiple sources—while literary analysis depends critically upon another—close reading. Scientific reports require a familiarity with hypothesis testing, while philosophical research is more strongly associated with definitional techniques going back to the Socratic method.

Obviously, students below college age will not be expected to perform at so complex a level, but sophistication in applying conceptual strategies does not emerge automatically; for instance, Kuhn’s (1991) study of the development of argumentation skills demonstrated considerable range in skill even among adults. It thus follows that the effectiveness with which students will learn to write in a range of genres is critically dependent on their mastery of the conceptual strategies that will enable them to accomplish genre-specific purposes. Space does not permit a detailed explication of the range of genres that students need to acquire to perform well at a collegiate level (though see Table A2 and the associated discussion in the appendix for a condensed presentation of associations between genres and conceptual strategies). But it is very clear that effective writers are able to handle a broad range of genres and, thus, that they must be able to mobilize many different varieties of strategic thought.

Developing skill in writing does, of course, involve developing discourse, verbal, and orthographic skills—but these considerations suggest that writing skill also depends upon strategy instruction for one very simple reason.
Strategy instruction enables writers to selectively mobilize a wide range of social, rhetorical, and conceptual skills depending on their purpose in writing, and these skills are as necessary to high-level writing performance as general verbal fluency or a generic understanding of document structure.

This view militates against any approach to writing instruction—or writing assessment—that treats writing as a skill to be taught or assessed in a vacuum, which would risk construct underrepresentation. For example, teaching students how to write a persuasive essay is unlikely to succeed unless students also develop critical reading and logical reasoning skills and know how to deploy those skills in support of writing an essay. That additional development is likely to happen only if they also internalize all the elements of a community of practice in which argument and debate are normal activities, so that they acquire not only strategies but also a sense of their relevance, and internalize appropriate practices and norms.

1.4. Modeling Activity Systems: A Strategy for Assessment That Supports Learning

Having come this far in extending connections among cognition, literacy, and instruction, it is now possible to return to assessment—but with a much richer understanding of the construct to be assessed and a much clearer understanding of how assessment, as an activity, needs to be structured to reinforce the kinds of social learning that instruction should ideally support. As noted in the introduction, Bennett and Gitomer (2009) argued that educators should develop assessment systems that document what students have achieved, help identify how to plan instruction, and turn the testing situation into a worthwhile educational experience in and of itself. The analysis presented in this review suggests a very specific strategy for accomplishing these goals.

Expert writers can successfully pull together very complex performances that can ultimately be measured by the written product.
But the final written product is in some sense the tip of the iceberg: It represents performance within a complex activity system and the acquisition of procedures for producing texts in which many different skills have been coordinated successfully. Less-skilled writers may lack critical skills—or they may have no idea what skills need to be mobilized or how they should be coordinated—and that fact means that far less information can be obtained from an analysis of the final written product than one might wish.

Viewing the problem purely from an assessment point of view, therefore, it would be very helpful to find out whether writers have the skills they need to put the pieces of an activity system together, which means both mastery of a variety of specific procedures (in this case, genres) and mastery of appropriate procedural knowledge that will mobilize the skills they need to apply to accomplish their goals. Lacking that, there is a risk that the final written product will mask student difficulties due to compensatory relationships among skills. To take a fairly straightforward example, it is quite common on some writing examinations for students to memorize a shell script—a skeletal outline that contains all the elements that signify clear organization and effective transitions. Instead of developing an organic organization focused on the task, the student plugs reasonably relevant content into the shell. The resulting essay may provide much less useful information about the student's ability to construct arguments or to organize information than one might wish.

While this may be a relatively extreme example, the same point recurs. Given a complex task such as an argumentative essay, there are many construct-relevant skills about which the final essay provides less-than-direct evidence. Can students understand and summarize other people's arguments? Can they recognize useful evidence when they see it (much less use it consistently)?
Do they understand the idea that arguments have to be supported and that the support may not work (or can be successfully attacked)? Given a high-quality argument, the answer to all these questions is an unqualified yes. But given an unsuccessful performance, the reason for the failure may be hard to determine.

It is, of course, true that everything tends to correlate with everything else—that is what you get when many different activities within the same activity system draw upon the same underlying pool of skills—but if test developers structure a test carefully, it should be possible to generate reasonable hypotheses about why particular students are falling short of ideal performance. For instructional purposes certainties are not required, only reasonable hypotheses that could help teachers focus their instructional goals, addressing such questions as:

• Whether students have the skills they need to apply appropriate strategies

• Whether their final performance demonstrates an ability to coordinate those strategies effectively

For example, an argumentative essay requires students to apply argument-building strategies. Some students may understand what an argument is yet fail to apply appropriate strategies. Others may function at a much more basic level. The difference matters a great deal for instructional purposes.

These considerations lead to the somewhat paradoxical suggestion that a writing test ought to test more than writing. Given a specific writing task, a specific set of activity systems can be identified that guide expert performance. These activity systems will include specific strategies applied by experts, and task sequences that model the kinds of things skilled writers do as they think about, plan, write, and revise that sort of text. Given that, it should be possible to identify reading, critical thinking, and smaller-scale writing tasks that measure the skills students need and that instantiate at least some of the strategies they ought to be applying.
Moreover, there will be a bonus that attaches to tests with this type of structure: The test will actually model the kinds of strategies students need to use and will help to communicate how the writer's work fits within the larger activity system, which will make the test an educational experience in its own right. Perhaps it should not be called a bonus, since it is precisely what can make assessment fit organically into instruction rather than making it an alien mode of interaction superimposed upon a fundamentally different form of activity. As long as the purpose of each task is clear—as long as students can easily infer why each task has been included on the test and can see how that task helps them prepare for the final, integrated writing task—the test itself can become a meaningful experience and can be structured to model appropriate forms of strategy instruction.

Of course, the assessment strategy sketched in this review requires that each test focus on a particular genre or category of writing. The strategies that support writing an argumentative essay will not be the same as those that support writing a research paper or a literary analysis. Not only will the strategies differ, they will need to be coordinated differently. This conclusion is consistent with the vision advanced by Bennett and Gitomer (2009): One test is not enough, at least not if the purpose of the test is to represent something of the richness of writing tasks that students are expected to master. It may not be necessary to increase the number of tests vastly or to cover as wide a range of writing situations as might be covered in a portfolio assessment.
But the proposed test design, focused as it is on specific genres of writing, implies a richer array of assessments and a strategy that combines results across assessments to get a composite picture of writing skill.

In addition, the proposed test design is effectively a kind of scaffolding, where the structure of the test partially guides students through the thinking they need to accomplish. This kind of design makes the most sense for the age ranges at which a writing task has been introduced but not yet mastered—helping to address students within the Vygotskian zone of proximal development (Vygotsky, 1978). That is, with a population consisting primarily of students who may have been introduced to the task but have not yet reduced it to a routine performance, a scaffolded assessment structure yields more information about partial learning while focusing instruction on making sure that students are able to apply the right strategies to the task. When a writing task has become routine, it is reasonable not to scaffold it, and scaffolding might interfere with the skills one wishes to measure. Thus it can be anticipated that at one grade, a task such as summarization might be the focus of an entire test, with a full array of lead-in tasks modeling appropriate summarization strategies. Then at a later grade level, summarization might be treated as a basic task and function as part of a supporting strategy for more complex forms of writing.

In effect, a concept of writing assessment is being proposed that involves the creation of a sequential family of assessments, with earlier assessments (appropriate for earlier grade levels) focused on simpler writing tasks and with later writing tasks incorporating earlier, simpler forms of writing as part of the scaffolding leading up to a more complex integrated task.
The task of constructing such a sequence of assessments corresponds, in effect, to building a pedagogical sequence based upon empirical studies in which some genres are introduced before others and incorporated at higher grade levels as component activities in more complex forms of writing.

The task of constructing such a sequence presupposes a detailed analysis of the activity systems underlying literate discourse. As such, it entails an analysis of the ways in which different genres relate to one another and form meaningful patterns of activity. The current article cannot undertake such an analysis in depth, but considerable prior literature focuses on this kind of issue and illustrates the kind of analysis from which this paper has drawn. Of particular note is work on specific academic communities of practice such as literature, science, history, and philosophy (Geisler, 1994; Graves, 1991, 1996; Hunt, 1996; Norris & Phillips, 1994, 2002; Norris, Phillips, & Korpan, 2003; Rouet, Favart, Britt, & Perfetti, 1997; Vipond & Hunt, 1984, 1987; Vipond, Hunt, Jewitt, & Reither, 1990; Voss, Greene, Post, & Penner, 1983; Voss & Wiley, 2006; Wineburg, 1991a, 1991b, 1994, 1998; Zeitz, 1994).

But the family of assessments envisaged here would go beyond genre analysis, because each genre would be placed in a well-designed pedagogical context. Each assessment would model the kinds of strategies critical to a particular genre, while the sequence of genres would carry students systematically toward more complex, more demanding tasks that depend on ever-increasing sophistication in the use of task-appropriate strategies. Space precludes a detailed discussion of what such a sequence might look like (though see Table A3 in the appendix for an attempt to map out some rough estimates of when particular genres might usefully be taught and assessed).
But the strategy at least is clear: At each grade level, the tests should be focused on forms of writing that depend on strategies students can reasonably be taught at that age. Since more variation is found within grades than between grades, one of the purposes of such an assessment would be to identify students who were in need of instruction at earlier and later stages of the sequence, while scaffolding learning for those students who were in the zone of proximal development.

The sections that follow will sketch out preliminary work on creating an assessment system in line with this vision. In particular, Section 3 will present a design focused on writing tasks appropriate for 8th and subsequent grades, and Section 4 will discuss some of the scoring issues that arise from these designs.

2. A Pilot 8th Grade Design

2.1. General Considerations

At this point the discussion must shift from a generic consideration of writing skill and focus instead on issues of test design. The list of skills in Table 1 can be understood as constituting a competency model—a specification of the skills needed to achieve the highest levels of skill as a writer—as long as it is understood that strong interconnections and interdependencies are present among the skills, so that the different competencies are viewed not as independent components but as strands within a larger, ultimately integrated set of skills.
The general path of development appears to involve relatively early progress with the verbal and orthographic aspects of writing, transitioning to an emphasis on discourse and document structure in the middle grades, with conceptual and social aspects of writing playing an ever more important role in middle and later grades (Applebee, 2000; Britton, Burgess, Martin, McLeod, & Rose, 1975; Langer, 1992; Langer & Applebee, 1986), though the picture is complex and varied when variations in social background, pedagogy, and genres of writing are taken into account.

The work to be reported here has focused upon 8th grade for several reasons. Eighth grade is one of the earliest grades at which students are expected to produce developed essays and other texts with complex internal structure. It is also the age at which persuasive writing, research, and exposition first come into focus—academic genres that require very different skills than the narrative-focused writing so common in the primary grades in the United States (Duke, 2000, 2004). Eighth grade is thus an appropriate grade at which to examine the usefulness of scaffolded test structures focused on rhetorical purpose and critical thinking, while allowing the skills that underlie general writing fluency (e.g., verbal and orthographic skills) to be assessed without scaffolding.

2.2. Current Status

All of the 8th grade tests were developed in collaboration with the Portland, Maine, school district, which has three middle schools reflecting a mix of urban, suburban, and rural students, including English language learners, since Portland is a refugee resettlement site. The designs presented below represent several years of development.
Test designs were thoroughly reviewed by Portland school district teachers and administrators and were revised and reworked multiple times in consultation with them.

Initial pilots were administered between 2007 and 2009 in Portland with relatively small numbers of students participating (between 125 and 200 per administration). Between October and December of 2009, the four test designs described later in this paper were administered in a large national sample.1 Twenty-four schools participated, representing a mix of urban, rural, and suburban districts from 12 states throughout the country (Alabama, Arizona, Arkansas, California, Florida, Georgia, Kentucky, Louisiana, Massachusetts, Mississippi, Ohio, and Texas). A total of 2,564 eighth grade students participated. Each student was randomly assigned two different writing tests in a counterbalanced design; 1,978 students completed all sections of two tests; each of the four tests was therefore completed by more than 1,000 students. Answers were collected for all questions; background data were also collected, including No Child Left Behind (NCLB) test scores and demographic data, as well as keystroke logs (records of the time course of student responses to the essay tasks). ETS has recently completed scoring these tests and has begun in-depth analysis, which will include psychometric studies appropriate for large-scale pilots, examining item functionality, dimensionality, equating across forms, and related issues. Forthcoming studies will also examine the extent to which automated scoring and automated collection of timing data can be used to extract instructionally useful information.
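The counterbalanced assignment described above—each student receiving two of the four test designs—can be sketched in a few lines. The report does not specify the algorithm actually used, so the rotation scheme, test labels, and function name below are illustrative assumptions only:

```python
import itertools
import random

# Hypothetical labels for the four test designs; the report does not name them.
TEST_DESIGNS = ["W1", "W2", "W3", "W4"]

def counterbalanced_pairs(n_students, seed=0):
    """Assign each student an ordered pair of distinct test designs,
    cycling through all 12 ordered pairs so that each pair (and hence
    each design and each presentation order) occurs about equally often."""
    pairs = list(itertools.permutations(TEST_DESIGNS, 2))  # 12 ordered pairs
    rng = random.Random(seed)
    rng.shuffle(pairs)  # randomize which students get which pair
    return [pairs[i % len(pairs)] for i in range(n_students)]

assignments = counterbalanced_pairs(2564)
```

With 2,564 students spread over 12 ordered pairs, each pair occurs 213 or 214 times, and each design appears in roughly half of all assignments—consistent with the report's figure of more than 1,000 completions per test.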

Since these analyses are not yet complete, this paper focuses on the test design itself and explicates the design decisions that underlie it.

2.3. Test Design

The following specification underlies the designs to be presented and helps make clear how that design maps onto the kinds of skills specified in Figure 1 and Table 1.

Individual forms. Each test form is administered on the computer, requires approximately 90 minutes, and has the following characteristics:

1. Embodies a realistic scenario in which a series of related tasks unfold within an appropriate social context. The scenario is clearly established at the beginning of the test form to give students a sense of what they will need to do and why. It thus connects to the social elements in Figure 1: engage, empathize, collaborate.

2. Contains a sustained writing task (30–45 minutes) that strongly exercises the ability to use critical thinking skills for writing, to plan and structure documents, to use formal written English, and to follow written conventions (thus exercising the expression elements in Figure 1: engage, inquire, structure, phrase, inscribe). This task may require students to write an essay, memorandum, letter, proposal, newspaper article, or other document form that they may encounter outside of school. The specific form will be determined by the scenario. The writing needs to be formal enough, and directed to a mature enough audience, so as to require written rather than oral vocabulary and style. These documents will be scored for the following:

• Rhetorical effectiveness and conceptual quality (e.g., for success at engaging the rhetorical task and inquiring into the subject addressed).

• General quality of the document produced in terms of structure, phrasing, and language (e.g., for success at structuring the document, phrasing its ideas, and inscribing those ideas in standard written English).

3.
Contains a series of lead-in and/or follow-up tasks, each relatively short (5–20 minutes), that require the student to think about the content to be addressed and to engage fruitfully with the overall critical-thinking and rhetorical requirements implied in the scenario (and thus involving elements of Figure 1 that cannot easily be addressed in a long, integrated writing task).

These tasks should also satisfy the following general criteria:

1. They introduce enough information, through reading materials or other sources, to enable students to write meaningfully about the subject (and thus may exercise the kinds of interpretation and reflection processes laid out in Figure 1).

2. They require students to demonstrate critical thinking skills that are necessary to perform well in the scenario modeled by the test (inquire, infer, rethink).

3. They are either short writing or selected-response tasks that most students can reasonably be expected to have mastered by the target grade, but are prerequisite to successful performance on the longer writing tasks.

4. Taken as a set, these tasks provide enough psychometric information to judge whether students have control of specific prerequisite reading, writing, and critical thinking skills needed to address the larger-scale writing task. For instance, if the final writing task focuses on building arguments, the lead-in tasks should do so also. Ideally they would address aspects of prewriting or revision that cannot easily be measured in the final written product.

5. Taken as a set, these tasks scaffold, and thus help model, what it means to perform well on the overall scenario and represent important stages of the thinking-and-writing process needed for successful performance. Ideally, the scenario should represent a longer writing task that would be difficult for many students at grade level to achieve without help but which most can achieve if guided through the process step by step with appropriate scaffolding.

6.
The shorter tasks should contrast with the longer writing task in important ways, exercising parts of the competency model not easily measured by an essay task alone, in particular:

• At least one task should be a critical reading task without a written response, to help disentangle the ability to reason critically about content from general writing and drafting skills, by measuring prerequisite interpretation skills.

• Where practical, at least one task should require students to demonstrate the ability to assess and modify documents (revise, edit, proofread).

• At least one task should allow students to write in a less formal style, addressing peers or younger students rather than elders, allowing them to demonstrate the ability to switch between a more formal and a more oral style and, more generally, giving them the opportunity to adapt what they write to purpose and audience.

7. They present grade-appropriate texts for students to read and think about. The purpose of these texts is not to assess reading skills but to give students content to consider (e.g., to summarize, to analyze, to synthesize, to evaluate) in preparation for writing, thus modeling the kinds of activity systems that the genre actually belongs to. The texts may be informative, persuasive, literary, research-based, or a part of any other genre relevant to the scenario and purpose for writing. The length of the texts must not exceed reasonable reading-time expectations for the target grade.

8. They support thinking and writing activities with resources such as guidelines, writers' checklists, evaluation criteria, tips for getting started, or other reference materials to help students as they progress through the composing process, thus helping to make the test experience more of an educational experience in its own right.

Each year's sequence of periodic accountability assessments.
The sequence of assessments administered during any given year and grade level will be selected to exercise a broad variety of critical reasoning skills set within an equally broad array of rhetorical situations. The focus and content of each such assessment (periodic accountability assessment) will be driven by critical thinking and rhetorical requirements, and not by surface form; in particular:

1. Each periodic accountability assessment will require the student to demonstrate control of a different type of critical thinking.

2. Each assessment will require students to demonstrate the ability to write in a particular genre or form for which that type of critical thinking is essential and has been targeted for instruction at that grade level.

3. The distribution of critical thinking skills across forms will reflect reasonable grade-level expectations about the type and range of critical thinking skills that students will be expected to demonstrate.

4. Each periodic accountability assessment should be self-contained. The order in which forms are administered should not matter, in order that test sequences can be adjusted to match curricular requirements.

Four periodic accountability assessments. Table 3 presents key conceptual features of four assessments developed to model key characteristics of different sorts of writing that students should be learning in 8th grade. None is a genre that 8th grade writers can be expected to have mastered, making a scaffolded structure appropriate.
Table 3 specifies the kinds of critical thinking involved, the critical thinking strategies these specific tasks help to develop, the genre of the major writing task, and the kinds of reading materials included as part of the scaffolding for the longer, culminating writing task.

While space does not permit explication here, formative and teacher-support materials have also been developed, in two forms: parallel scenarios (with a richer array of tasks than could be included in the tests) and relatively independent formative assessment tasks designed to support skills that students need to master before they undertake the integrated writing tasks built into each assessment, such as summarization and thesis sentences. ETS is continuing to work with educators to build a model that is closely linked to grade-appropriate standards and which provides models of appropriate instructional practice.

2.4. Walkthrough of a Sample Test Design

At this point it will be useful to consider one test design in order to clarify the transition from theory to practice. What follows is a short tour through the final test design given in Table 3, which focuses on explication of a literary text. Figure 2 presents an early screen from this test, which explains the scenario.

The timings shown on this screen are provisional. ETS is also experimenting with longer, untimed administrations, but these are the timings built into the current pilot, which was administered in Fall 2009 and whose results are currently being analyzed. As this outline indicates, the full-scale writing task is last, with preliminary tasks supporting student understanding, while simultaneously measuring how well students perform on simpler versions of skills they must call upon to succeed at the integrated writing task.

Figure 2. Overview screen for a test focused on literary analysis.
Note. CBAL = Cognitively Based Assessment of, for, and as Learning.

Table 3
Design for Four 8th Grade Writing Assessments

Genre: Recommendation
Key strategies: Defining; Appeal-building
Skills in focus:
• Collaborate + infer (explication): Judge how well a persuasive letter meets a rubric
• Infer (explication): Judge how well proposed activities meet evaluation criteria
• Rethink (explication): Analyze how well alternative proposals satisfy evaluation criteria
• Inquire (judgment) + structure: Recommend one alternative, and justify that choice in the form of a letter or memorandum

Genre: Report
Key strategies: Guiding questions; Concept mapping; Reconciliation
Skills in focus:
• Infer (judgment): Evaluate sources of information
• Rethink (exploration): Formulate guiding questions
• Infer (exploration) + integrate: Organize information in terms of guiding questions
• Inquire (exploration) + structure: Explain this information using an appropriate set of major points or bullets in pamphlet form

Genre: Essay
Key strategies: Outlining; Argument-building
Skills in focus:
• Infer (judgment): Assess how well a student text meets standards for summarization
• Integrate + inquire (explicate): Summarize arguments on an issue
• Infer (judgment): Classify arguments as pro or con; assess whether evidence strengthens or weakens an argument
• Collaborate + rethink (judgment): Critique an argument containing errors in argumentation
• Inquire (judgment) + structure: Justify a position on an issue in essay form

Genre: Interpretive review
Key strategies: Simulation/role-playing; Close reading
Skills in focus:
• Empathize + infer: Make inferences about character intentions, perspectives, and attitudes from details in the text
• Collaborate + rethink (modeling): Clarify inferences about the text in response to other attempts at interpretation
• Infer (modeling): Clarify difficult points in a text in light of global inferences and explanations
• Inquire (modeling) + structure: Explain and justify an interpretation of a text in essay form

The screen shown in Figure 3 illustrates one item from the first set of tasks students are assigned, which could be viewed as a reading task but is part of the class of procedures students need to have mastered in order to justify an interpretation in written form. The test contains five items of this type. The reasoning is that if students cannot identify specific places in a text that provide evidence to support an interpretation, they are very unlikely to be able to produce a written text that depends upon being able to accomplish the same task in verbal form, which adds all the complexities of text production to the basic analytical procedure. Several interpretive questions of this form are presented so as to be able to form a rough estimate of whether students are capable of performing this task in isolation.

Figure 3. Interpretive questions: identifying textual support.
Note. CBAL = Cognitively Based Assessment of, for, and as Learning.

Figure 4 shows the next question. The question simulates a blog-based classroom discussion comparing two selections from the source text, using previous student comments to identify an interpretive issue and focus those issues to encourage an appropriate student response. One of the key elements being assessed here is whether students will be able to focus on the interpretive issue and on identifying support for it. Both selections are available to students while they write this response. The choice of task is designed to create a situation in which students are allowed to use a voice comparable to what they might use in a class setting addressing peers. The task is primarily scored for content. While students are told to use standard English, they will not be penalized for informal features in their response.

Figure 4. Developing an interpretation: short response.
Note. CBAL = Cognitively Based Assessment of, for, and as Learning.

The final set of preparatory tasks focuses on a third selection from the source, one that presents some interpretive difficulties. In the initial screen shown in Figure 5, text is highlighted and interpretive questions are inserted in the margins. The questions partially help to scaffold students' understanding of the text (by explaining elements that might be too difficult for most students and by highlighting issues for them to think about). These questions are re-presented one at a time after this introductory screen, as shown in Figure 6. They are presented in multiple-choice form, but the difference among choices has to do with the quality of the explanation provided for an answer rather than with the answer itself. Once again, several such questions are presented so that there will be enough information to make a rough estimate of whether students are able to handle this kind of analytic task. Note that this type of question is (quite intentionally) rather more difficult than the initial task, where students only had to identify textual support for a predefined interpretation.

Figure 5. Preparatory screen for the third selection from the source.
Note. CBAL = Cognitively Based Assessment of, for, and as Learning.

The final task is to write an essay addressing the development of the protagonist's feelings over the three selections. Evaluation of the essay focuses on whether a reasonable interpretation is presented and justified effectively using evidence from the text. The essay prompt is straightforward, as shown in Figure 7. All three selections are available to the student, and the final version of these tests includes various tools to assist the writer, such as planning tools, the use of which is not assessed.

The key point to note about this design is that it varies from a standard writing test by including a wide range of preparatory planning tasks. These tasks could variously be interpreted as reading tasks, critical thinking tasks, or short writing tasks—but in each case, the lead-in tasks help students prepare for the final full-scale writing task, test whether they have competencies necessary to successful performance of that particular task, and firmly embed the entire test into a particular activity system and a well-defined community of practice. In effect, the structure of

Figure 6. Questions requiring selection of plausible explanations.

the test—and the fact that there are multiple such tests, each examining a different genre of writing—helps to define writing as a richer construct than would otherwise be the case.

3. Issues Connected With Scoring

3.1. General Strategy

At this point it will be useful to take a step back from the details of the design and consider what information educators might wish to obtain from a writing test and how the testing approach being advocated can be used to serve educational needs. These concerns dovetail, in turn, with recent trends toward the use of automated scoring methods in writing assessment and with concerns that have been raised about their use. It is therefore incumbent upon us to consider how tests will be scored if they are designed along the lines presented above and to explore how that can be done efficiently, providing full support to the rich construct they are intended to test while providing as much useful information to educators as possible.

Figure 7. The literary explication prompt.

The outline provided in Table 1 sets forth a comprehensive list of verbal skills that may be called upon to a greater or lesser extent in different writing tasks. It is obvious that some of these skills are more centrally writing skills than others. In particular, the points in label 1 entitled engage, inquire, structure, phrase, and inscribe are central writing skills almost by definition, since together they comprise the ability to create an effective rhetorical plan, deal