5 Generating and Evaluating Scientific Evidence and Explanations

Major Findings in the Chapter:

Children are far more competent in their scientific reasoning than first suspected and adults are less so. Furthermore, there is great variation in the sophistication of reasoning strategies across individuals of the same age.

In general, children are less sophisticated than adults in their scientific reasoning. However, experience plays a critical role in facilitating the development of many aspects of reasoning, often trumping age.

Scientific reasoning is intimately intertwined with conceptual knowledge of the natural phenomena under investigation. This conceptual knowledge sometimes acts as an obstacle to reasoning, but often facilitates it.

Many aspects of scientific reasoning require experience and instruction to develop. For example, distinguishing between theory and evidence and many aspects of modeling do not emerge without explicit instruction and opportunities for practice.

In this chapter, we discuss the various lines of research related to Strand 2: generate and evaluate evidence and explanations.1 The ways in which scientists generate and evaluate scientific evidence and explanations have long been the focus of study in philosophy, history, anthropology, and sociology. More recently, psychologists and learning scientists have begun to study the cognitive and social processes involved in building scientific knowledge. For our discussion, we draw primarily from the past 20 years of research in developmental and cognitive psychology that investigates how children’s scientific thinking develops across the K-8 years.

1 Portions of this chapter are based on the commissioned paper by Corinne Zimmerman titled “The Development of Scientific Reasoning Skills: What Psychologists Contribute to an Understanding of Elementary Science Learning.”

We begin by developing a broad sketch of how key aspects of scientific thinking develop across the K-8 years, contrasting children’s abilities with those of adults. This contrast allows us to illustrate both how children’s knowledge and skill can develop over time and situations in which adults’ and children’s scientific thinking are similar. Where age differences exist, we comment on what underlying mechanisms might be responsible for them. In this research literature, two broad themes emerge, which we take up in detail in subsequent sections of the chapter. The first is the role of prior knowledge in scientific thinking at all ages. The second is the importance of experience and instruction.

Scientific investigation, broadly defined, includes numerous procedural and conceptual activities, such as asking questions, hypothesizing, designing experiments, making predictions, using apparatus, observing, measuring, being concerned with accuracy, precision, and error, recording and interpreting data, consulting data records, evaluating evidence, verification, reacting to contradictions or anomalous data, presenting and assessing arguments, constructing explanations (to oneself and others), constructing various representations of the data (graphs, maps, three-dimensional models), coordinating theory and evidence, performing statistical calculations, making inferences, and formulating and revising theories or models (e.g., Carey et al., 1989; Chi et al., 1994; Chinn and Malhotra, 2001; Keys, 1994; McNay and Melville, 1993; Schauble et al., 1995; Slowiaczek et al., 1992; Zachos et al., 2000). As noted in Chapter 2, over the past 20 to 30 years, the image of “doing science” emerging from across multiple lines of research has shifted from depictions of lone scientists conducting experiments in isolated laboratories to the image of science as both an individual and a deeply social enterprise that involves problem solving and the building and testing of models and theories.

Across this same period, the psychological study of science has evolved from a focus on scientific reasoning as a highly developed form of logical thinking that cuts across scientific domains to the study of scientific thinking as the interplay of general reasoning strategies, knowledge of the natural phenomena being studied, and a sense of how scientific evidence and explanations are generated. Much early research on scientific thinking and inquiry tended to focus primarily either on conceptual development or on the development of reasoning strategies and processes, often using very simplified reasoning tasks. In contrast, many recent studies have attempted to describe a larger number of the complex processes that are deployed in the context of scientific inquiry and to describe their coordination. These studies often engage children in firsthand investigations in which they actively explore multivariable systems. In such tasks, participants initiate all phases of scientific discovery with varying amounts of guidance provided by the researcher. These studies have revealed that, in the context of inquiry, reasoning processes and conceptual knowledge are interdependent and in fact facilitate each other (Schauble, 1996; Lehrer et al., 2001).

It is important to note that, across the studies reviewed in this chapter, researchers have made different assumptions about what scientific reasoning entails and which aspects of scientific practice are most important to study. For example, some emphasize the design of well-controlled experiments, while others emphasize building and critiquing models of natural phenomena. In addition, some researchers study scientific reasoning in stripped down, laboratory-based tasks, while others examine how children approach complex inquiry tasks in the context of the classroom. As a result, the research base is difficult to integrate and does not offer a complete picture of students’ skills and knowledge related to generating and evaluating evidence and explanations. Nor does the underlying view of scientific practice guiding much of the research fully reflect the image of science and scientific understanding we developed in Chapter 2.

TRENDS ACROSS THE K-8 YEARS

Generating Evidence

The evidence-gathering phase of inquiry includes designing the investigation as well as carrying out the steps required to collect the data. Generating evidence entails asking questions, deciding what to measure, developing measures, collecting data from the measures, structuring the data, systematically documenting outcomes of the investigations, interpreting and evaluating the data, and using the empirical results to develop and refine arguments, models, and theories.

Asking Questions and Formulating Hypotheses

Asking questions and formulating hypotheses is often seen as the first step in the scientific method; however, it can better be viewed as one of several phases in an iterative cycle of investigation. In an exploratory study, for example, work might start with structured observation of the natural world, which would lead to formulation of specific questions and hypotheses. Further data might then be collected, which lead to new questions, revised hypotheses, and yet another round of data collection. The phase of asking questions also includes formulating the goals of the activity and generating hypotheses and predictions (Kuhn, 2002).

Children differ from adults in their strategies for formulating hypotheses and in the appropriateness of the hypotheses they generate. Children often propose different hypotheses from adults (Klahr, 2000), and younger children (age 10) often conduct experiments without explicit hypotheses, unlike 12- to 14-year-olds (Penner and Klahr, 1996a). In self-directed experimental tasks, children tend to focus on plausible hypotheses and often get stuck focusing on a single hypothesis (e.g., Klahr, Fay, and Dunbar, 1993). Adults are more likely to consider multiple hypotheses (e.g., Dunbar and Klahr, 1989; Klahr, Fay, and Dunbar, 1993). For both children and adults, the ability to consider many alternative hypotheses is a factor contributing to success.

At all ages, prior knowledge of the domain under investigation plays an important role in the formulation of questions and hypotheses (Echevarria, 2003; Klahr, Fay, and Dunbar, 1993; Penner and Klahr, 1996b; Schauble, 1990, 1996; Zimmerman, Raghavan, and Sartoris, 2003). For example, both children and adults are more likely to focus initially on variables they believe to be causal (Kanari and Millar, 2004; Schauble, 1990, 1996). Hypotheses that predict expected results are proposed more frequently than hypotheses that predict unexpected results (Echevarria, 2003). The role of prior knowledge in hypothesis formulation is discussed in greater detail later in the chapter.

Designing Experiments

The design of experiments has received extensive attention in the research literature, with an emphasis on developmental changes in children’s ability to build experiments that allow them to identify causal variables. Experimentation can serve to generate observations in order to induce a hypothesis to account for the pattern of data produced (discovery context) or to test the tenability of an existing hypothesis under consideration (confirmation/ verification context) (Klahr and Dunbar, 1988). At a minimum, one must recognize that the process of experimentation involves generating observations that will serve as evidence that will be related to hypotheses.

Ideally, experimentation should produce evidence or observations that are interpretable in order to make the process of evidence evaluation uncomplicated. One aspect of experimentation skill is to isolate variables in such a way as to rule out competing hypotheses. The control of variables is a basic strategy that allows valid inferences and narrows the number of possible experiments to consider (Klahr, 2000). Confounded experiments, those in which variables have not been isolated correctly, yield indeterminate evidence.

Early approaches to examining experimentation skills involved minimizing the role of prior knowledge in order to focus on the strategies that participants used. That is, the goal was to examine the domain-general strategies that apply regardless of the content to which they are applied. For example, building on the research tradition of Piaget (e.g., Inhelder and Piaget, 1958), Siegler and Liebert (1975) examined the acquisition of experimental design skills by fifth and eighth graders. The problem involved determining how to make an electric train run. The train was connected to a set of four switches, and the children needed to determine the particular on/off configuration required. The train was in reality controlled by a secret switch, so that the discovery of the correct solution was postponed until all 16 combinations were generated. In this task, there was no principled reason why any one of the combinations would be more or less likely, and success was achieved by systematically testing all combinations of a set of four switches. Thus the task involved no domain-specific knowledge that would constrain the hypotheses about which configuration was most likely. Kuhn and Phelps (1982) used another knowledge-lean task, similar to one originally used by Inhelder and Piaget (1958), that involved identifying reaction properties of a set of colorless fluids. Success on the task was dependent on the ability to isolate and control variables in the set of all possible fluid combinations in order to determine which was causally related to the outcome. The study extended over several weeks with variations in the fluids used and the difficulty of the problem.
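The systematic search that the train task rewards can be illustrated with a short Python sketch. It is offered only as an illustration of why four two-position switches yield 16 combinations; the function name and the "off"/"on" encoding are assumptions made here and are not part of Siegler and Liebert's procedure.

```python
from itertools import product


def all_switch_configurations(n_switches=4):
    """Return every on/off configuration for n_switches two-position switches."""
    return list(product(("off", "on"), repeat=n_switches))


configs = all_switch_configurations()
print(len(configs))  # 16
print(configs[0])    # ('off', 'off', 'off', 'off')
```

Enumerating the configurations in a fixed order, as a tree diagram does, guarantees that none is skipped or repeated, which is the kind of systematic coverage the task requires.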

In both studies, the importance of practice and instructional support was apparent. Siegler and Liebert’s study included two experimental groups of children who received different kinds of instructional support. Both groups were taught about factors, levels, and tree diagrams. One group received additional, more elaborate support that included practice and help representing all possible solutions with a tree diagram. For fifth graders, the more elaborate instructional support improved their performance compared with a control group that did not receive any support. For eighth graders, both kinds of instructional support led to improved performance. In the Kuhn and Phelps task, some students improved over the course of the study, although an abrupt change from invalid to valid strategies was not common. Instead, the more typical pattern was one in which valid and invalid strategies coexisted both within and across sessions, with a pattern of gradual attainment of stable valid strategies by some students (the stabilization point varied but was typically around weeks 5-7).

Since this early work, researchers have tended to investigate children’s and adults’ performance on experimental design tasks that are more knowledge rich and less constrained. Results from these studies indicate that, in general, adults are more proficient than children at designing informative experiments. In a study comparing adults with third and sixth graders, adults were more likely to focus on experiments that would be informative (Klahr, Fay, and Dunbar, 1993). Similarly, Schauble (1996) found that during the initial 3 weeks of exploring a domain, children and adults considered about the same number of possible experiments. However, when they began experimentation in another domain in the second 3 weeks of the study, adults considered a greater range of possible experiments. Over the full 6 weeks, children and adults conducted approximately the same number of experiments. Thus, children were more likely to conduct unintended duplicate or triplicate experiments, making their experimentation efforts less informative relative to the adults, who were selecting a broader range of experiments. Similarly, children are more likely to devote multiple experimental trials to variables that are already well understood, whereas adults move on to exploring variables they do not understand as well (Klahr, Fay, and Dunbar, 1993; Schauble, 1996). Evidence also indicates, however, that dimensions of the task often have a greater influence on performance than age (Linn, 1978, 1980; Linn, Chen, and Thier, 1977; Linn and Levine, 1978).

With respect to attending to one feature at a time, children are less likely to control one variable at a time than adults. For example, Schauble (1996) found that across two task domains, children used controlled comparisons about a third of the time. In contrast, adults improved from 50 percent usage on the first task to 63 percent on the second task. Children usually begin by designing confounded experiments (often as a means to produce a desired outcome), but with repeated practice begin to use a strategy of changing one variable at a time (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Kuhn et al., 1995; Schauble, 1990).
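The difference between a controlled comparison and a confounded one can be made concrete with a small Python sketch. It assumes, for illustration only, that each experimental trial is recorded as a mapping from variable names to levels; the function names, variables, and levels below are hypothetical and do not come from the studies cited.

```python
def differing_variables(trial_a, trial_b):
    """Return the sorted names of variables whose levels differ between two trials."""
    if trial_a.keys() != trial_b.keys():
        raise ValueError("both trials must specify the same variables")
    return sorted(name for name in trial_a if trial_a[name] != trial_b[name])


def is_controlled_comparison(trial_a, trial_b):
    """True when exactly one variable differs, so it alone can explain an outcome difference."""
    return len(differing_variables(trial_a, trial_b)) == 1


# Hypothetical trials; variable names and levels are invented for illustration.
trial_1 = {"shape": "round", "size": "small", "weight": "light"}
trial_2 = {"shape": "round", "size": "small", "weight": "heavy"}
trial_3 = {"shape": "square", "size": "large", "weight": "heavy"}

print(is_controlled_comparison(trial_1, trial_2))  # True: only weight differs
print(is_controlled_comparison(trial_1, trial_3))  # False: three variables differ
print(differing_variables(trial_1, trial_3))       # ['shape', 'size', 'weight']
```

In the confounded pair, any difference in outcome could be due to shape, size, or weight, so no valid inference about a single variable is possible.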

Reminiscent of the results of the earlier study by Kuhn and Phelps, both children and adults display intraindividual variability in strategy usage. That is, multiple strategy usage is not unique to childhood or periods of developmental transition (Kuhn et al., 1995). A robust finding is the coexistence of valid and invalid strategies (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Garcia-Mila and Andersen, 2005; Gleason and Schauble, 2000; Schauble, 1990; Siegler and Crowley, 1991; Siegler and Shipley, 1995). That is, participants may progress to the use of a valid strategy, but then return to an inefficient or invalid strategy. Similar use of multiple strategies has been found in research on the development of other academic skills, such as mathematics (e.g., Bisanz and LeFevre, 1990; Siegler and Crowley, 1991), reading (e.g., Perfetti, 1992), and spelling (e.g., Varnhagen, 1995). With respect to experimentation strategies, an individual may begin with an invalid strategy, but once the usefulness of changing one variable at a time is discovered, it is not immediately used exclusively. The newly discovered, effective strategy is only slowly incorporated into an individual’s set of strategies.

An individual’s perception of the goals of an investigation also has an important effect on the hypotheses they generate and their approach to experimentation. Individuals tend to differ in whether they see the overarching goal of an inquiry task as seeking to identify which factors make a difference (scientific) or seeking to produce a desired effect (engineering). Whether these different approaches characterize an individual, or whether they are invoked by task demands or implicit assumptions, is a question for further research.

In a direct exploration of the effect of adopting scientific versus engineering goals, Schauble, Klopfer, and Raghavan (1991) provided fifth and sixth graders with an “engineering context” and a “science context.” When the children were working as scientists, their goal was to determine which factors made a difference and which ones did not. When the children were working as engineers, their goal was optimization, that is, to produce a desired effect (i.e., the fastest boat in the canal task). When working in the science context, the children worked more systematically, by establishing the effect of each variable, alone and in combination. There was an effort to make inclusion inferences (i.e., an inference that a factor is causal) and exclusion inferences (i.e., an inference that a factor is not causal). In the engineering context, children selected highly contrastive combinations and focused on factors believed to be causal while overlooking factors believed or demonstrated to be noncausal. Typically, children took a “try-and-see” approach to experimentation while acting as engineers, but they took a theory-driven approach to experimentation when acting as scientists. Schauble et al. (1991) found that children who received the engineering instructions first, followed by the scientist instructions, made the greatest improvements. Similarly, Sneider et al. (1984) found that students’ ability to plan and critique experiments improved when they first engaged in an engineering task of designing rockets.

Another pair of contrasting approaches to scientific investigation is the theorist versus the experimentalist (Klahr and Dunbar, 1988; Schauble, 1990). Similar variation in strategies for problem solving has been observed for chess, puzzles, physics problems, science reasoning, and even elementary arithmetic (Chase and Simon, 1973; Klahr and Robinson, 1981; Klayman and Ha, 1989; Kuhn et al., 1995; Larkin et al., 1980; Lovett and Anderson, 1995, 1996; Simon, 1975; Siegler, 1987; Siegler and Jenkins, 1989). Individuals who take a theory-driven approach tend to generate hypotheses and then test the predictions of the hypotheses. Experimenters tend to make data-driven discoveries, by generating data and finding the hypothesis that best summarizes or explains that data. For example, Penner and Klahr (1996a) asked 10- to 14-year-olds to conduct experiments to determine how the shape, size, material, and weight of an object influence sinking times. Students’ approaches to the task could be classified as either “prediction oriented” (i.e., a theorist: “I believe that weight makes a difference”) or “hypothesis oriented” (i.e., an experimenter: “I wonder if …”). The 10-year-olds were more likely to take a prediction (or demonstration) approach, whereas the 14-year-olds were more likely to explicitly test a hypothesis about an attribute without a strong belief or need to demonstrate that belief. Although these patterns may characterize approaches to any given task, it has yet to be determined if such styles are idiosyncratic to the individual and likely to remain stable across varying tasks, or if different styles might emerge for the same person depending on task demands or the domain under investigation.

Observing and Recording

Record keeping is an important component of scientific investigation in general, and of self-directed experimental tasks especially, because access to and consulting of cumulative records are often important in interpreting evidence. Early studies of experimentation demonstrated that children are often not aware of their own memory limitations, and this plays a role in whether they document their work during an investigation (e.g., Siegler and Liebert, 1975). Recent studies corroborate the importance of an awareness of one’s own memory limitations while engaged in scientific inquiry tasks, regardless of age. Spontaneous note-taking or other documentation of experimental designs and results may be a factor contributing to the observed developmental differences in performance on both experimental design tasks and in evaluation of evidence. Carey et al. (1989) reported that, prior to instruction, seventh graders did not spontaneously keep records when trying to determine and keep track of which substance was responsible for producing a bubbling reaction in a mixture of yeast, flour, sugar, salt, and warm water. Nevertheless, even though preschoolers are likely to produce inadequate and uninformative notations, they can distinguish between the two when asked to choose between them (Triona and Klahr, in press). Dunbar and Klahr (1988) also noted that children (grades 3-6) were unlikely to check if a current hypothesis was or was not consistent with previous experimental results. In a study by Trafton and Trickett (2001), undergraduates solving scientific reasoning problems in a computer environment were more likely to achieve correct performance when using the notebook function (78 percent) than were nonusers (49 percent), showing that this issue is not unique to childhood.

In a study of fourth graders’ and adults’ spontaneous use of notebooks during a 10-week investigation of multivariable systems, all but one of the adults took notes, whereas only half of the children took notes. Moreover, despite variability in the amount of notebook usage in both groups, on average adults made three times more notebook entries than children did. Adults’ note-taking remained stable across the 10 weeks, but children’s frequency of use decreased over time, dropping to about half of their initial usage. Children rarely reviewed their notes, which typically consisted of conclusions, but not the variables used or the outcomes of the experimental tests (i.e., the evidence for the conclusion was not recorded) (Garcia-Mila and Andersen, 2005).

Children may differentially record the results of experiments, depending on familiarity or strength of prior theories. For example, 10- to 14-year-olds recorded more data points when experimenting with factors affecting force produced by the weight and surface area of boxes than when they were experimenting with pendulums (Kanari and Millar, 2004). Overall, it is a fairly robust finding that children are less likely than adults to record experimental designs and outcomes or to review what notes they do keep, despite task demands that clearly necessitate a reliance on external memory aids.

Given the increasing attention to the importance of metacognition for proficient performance on such tasks (e.g., Kuhn and Pearsall, 1998, 2000), it is important to determine at what point children and early adolescents recognize their own memory limitations as they navigate through a complex task. Some studies show that children’s understanding of how their own memories work continues to develop across the elementary and middle school grades (Siegler and Alibali, 2005). The implication is that there is no particular age or grade level when memory and limited understanding of one’s own memory are no longer a consideration. As such, knowledge of how one’s own memory works may represent an important moderating variable in understanding the development of scientific reasoning (Kuhn, 2001). For example, if a student is aware that it will be difficult for her to remember the results of multiple trials, she may be more likely to carefully record each outcome. However, it may also be the case that children, like adult scientists, need to be inducted into the practice of record keeping and the use of records. They are likely to need support to understand the important role of records in generating scientific evidence and supporting scientific arguments.

Evaluating Evidence

The important role of evidence evaluation in the process of scientific activity has long been recognized. Kuhn (1989), for example, has argued that the defining feature of scientific thinking is the set of skills involved in differentiating and coordinating theory and evidence. Various strands of research provide insight on how children learn to engage in this phase of scientific inquiry. There is an extensive literature on the evaluation of evidence, beginning with early research on identifying patterns of covariation and cause that used highly structured experimental tasks. More recently researchers have studied how children evaluate evidence in the context of self-directed experimental tasks. In real-world contexts (in contrast to highly controlled laboratory tasks) the process of evidence evaluation is very messy and requires an understanding of error and variation. As was the case for hypothesis generation and the design of experiments, the role of prior knowledge and beliefs has emerged as an important influence on how individuals evaluate evidence.

Covariation Evidence

A number of early studies on the development of evidence evaluation skills used knowledge-lean tasks that asked participants to evaluate existing data. These data were typically in the form of covariation evidence—that is, the frequency with which two events do or do not occur together. Evaluation of covariation evidence is potentially important in regard to scientific thinking because covariation is one potential cue that two events are causally related. Deanna Kuhn and her colleagues carried out pioneering work on children’s and adults’ evaluation of covariation evidence, with a focus on how participants coordinate their prior beliefs about the phenomenon with the data presented to them (see Box 5-1).
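To make the notion of covariation evidence concrete, the sketch below summarizes how often two events do or do not occur together as four counts and compares two conditional proportions. This is one simple way to quantify covariation, chosen here for illustration; it is not the measure used in the studies discussed, and the function name and counts are hypothetical.

```python
def covariation_strength(both, factor_only, outcome_only, neither):
    """Difference between P(outcome | factor present) and P(outcome | factor absent).

    The four arguments are counts of cases: factor and outcome together,
    factor without outcome, outcome without factor, and neither.
    """
    with_factor = both + factor_only
    without_factor = outcome_only + neither
    if with_factor == 0 or without_factor == 0:
        raise ValueError("need at least one case with and one without the factor")
    return both / with_factor - outcome_only / without_factor


# Hypothetical counts, invented for illustration.
print(covariation_strength(4, 0, 0, 4))  # 1.0: the outcome always accompanies the factor
print(covariation_strength(2, 2, 2, 2))  # 0.0: the outcome is equally likely either way
```

A value near zero corresponds to noncovariation evidence, whereas a value far from zero indicates that the two events tend to occur together (or apart). Covariation alone remains only a cue to causation, not proof of it.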

Results across a series of studies revealed continuous improvement of the skills involved in differentiating and coordinating theory and evidence, as well as bracketing prior belief while evaluating evidence, from middle childhood (grades 3 and 6) to adolescence (grade 9) to adulthood (Kuhn, Amsel, and O’Loughlin, 1988). These skills, however, did not appear to develop to an optimal level even among adults. Even adults had a tendency to meld theory and evidence into a single mental representation of “the way things are.”

Participants had a variety of strategies for keeping theory and evidence in alignment with one another when they were in fact discrepant. One tendency was to ignore, distort, or selectively attend to evidence that was inconsistent with a favored theory. For example, the protocol from one ninth grader demonstrated that upon repeated instances of covariation between type of breakfast roll and catching colds, he would not acknowledge this relationship: “They just taste different … the breakfast roll to me don’t cause so much colds because they have pretty much the same thing inside” (Kuhn, Amsel, and O’Loughlin, 1988, p. 73).

Another tendency was to adjust a theory to fit the evidence, a process that was most often outside an individual’s conscious awareness and control. For example, when asked to recall their original beliefs, participants would often report a theory consistent with the evidence that was presented, and not the theory as originally stated. Take the case of one ninth grader who did not believe that type of condiment (mustard versus ketchup) was causally related to catching colds. With each presentation of an instance of covariation evidence, he acknowledged the evidence and elaborated a theory based on the amount of ingredients or vitamins and the temperature of the food the condiment was served with to make sense of the data (Kuhn, Amsel, and O’Loughlin, 1988, p. 83). Kuhn argued that this tendency suggests that the student’s theory does not exist as an object of cognition. That is, a theory and the evidence for that theory are undifferentiated; they do not exist as separate cognitive entities. If they do not exist as separate entities, it is not possible to flexibly and consciously reflect on the relation of one to the other.

BOX 5-1

Kuhn and her colleagues used simple, everyday contexts, rather than phenomena from specific scientific disciplines. In an initial theory interview, participants’ beliefs about the causal status of various variables were ascertained. For example, sixth and ninth graders were questioned about their beliefs concerning the types of foods that make a difference in whether a person caught a cold (35 foods in total). Four variables were selected on the basis of ratings from the initial theory interview: two factors that the participant believed make a difference in catching colds (e.g., type of fruit, type of cereal) and two factors the participant believed do not make a difference (e.g., type of potato, type of condiment). This procedure allowed the evidence to be manipulated so that covariation evidence could be presented that confirmed one existing causal theory and one noncausal theory. Likewise, noncovariation evidence was presented that disconfirmed one previously held causal theory and one noncausal theory. The specific manipulations were therefore tailored for each person in the study.

Participants then evaluated patterns of covariation data and answered a series of questions about what the evidence showed for each of the four variables. Responses were coded as evidence based when they referred to the patterns of covariation or instances of data presented (e.g., if shown a pattern in which type of cake covaried with getting colds, a participant who noted that the sick children ate chocolate cake and the healthy ones ate carrot cake would be coded as having made an evidence-based response). Responses were coded as theory based when they referred to the participant’s prior beliefs or theories (e.g., a response that chocolate cake has “sugar and a lot of bad stuff in it” or that “less sugar means your blood pressure doesn’t go up”).

A number of researchers have criticized Kuhn’s findings on both methodological and theoretical grounds. Sodian, Zaitchik, and Carey (1991), for example, questioned the finding that third and sixth grade children cannot distinguish between their beliefs and the evidence, pointing to the complexity of the tasks Kuhn used as problematic. They chose to employ simpler tasks that involved story problems about phenomena for which children did not hold strong beliefs. Children’s performance on these tasks demonstrated that even first and second graders could differentiate a hypothesis from the evidence. Likewise, Ruffman et al. (1993) used a simplified task and showed that 6-year-olds were able to form a causal hypothesis based on a pattern of covariation evidence. A study of children and adults (Amsel and Brock, 1996) indicated an important role of prior beliefs, especially for children. When presented with evidence that disconfirmed prior beliefs, children from both grade levels tended to make causal judgments consistent with their prior beliefs. When confronted with confirming evidence, however, both groups of children and adults made similar judgments. Looking across these studies provides insight into the conditions under which children are more or less proficient at coordinating theory and evidence. In some situations, children are better at distinguishing prior beliefs from evidence than the results of Kuhn et al. suggest.

Koslowski (1996) criticized Kuhn et al.’s work on more theoretical grounds. She argued that reliance on knowledge-lean tasks in which participants are asked to suppress their prior knowledge may lead to an incomplete or distorted picture of the reasoning abilities of children and adults. Instead, Koslowski suggested that using prior knowledge when gathering and evaluating evidence is a valid strategy. She developed a series of experiments to support her thesis and to explore the ways in which prior knowledge might play a role in evaluating evidence. The results of these investigations are described in detail in the later section of this chapter on the role of prior knowledge.

Evidence in the Context of Investigations

Researchers have also looked at reasoning about cause in the context of full investigations of causal systems. Two main types of multivariable systems are used in these studies. In the first type of system, participants are involved in a hands-on manipulation of a physical system, such as a ramp (e.g., Chen and Klahr, 1999; Masnick and Klahr, 2003) or a canal (e.g., Gleason and Schauble, 2000; Kuhn, Schauble, and Garcia-Mila, 1992). The second type of system is a computer simulation, such as the Daytona microworld in which participants discover the factors affecting the speed of race cars (Schauble, 1990). A variety of virtual environments have been created in domains such as electric circuits (Schauble et al., 1992), genetics (Echevarria, 2003), earthquake risk, and flooding risk (e.g., Keselman, 2003).

The inferences that are made based on self-generated experimental evidence are typically classified as either causal (or inclusion), noncausal (or exclusion), indeterminate, or false inclusion. All inference types can be further classified as valid or invalid. Invalid inclusion, by definition, is of particular interest because in self-directed experimental contexts, both children and adults often infer based on prior beliefs that a variable is causal, when in reality it is not.
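The classification above can be made concrete with a small sketch. It is an illustration only, not a coding scheme taken from any of the studies cited: the function names, the dictionary representation of a trial, and the ramp variables are all hypothetical, and outcomes are compared exactly, which ignores measurement error. Under those assumptions, a comparison of two trials supports an inclusion or exclusion inference about a target variable only when the target is the sole variable that differs; any other comparison leaves the question indeterminate.

```python
from typing import Dict

Trial = Dict[str, str]  # hypothetical representation: variable name -> level


def warranted_inference(trial_a: Trial, trial_b: Trial,
                        outcome_a: float, outcome_b: float,
                        target: str) -> str:
    """Inference about `target` that a single two-trial comparison supports.

    Simplifying assumptions: both trials set the same variables, and
    outcomes are compared exactly (no measurement error).
    """
    differing = [name for name in trial_a if trial_a[name] != trial_b[name]]
    if differing != [target]:
        # Either the target was not varied, or another variable changed
        # along with it (a confounded comparison).
        return "indeterminate"
    return "inclusion" if outcome_a != outcome_b else "exclusion"


def is_valid(claimed: str, trial_a: Trial, trial_b: Trial,
             outcome_a: float, outcome_b: float, target: str) -> bool:
    """A claimed inference is valid only if the comparison warrants it."""
    return claimed == warranted_inference(
        trial_a, trial_b, outcome_a, outcome_b, target)


# Hypothetical ramp trials; the outcome is distance rolled.
steep = {"steepness": "high", "surface": "smooth", "ball": "rubber"}
shallow = {"steepness": "low", "surface": "smooth", "ball": "rubber"}
shallow_rough = {"steepness": "low", "surface": "rough", "ball": "rubber"}

print(warranted_inference(steep, shallow, 120, 80, "steepness"))
print(warranted_inference(steep, shallow_rough, 120, 60, "steepness"))
print(is_valid("inclusion", steep, shallow_rough, 120, 60, "steepness"))
```

In this sketch, claiming that steepness is causal on the basis of the second comparison is an invalid inclusion: the claim may happen to be true, but the confounded comparison does not warrant it.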

Children tend to focus on making causal inferences during their initial explorations of a causal system. In a study in which children worked to discover the causal structure of a computerized microworld, fifth and sixth graders began by producing confounded experiments and relied on prior knowledge or expectations (Schauble, 1990). As a result, in their early explorations of the causal system, they were more likely to make incorrect causal inferences. In a direct comparison of adults and children (Schauble, 1996), adults also focused on making causal inferences, but they made more valid inferences because their experimentation was more often done using a control-of-variables strategy. Overall, children’s inferences were valid 44 percent of the time, compared with 72 percent for adults. The fifth and sixth graders improved over the course of six sessions, starting at 25 percent but improving to almost 60 percent valid inferences (Schauble, 1996). Adults were more likely than children to make inferences about which variables were noncausal or inferences of indeterminacy (80 and 30 percent, respectively) (Schauble, 1996).

Children’s difficulty with inferences of noncausality also emerged in a study of 10- to 14-year-olds who explored factors influencing the swing of a pendulum or the force needed to pull a box along a level surface (Kanari and Millar, 2004). Only half of the students were able to draw correct conclusions about factors that did not covary with outcome. Students were likely to either selectively record data, selectively attend to data, distort or reinterpret the data, or state that noncovariation experimental trials were “inconclusive.” Such tendencies are reminiscent of other findings that some individuals selectively attend to or distort data in order to preserve a prior theory or belief (Kuhn, Amsel, and O’Loughlin, 1988; Zimmerman, Raghavan, and Sartoris, 2003).

Some researchers suggest children’s difficulty with noncausal or indeterminate inferences may be due both to experience and to the inherent complexity of the problem. In terms of experience, in the science classroom it is typical to focus on variables that “make a difference,” and therefore students struggle when testing variables that do not covary with the outcome (e.g., the weight of a pendulum does not affect the time of swing or the vertical height of a weight does not affect balance) (Kanari and Millar, 2004). Also, valid exclusion and indeterminacy inferences may be conceptually more complex, because they require one to consider a pattern of evidence produced from several experimental trials (Kuhn et al., 1995; Schauble, 1996). Looking across several trials may require one to review cumulative records of previous outcomes. As has been suggested previously, children do not often have the memory skills to either record information, record sufficient information, or consult such information when it has been recorded.

The importance of experience is highlighted by the results of studies conducted over several weeks with fifth and sixth graders. After several weeks with a task, children started making more exclusion inferences (that factors are not causal) and indeterminacy inferences (that one cannot make a conclusive judgment about a confounded comparison) and did not focus solely on causal inferences (e.g., Keselman, 2003; Schauble, 1996). They also began to distinguish between an informative and an uninformative experiment by attending to or controlling other factors leading to an improved ability to make valid inferences. Through repeated exposure, invalid inferences, such as invalid inclusions, dropped in frequency. The tendency to begin to make inferences of indeterminacy suggests that students developed more awareness of the adequacy or inadequacy of their experimentation strategies for generating sufficient and interpretable evidence.

Children and adults also differ in generating sufficient evidence to support inferences. In contexts in which it is possible, children often terminate their search early, believing that they have determined a solution to the problem (e.g., Dunbar and Klahr, 1989). In studies over several weeks in which children must continue their investigation (e.g., Schauble et al., 1991), this is less likely because of the task requirements. Children are also more likely to refer to the most recently generated evidence. They may jump to a conclusion after a single experiment, whereas adults typically need to see the results of several experiments (e.g., Gleason and Schauble, 2000).

As was found with experimentation, children and adults display intraindividual variability in strategy usage with respect to inference types. Likewise, the existence of multiple inference strategies is not unique to childhood (Kuhn et al., 1995). In general, early in an investigation, individuals focus primarily on identifying factors that are causal and are less likely to consider definitely ruling out factors that are not causal. However, a mix of valid and invalid inference strategies co-occur during the course of exploring a causal system. As with experimentation, the addition of a valid inference strategy to an individual’s repertoire does not mean that they immediately give up the others. Early in investigations, there is a focus on causal hypotheses and inferences, whether they are warranted or not. Only with additional exposure do children start to make inferences of noncausality and indeterminacy. Knowledge change and experience—gaining a better understanding of the causal system via experimentation—was associated with the use of valid experimentation and inference strategies.

THE ROLE OF PRIOR KNOWLEDGE

In the previous section we reviewed evidence on developmental differences in using scientific strategies. Across multiple studies, prior knowledge emerged as an important influence on several parts of the process of generating and evaluating evidence. In this section we look more closely at the specific ways that prior knowledge may shape parts of that process. Prior knowledge includes conceptual knowledge, that is, knowledge of the natural world and specifically of the domain under investigation, as well as prior knowledge and beliefs about the purpose of an investigation and the goals of science more generally. This latter kind of prior knowledge is touched on here and discussed in greater detail in the next chapter.

Beliefs About Causal Mechanism and Plausibility

In response to research on evaluation of covariation evidence that used knowledge-lean tasks or even required participants to suppress prior knowledge, Koslowski (1996) argued that it is legitimate and even helpful to consider prior knowledge when gathering and evaluating evidence. The world is full of correlations, and consideration of plausibility, causal mechanism, and alternative causes can help to determine which correlations between events should be taken seriously and which should be viewed as spurious. For example, the identification of the E. coli bacterium makes plausible a causal relationship between hamburger consumption and certain types of illness or mortality. Because of the absence of a causal mechanism, one does not consider seriously the correlation between ice cream consumption and violent crime rate as causal, but one looks for other covarying quantities (such as high temperatures) that may be causal for both behaviors and thus explain the correlation.
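The ice cream example can be illustrated numerically. The sketch below uses a small data set invented purely for illustration (it is not drawn from any study cited here), and the variable names are hypothetical. The figures are constructed so that ice cream consumption and crime rise together across the whole sample, yet show no relationship once days are grouped by temperature, which is the pattern one would expect if temperature is a common cause of both.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: eight days, four cool and four hot.
temperature = ["cool"] * 4 + ["hot"] * 4
ice_cream = [1, 2, 1, 2, 5, 6, 5, 6]   # hypothetical sales units
crime = [3, 3, 4, 4, 8, 8, 9, 9]       # hypothetical incident counts


def within(level):
    """Correlation between ice cream and crime on days at one temperature."""
    days = [i for i, t in enumerate(temperature) if t == level]
    return pearson([ice_cream[i] for i in days], [crime[i] for i in days])


overall = pearson(ice_cream, crime)
print(f"all days: {overall:.2f}")
print(f"cool days only: {within('cool'):.2f}")
print(f"hot days only: {within('hot'):.2f}")
```

Holding the suspected common cause fixed is only one way to probe a correlation, and it presupposes that the reasoner already has a candidate mechanism in mind, which is the point of Koslowski’s argument.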

Koslowski (1996) presented a series of experiments that demonstrate the interdependence of theory and evidence in legitimate scientific reasoning (see Box 5-2 for an example). In most of these studies, all participants (sixth graders, ninth graders, and adults) did take mechanism into consideration when evaluating evidence in relation to a hypothesis about a causal relationship. Even sixth graders considered more than patterns of covariation when making causal judgments (Koslowski and Okagaki, 1986; Koslowski et al., 1989). In fact, as discussed in the previous chapter, results of studies by Koslowski (1996) and others (Ahn et al., 1995) indicate that children and adults have naïve theories about the world that incorporate information about both covariation and causal mechanism.

The plausibility of a mechanism also plays a role in reasoning about cause. In some situations, scientific progress occurs by taking seemingly implausible correlations seriously (Wolpert, 1993). Similarly, Koslowski argued that if people rely on covariation and mechanism information in an interdependent and judicious manner, then they should pay attention to implausible correlations (i.e., those with no apparent mechanism) when the implausible correlation occurs repeatedly. For example, discovering the cause of Kawasaki’s syndrome depended on taking seriously the implausible correlation between the illness and having recently cleaned carpets. Similarly, Thagard (1998a, 1998b) describes the case of researchers Warren and Marshall, who proposed that peptic ulcers could be caused by a bacterium, and their efforts to have their theory accepted by the medical community. The bacterial theory of ulcers was initially rejected as implausible, given the assumption that the stomach is too acidic to allow bacteria to survive.

BOX 5-2

In studies conducted by Koslowski and her colleagues, participants were given problem situations in which a story character is trying to determine if some target factor (e.g., a gasoline additive) is causally related to an effect (e.g., improved gas mileage). They were then shown either perfect covariation between the target factor and the effect or partial covariation (4 of 6 instances). Perfect correlation was rated as more likely to indicate causation than partial correlation. Participants were then told that a number of plausible mechanisms had been ruled out (e.g., the additive does not burn more efficiently, the additive does not burn more cleanly). When asked to rate again how likely it was that the additive is causally responsible for improved gas mileage, the ratings for both perfect and partial covariation were lower for all age groups.

Koslowski also tried to determine if participants would spontaneously generate information about causal mechanisms when it was not cued by the task. Children and adults were presented with story problems in which a character is trying to answer a question about, for example, whether having parents stay in the hospital with their children improves the children’s recovery rate. Participants were asked to describe whatever type of information might be useful for solving the problem. Half of the participants were told that experimental intervention was not possible (that is, parents and children could not be assigned to particular groups), while the other half were not restricted in this manner. Almost all participants showed some concern for a causal mechanism, including expectations about how the target mechanism would operate. Although the sixth graders were less likely to generate a variety of alternative hypotheses, all age groups proposed appropriate contrastive tests.

Studies with both children and adults reveal links between reasoning about mechanism and the plausibility of that mechanism (Koslowski, 1996). When presented with an implausible covariation (e.g., improved gas mileage and color of car), participants rated the causal status of the implausible cause (color) before and after learning about a possible way that the cause could bring about the effect (improved gas mileage). In this example, participants learned that the color of the car affects the driver’s alertness (which affects driving quality, which in turn affects gas mileage). At all ages, participants increased their causal ratings after learning about a possible mediating mechanism. The presence of a possible mechanism in addition to a large number of covariations (four or more) was taken to indicate the possibility of a causal relationship for both plausible and implausible covariations. When either generating or assessing mechanisms for plausible covariations, all age groups (sixth and ninth graders and adults) were comparable. When the covariation was implausible, sixth graders were more likely to generate dubious mechanisms to account for the correlation.

The role of prior knowledge, especially beliefs about causal mechanism and plausibility, is also evident in hypothesis formation and the design of investigations. Individuals’ prior beliefs influence the choice of which hypotheses to test, including which hypotheses are tested first, repeatedly, or receive the most time and attention (e.g., Echevarria, 2003; Klahr, Fay, and Dunbar, 1993; Penner and Klahr, 1996b; Schauble, 1990, 1996; Zimmerman, Raghavan, and Sartoris, 2003). For example, children’s favored theories sometimes result in the selection of invalid experimentation and evidence evaluation heuristics (e.g., Dunbar and Klahr, 1989; Schauble, 1990). Plausibility of a hypothesis may serve as a guide for which experiments to pursue. Klahr, Fay, and Dunbar (1993) provided third and sixth grade children and adults with hypotheses to test that were incorrect but either plausible or implausible. For plausible hypotheses, children and adults tended to go about demonstrating the correctness of the hypothesis rather than setting up experiments to decide between rival hypotheses. For implausible hypotheses, adults and some sixth graders proposed a plausible rival hypothesis and set up an experiment that would discriminate between the two. Third graders tended to propose a plausible hypothesis but then ignore or forget the initial implausible hypothesis, getting sidetracked in an attempt to demonstrate that the plausible hypothesis was correct.

Recognizing the interdependence of theory and data in the evaluation of evidence and explanations, Chinn and Brewer (2001) proposed that people evaluate evidence by building a mental model of the interrelationships between theories and data. These models integrate patterns of data, procedural details, and the theoretical explanation of the observed findings (which may include unobservable mechanisms, such as molecules, electrons, enzymes, or intentions and desires). The information and events can be linked by different kinds of connections, including causal, contrastive, analogical, and inductive links. The mental model may then be evaluated by considering the plausibility of these links. In addition to considering the links between, for example, data and theory, the model might also be evaluated by appealing to alternate causal mechanisms or alternate explanations. Essentially, an individual seeks to “undermine one or more of the links in the model” (p. 337). If no reasons to be critical can be identified, the individual may accept the new evidence or theoretical interpretation.

Some studies suggest that the strength of prior beliefs, as well as the personal relevance of those beliefs, may influence the evaluation of the mental model (Chinn and Malhotra, 2002; Klaczynski, 2000; Klaczynski and Narasimham, 1998). For example, when individuals have reason to disbelieve evidence (e.g., because it is inconsistent with prior belief), they will search harder for flaws in the data (Kunda, 1990). As a result, individuals may not find the evidence compelling enough to reassess their cognitive model. In contrast, beliefs about simple empirical regularities may not be held with such conviction (e.g., the falling speed of heavy versus light objects), making it easier to change a belief in response to evidence.

Evaluating Evidence That Contradicts Prior Beliefs

Anomalous data or evidence refers to results that do not fit with one’s current beliefs. Anomalous data are considered very important by scientists because of their role in theory change, and they have been used by science educators to promote conceptual change. The idea that anomalous evidence promotes conceptual change (in the scientist or the student) rests on a number of assumptions, including that individuals have beliefs or theories about natural or social phenomena, that they are capable of noticing that some evidence is inconsistent with those theories, that such evidence calls into question those theories, and, in some cases, that a belief or theory will be altered or changed in response to the new (anomalous) evidence (Chinn and Brewer, 1998). Chinn and Brewer propose that there are eight possible responses to anomalous data. Individuals can (1) ignore the data; (2) reject the data (e.g., because of methodological error, measurement error, bias); (3) acknowledge uncertainty about the validity of the data; (4) exclude the data as being irrelevant to the current theory; (5) hold the data in abeyance (i.e., withhold a judgment about the relation of the data to the initial theory); (6) reinterpret the data as consistent with the initial theory; (7) accept the data and make peripheral change or minor modification to the theory; or (8) accept the data and change the theory. Examples of all of these responses were found in undergraduates’ responses to data that contradicted theories to explain the mass extinction of dinosaurs and theories about whether dinosaurs were warm-blooded or cold-blooded.

In a series of studies, Chinn and Malhotra (2002) examined how fourth, fifth, and sixth graders responded to experimental data that were inconsistent with their existing beliefs. Experiments from physical science domains were selected in which the outcomes produced either ambiguous or unambiguous data, and for which the findings were counterintuitive for most children. For example, most children assume that a heavy object falls faster than a light object. When the two objects are dropped simultaneously, there is some ambiguity because it is difficult to observe both objects. An example of a topic that is counterintuitive but results in unambiguous evidence is the reaction temperature of baking soda added to vinegar. Children believe that either no change in temperature will occur, or that the fizzing causes an increase in temperature. Thermometers unambiguously show a temperature drop of about 4 degrees centigrade.

When examining the anomalous evidence produced by these experiments, children’s difficulties seemed to occur in one of four cognitive processes: observation, interpretation, generalization, or retention (Chinn and Malhotra, 2002). For example, prior belief may influence what is “observed,” especially in the case of data that are ambiguous, and children may not perceive the two objects as landing simultaneously. Inferences based on this faulty observation will then be incorrect. At the level of interpretation, even if individuals accurately observed the outcome, they might not shift their theory to align with the evidence. They can fail to do so in many ways, such as ignoring or distorting the data or discounting the data because they are considered flawed. At the level of generalization, an individual may accept, for example, that these particular heavy and light objects fell at the same rate but insist that the same rule may not hold for other situations or objects. Finally, even when children appeared to change their beliefs about an observed phenomenon in the immediate context of the experiment, their prior beliefs reemerged later, indicating a lack of long-term retention of the change.

Penner and Klahr (1996a) investigated the extent to which children’s prior beliefs affect their ability to design and interpret experiments. They used a domain in which most children hold a strong belief that heavier objects sink in fluid faster than light objects, and they examined children’s ability to design unconfounded experiments to test that belief. In this study, for objects of a given composition and shape, sink times for heavy and light objects are nearly indistinguishable to an observer. For example, the sink times for the stainless steel spheres weighing 65 gm and 19 gm were .58 sec and .62 sec, respectively. Only one of the eight children (out of 30) who chose to directly contrast these two objects continued to explore the reason for the unexpected finding that the large and small spheres had equivalent sink times. The process of knowledge change was not straightforward. For example, some children suggested that the size of the smaller steel ball offset the fact that it weighed less because it was able to move through the water as fast as the larger, heavier steel ball. Others concluded that both weight and shape make a difference. That is, there was an attempt to reconcile the evidence with prior knowledge and expectations by appealing to causal mechanisms, alternate causes, or enabling conditions.

What is also important to note about the children in the Penner and Klahr study is that they did in fact notice the surprising finding, rather than ignore or misrepresent the data. They tried to make sense of the outcome by acting as a theorist who conjectures about the causal mechanisms, boundary conditions, or other ad hoc explanations (e.g., shape) to account for the results of an experiment. In Chinn and Malhotra’s (2002) study of students’ evaluation of observed evidence (e.g., watching two objects fall simultaneously), the process of noticing was found to be an important mediator of conceptual change.

Echevarria (2003) examined seventh graders’ reactions to anomalous data in the domain of genetics and whether they served as a catalyst for knowledge construction during the course of self-directed experimentation. Students in the study completed a 3-week unit on genetics that involved genetics simulation software and observing plant growth. In both the software and the plants, students investigated or observed the transmission of one trait. Anomalies in the data were defined as outcomes that were not readily explainable on the basis of the appearance of the parents.

In general, the number of hypotheses generated, the number of tests conducted, and the number of explanations generated were a function of students’ ability to encounter, notice, and take seriously an anomalous finding. The majority of students (80 percent) developed some explanation for the pattern of anomalous data. For those who were unable to generate an explanation, it was suggested that the initial knowledge was insufficient and therefore could not undergo change as a result of the encounter with “anomalous” evidence. Analogous to case studies in the history of science (e.g., Simon, 2001), these students’ ability to notice and explore anomalies was related to their level of domain-specific knowledge (as suggested by Pasteur’s oft quoted maxim “serendipity favors the prepared mind”). Surprising findings were associated with an increase in hypotheses and experiments to test these potential explanations, but without the domain knowledge to “notice,” anomalies could not be exploited.

There is some evidence that, with instruction, students’ ability to evaluate anomalous data improves (Chinn and Malhotra, 2002). In a study of fourth, fifth, and sixth graders, one group of students was instructed to predict the outcomes of three experiments that produce counterintuitive but unambiguous data (e.g., reaction temperature). A second group answered questions that were designed to promote unbiased observations and interpretations by reflecting on the data. A third group was provided with an explanation of what scientists expected to find and why. All students reported their prediction of the outcome, what they observed, and their interpretation of the experiment. They were then tested for generalizations, and a retention test followed 9-10 days later. Fifth and sixth graders performed better than did fourth graders. Students who heard an explanation of what scientists expected to find and why did best. Further analyses suggest that the explanation-based intervention worked by influencing students’ initial predictions. This correct prediction then influenced what was observed. A correct observation then led to correct interpretations and generalizations, which resulted in conceptual change that was retained. A similar pattern of results was found using interventions employing either full or reduced explanations prior to the evaluation of evidence.

Thus, it appears that children were able to change their beliefs on the basis of anomalous or unexpected evidence, but only when they were capable of making the correct observations. Difficulty in making observations was found to be the main cognitive process responsible for impeding conceptual change (i.e., rather than interpretation, generalization, or retention). Certain interventions, in particular those involving an explanation of what scientists expected to happen and why, were very effective in mediating conceptual change when encountering counterintuitive evidence. With particular scaffolds, children made observations independent of theory, and they changed their beliefs based on observed evidence.

THE IMPORTANCE OF EXPERIENCE AND INSTRUCTION

There is increasing evidence that, as in the case of intellectual skills in general, the development of the component skills of scientific reasoning “cannot be counted on to routinely develop” (Kuhn and Franklin, 2006, p. 47). That is, young children have many requisite skills needed to engage in scientific thinking, but there are also ways in which even adults do not show full proficiency in investigative and inference tasks. Recent research efforts have therefore been focused on how such skills can be promoted by determining which types of educational interventions (e.g., amount of structure, amount of support, emphasis on strategic or metastrategic skills) will contribute most to learning, retention, and transfer, and which types of interventions are best suited to different students. There is a developing picture of what children are capable of with minimal support, and research is moving in the direction of ascertaining what children are capable of, and when, under conditions of practice, instruction, and scaffolding. It may one day be possible to tailor educational opportunities that neither under- nor overestimate children’s ability to extract meaningful experiences from inquiry-based science classes.

Very few of the early studies focusing on the development of experimentation and evidence evaluation skills explicitly addressed issues of instruction and experience. Those that did, however, indicated an important role of experience and instruction in supporting scientific thinking. For example, Siegler and Liebert (1975) incorporated instructional manipulations aimed at teaching children about variables and variable levels with or without practice on analogous tasks. In the absence of both instruction and extended practice, no fifth graders and a small minority of eighth graders were successful. Kuhn and Phelps (1982) reported that, in the absence of explicit instruction, extended practice over several weeks was sufficient for the development and modification of experimentation and inference strategies. Later studies of self-directed experimentation also indicate that frequent engagement with the inquiry environment alone can lead to the development and modification of cognitive strategies (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Schauble et al., 1991).

Some researchers have suggested that even simple prompts, which are often used in studies of students’ investigation skills, may provide a subtle form of instructional intervention (Klahr and Carver, 1995). Such prompts may cue the strategic requirements of the task, or they may promote explanation or the type of reflection that could induce a metacognitive or metastrategic awareness of task demands. Because prompts are used in many studies to reveal students’ thinking, it may be very difficult to tease apart the contribution of practice from that of the scaffolding provided by researcher prompts.

In the absence of instruction or prompts, students may not routinely ask questions of themselves, such as “What are you going to do next?” “What outcome do you predict?” “What did you learn?” and “How do you know?” Questions such as these may promote self-explanation, which has been shown to enhance understanding in part because it facilitates the integration of newly learned material with existing knowledge (Chi et al., 1994). Questions such as the prompts used by researchers may serve to promote such integration. Chinn and Malhotra (2002) incorporated different kinds of interventions, aimed at promoting conceptual change in response to anomalous experimental evidence. Interventions included practice at making predictions, reflecting on data, and explanation. The explanation-based interventions were most successful at promoting conceptual change, retention, and generalization. The prompts used in some studies of self-directed experimentation are very likely to serve the same function as the prompts used by Chi et al. (1994). Incorporating such prompts in classroom-based inquiry activities could serve as a powerful teaching tool, given that the use of self-explanation in tutoring systems (human and computer interface) has been shown to be quite effective (e.g., Chi, 1996; Hausmann and Chi, 2002).

Studies that compare the effects of different kinds of instruction and practice opportunities have been conducted in the laboratory, with some translation to the classroom. For example, Chen and Klahr (1999) examined the effects of direct and indirect instruction of the control-of-variables strategy on students’ (grades 2-4) experimentation and knowledge acquisition. The instructional intervention involved didactic teaching of the control-of-variables strategy, along with examples and probes. Indirect (or implicit) training involved the use of systematic probes during the course of children’s experimentation. A control group did not receive instruction or probes. No group received instruction on domain knowledge for any task used (springs, ramps, sinking objects). For the students who received instruction, use of the control-of-variables strategy increased from 34 percent prior to instruction to 65 percent after, with 61-64 percent usage maintained on transfer tasks that followed after 1 day and again after 7 months, respectively. No such gains were evident for the implicit training or control groups.

Instruction about control of variables improved children’s ability to design informative experiments, which in turn facilitated conceptual change in a number of domains. They were able to design unconfounded experiments, which facilitated valid causal and noncausal inferences, resulting in a change in knowledge about how various multivariable causal systems worked. Significant gains in domain knowledge were evident only for the instruction group. Fourth graders showed better skill retention at long-term assessment than second or third graders.
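One way to see why unconfounded designs may need to be taught is to count how rare they are among all possible comparisons. The sketch below is an illustration under stated assumptions, not a reconstruction of the materials used by Chen and Klahr: the three two-level ramp variables and all names in the code are hypothetical. It enumerates every possible trial, pairs them up, and keeps only the pairs in which the target variable is the single difference.

```python
from itertools import combinations, product

# Hypothetical task: three variables with two levels each.
levels = {
    "steepness": ["high", "low"],
    "surface": ["smooth", "rough"],
    "ball": ["rubber", "golf"],
}


def all_trials(levels):
    """Every combination of variable levels, one dict per possible trial."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def controlled_pairs(levels, target):
    """Pairs of trials that differ in `target` and in nothing else."""
    pairs = []
    for a, b in combinations(all_trials(levels), 2):
        differing = [name for name in levels if a[name] != b[name]]
        if differing == [target]:
            pairs.append((a, b))
    return pairs


total_pairs = len(list(combinations(all_trials(levels), 2)))
steepness_tests = controlled_pairs(levels, "steepness")
print(f"{len(steepness_tests)} of {total_pairs} possible comparisons "
      "isolate steepness")
```

With these assumed variables there are 8 possible trials and 28 possible comparisons, of which only 4 isolate steepness, so a child who picks comparisons without a strategy will usually produce a confounded test.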

The positive impact of instruction on control of variables also appears to translate to the classroom (Toth, Klahr, and Chen, 2000; Klahr, Chen and Toth, 2001). Fourth graders who received instruction in the control-of-variables strategy in their classroom increased their use of the strategy, and their domain knowledge improved. The percentage of students who were able to correctly evaluate others’ research increased from 28 to 76 percent.

Instruction also appears to promote longer term use of the control-of-variables strategy and transfer of the strategy to a new task (Klahr and Nigam, 2004). Third and fourth graders who received instruction were more likely to master the control-of-variables strategy than students who explored a multivariable system on their own. Interestingly, although the group that received instruction performed better overall, a quarter of the students who explored the system on their own also mastered the strategy. These results raise questions about the kinds of individual differences that may allow for some students to benefit from the discovery context, but not others. That is, which learner traits are associated with the success of different learning experiences?

Similar effects of experience and instruction have been demonstrated for improving students’ ability to use evidence from multiple records and make correct inferences from noncausal variables (Keselman, 2003). In many cases, students show some improvement when they are given the opportunity for practice, but greater improvement when they receive instruction (Kuhn and Dean, 2005).

Long-term studies of students’ learning in the classroom with instructional support and structured experiences over months and years reveal children’s potential to engage in sophisticated investigations given the appropriate experiences (Metz, 2004; Lehrer and Schauble, 2005). For example, in one classroom-based study, second graders and fourth and fifth graders took part in a curriculum unit on animal behavior that emphasized domain knowledge, whole-class collaboration, scaffolded instruction, and discussions about the kinds of questions that can and cannot be answered by observational records (Metz, 2004). Pairs or triads of students then developed a research question, designed an experiment, collected and analyzed data, and presented their findings on a research poster. Such studies have demonstrated that, with appropriate support, students in grades K-8 and students from a variety of socioeconomic, cultural, and linguistic backgrounds can be successful in generating and evaluating scientific evidence and explanations (Kuhn and Dean, 2005; Lehrer and Schauble, 2005; Metz, 2004; Warren, Rosebery, and Conant, 1994).

KNOWLEDGE AND SKILL IN MODELING

The picture that emerges from developmental and cognitive research on scientific thinking is one of a complex intertwining of knowledge of the natural world, general reasoning processes, and an understanding of how scientific knowledge is generated and evaluated. Science and scientific thinking are not only about logical thinking or conducting carefully controlled experiments. Instead, building knowledge in science is a complex process of building and testing models and theories, in which knowledge of the natural world and strategies for generating and evaluating evidence are closely intertwined. Working from this image of science, a few researchers have begun to investigate the development of children’s knowledge and skills in modeling.

The kinds of models that scientists construct vary widely, both within and across disciplines. Nevertheless, the rhetoric and practice of science are governed by efforts to invent, revise, and contest models. By modeling, we refer to the construction and test of representations that serve as analogues to systems in the real world (Lehrer and Schauble, 2006). These representations can be of many forms, including physical models, computer programs, mathematical equations, or propositions. Objects and relations in the model are interpreted as representing theoretically important objects and relations in the represented world. Models are useful in summarizing known features and predicting outcomes—that is, they can become elements of or representations of theories. A key hurdle for students is to understand that models are not copies; they are deliberate simplifications. Error is a component of all models, and the precision required of a model depends on the purpose for its current use.

The forms of thinking required for modeling do not progress very far without explicit instruction and fostering (Lehrer and Schauble, 2000). For this reason, studies of modeling have most often taken place in classrooms over sustained periods of time, often years. These studies provide a provocative picture of the sophisticated scientific thinking that can be supported in classrooms if students are provided with the right kinds of experiences over extended periods of time. The instructional approaches used in studies of students’ modeling, as well as the approach to curriculum that may be required to support the development of modeling skills over multiple years of schooling, are discussed in the chapters in Part III.

Lehrer and Schauble (2000, 2003, 2006) reported observing characteristic shifts in the understanding of modeling over the span of the elementary school grades, from an early emphasis on literal depictional forms, to representations that are progressively more symbolic and mathematically powerful. Diversity in representational and mathematical resources both accompanied and produced conceptual change. As children developed and used new mathematical means for characterizing growth, they understood biological change in increasingly dynamic ways. For example, once students understood the mathematics of ratio and changing ratios, they began to conceive of growth not as simple linear increase, but as a patterned rate of change. These transitions in conception and representation appeared to support each other, and they opened up new lines of inquiry. Children wondered whether plant growth was like animal growth, and whether the growth of yeast and bacteria on a Petri dish would show a pattern like the growth of a single plant. These forms of conceptual development required a context in which teachers systematically supported a restricted set of central ideas, building successively on earlier concepts over the grades of schooling.

Representational Systems That Support Modeling

The development of specific representational forms and notations, such as graphs, tables, computer programs, and mathematical expressions, is a critical part of engaging in mature forms of modeling. Mathematics, data and scale models, diagrams, and maps are particularly important for supporting science learning in grades K-8.

Mathematics

Mathematics and science are, of course, separate disciplines. Nevertheless, for the past 200 years, the steady press in science has been toward increasing quantification, visualization, and precision (Kline, 1980). Mathematics in all its forms is a symbol system that is fundamental to both expressing and understanding science. Often, expressing an idea mathematically results in noticing new patterns or relationships that otherwise would not be grasped. For example, elementary students studying the growth of organisms (plants, tobacco hornworms, populations of bacteria) noted that when they graphed changes in heights over the life span, all the organisms studied produced an emergent S-shaped curve. However, such seeing depended on developing a “disciplined perception” (Stevens and Hall, 1998), a firm grounding in a Cartesian system. Moreover, the shape of the curve was determined in light of variation, accounted for by selecting and connecting midpoints of intervals that defined piece-wise linear segments. This way of representing typical growth was contentious, because some midpoints did not correspond to any particular case value. This debate was therefore a pathway toward the idealization and imagined qualities of the world necessary for adopting a modeling stance. The form of the growth curve was eventually tested in other systems, and its replications inspired new questions. For example, why would bacteria populations and plants be describable by the same growth curve? In this case and in others, explanatory models and data models mutually bootstrapped conceptual development (Lehrer and Schauble, 2002).

It is not feasible in this report to summarize the extensive body of research in mathematics education, but one point is especially critical for science education: the need to expand elementary school mathematics beyond arithmetic to include space and geometry, measurement, and data/uncertainty. The National Council of Teachers of Mathematics standards (2000) strongly support this extension of early mathematics, based on the judgment that arithmetic alone does not constitute a sufficient mathematics education. Moreover, if mathematics is to be used as a resource for science, the resource base widens considerably with a broader mathematical base, affording students a greater repertoire for making sense of the natural world.

For example, consider the role of geometry and visualization in comparing crystalline structures or evaluating the relationship between the body weights and body structures of different animals. Measurement is a ubiquitous part of the scientific enterprise, although its subtleties are almost always overlooked. Students are usually taught procedures for measuring but are rarely taught a theory of measure. Educators often overestimate children’s understanding of measurement because measuring tools, like rulers or scales, resolve many of the conceptual challenges of measurement for children, so that they may fail to grasp the idea that measurement entails the iteration of constant units, and that these units can be partitioned. It is reasonably common, for example, for even upper elementary students who seem proficient at measuring lengths with rulers to tacitly hold the theory that measuring merely entails the counting of units between boundaries. If these students are given unconnected units (say, tiles of a constant length) and asked to demonstrate how to measure a length, some of them almost always place the units against the object being measured in such a way that the first and last tile are lined up flush with the end of the object measured. This arrangement often requires leaving spaces between units. Diagnostically, these spaces do not trouble a student who holds this “boundary-filling” conception of measurement (Lehrer, 2003; McClain et al., 1999).
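The contrast between iterating a constant unit and merely filling the space between boundaries can be illustrated with a brief sketch. The Python below is a hypothetical illustration, not a procedure drawn from the cited studies: the function names, the 14 cm strip, and the 3 cm tiles are all assumptions chosen for the example.

```python
from fractions import Fraction


def measure_by_iteration(length, unit):
    """Lay a constant unit end to end along `length`.

    Returns (whole units, leftover expressed as a fraction of one unit).
    """
    length, unit = Fraction(length), Fraction(unit)
    whole, leftover = divmod(length, unit)
    return int(whole), leftover / unit


def has_gaps(tile_starts, unit):
    """Return True if consecutive tiles of length `unit` leave space between them."""
    starts = sorted(Fraction(s) for s in tile_starts)
    return any(later - earlier > unit for earlier, later in zip(starts, starts[1:]))


# Hypothetical case: a 14 cm strip measured with 3 cm tiles.
print(measure_by_iteration(14, 3))  # four whole tiles plus two-thirds of a tile
print(has_gaps([0, 3, 6, 9], 3))    # tiles laid end to end
print(has_gaps([0, 4, 8, 11], 3))   # first and last tile flush with the ends
```

In the last arrangement a count of four tiles is obtained only by leaving spaces between them, and the partitioned unit (the remaining two-thirds of a tile) never comes into view.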

Data

Researchers agree that scientific thinking entails the coordination of theory with evidence (Klahr and Dunbar, 1988; Kuhn, Amsel, and O’Loughlin, 1988), but there are many ways in which evidence may vary in both form and complexity. Achieving this coordination therefore requires tools for structuring and interpreting data and error. Otherwise, students’ interpretation of evidence cannot be accountable. There have been many studies of students’ reasoning about data, variation, and uncertainty, conducted both by psychologists (Kahneman, Slovic, and Tversky, 1982; Konold, 1989; Nisbett et al., 1983) and by educators (Mokros and Russell, 1995; Pollatsek, Lima, and Well, 1981; Strauss and Bichler, 1988). Particularly pertinent here are studies that focus on data modeling (Lehrer and Romberg, 1996), that is, how reasoning with data is recruited as a way of investigating genuine questions about the world.

Data modeling is, in fact, what professionals do when they reason with data and statistics. It is central to a variety of enterprises, including engineering, medicine, and natural science. Scientific models are generated with acute awareness of their entailments for data, and data are recorded and structured as a way of making progress in articulating a scientific model or adjudicating among rival models. The tight relationship between model and data holds generally in domains in which inquiry is conducted by inscribing, representing, and mathematizing key aspects of the world (Goodwin, 2000; Kline, 1980; Latour, 1990).

Understanding the qualities and meaning of data may be enhanced if students devote as much attention to their generation as to their analysis. First and foremost, students need to grasp the notion that data are constructed to answer questions (Lehrer, Giles, and Schauble, 2002). The National Council of Teachers of Mathematics (2000) emphasizes that the study of data should be firmly anchored in students’ inquiry, so that they “address what is involved in gathering and using the data wisely” (p. 48). Questions motivate the collection of certain types of information and not others, and many aspects of data coding and structuring also depend on the question that motivated their collection. Defining the variables involved in addressing a research question, considering the methods and timing to collect data, and finding efficient ways to record them are all involved in the initial phases of data modeling. Debates about the meaning of an attribute often provoke questions that are more precise.

For example, a group of first graders who wanted to learn which student’s pumpkin was the largest eventually understood that they needed to agree whether they were interested in the heights of the pumpkins, their circumferences, or their weights (Lehrer et al., 2001). Deciding what to measure is bound up with deciding how to measure. As the students went on to count the seeds in their pumpkins (they were pursuing a question about whether there might be a relationship between pumpkin size and number of seeds), they had to make decisions about whether they would include seeds that were not full grown and what criteria would be used to decide whether any particular seed should be considered mature.

Data are inherently a form of abstraction: an event is replaced by a video recording, a sensation of heat is replaced by a pointer reading on a thermometer, and so on. Here again, the tacit complexity of tools may need to be explained. Students often have a fragile grasp of the relationship between the event of interest and the operation (hence, the output) of a tool, whether that tool is a microscope, a pan balance, or a “simple” ruler. Some students, for example, do not initially consider measurement to be a form of comparison and may find a balance a very confusing tool. In their mind, the number displayed on a scale is the weight of the object. If no number is displayed, weight cannot be found.

Once the data are recorded, making sense of them requires that they be structured. At this point, students sometimes discover that their data require further abstraction. For example, as they categorized features of self-portraits drawn by other students, a group of fourth graders realized that it would not be wise to follow their original plan of creating 23 categories of “eye type” for the 25 portraits that they wished to categorize (DiPerna, 2002). Data do not come with an inherent structure; rather, structure must be imposed (Lehrer, Giles, and Schauble, 2002). The only structure for a set of data comes from the inquirer’s prior and developing understanding of the phenomenon under investigation. The inquirer imposes structure by selecting categories around which to describe and organize the data.

Students also need to mentally back away from the objects or events under study to attend to the data as objects in their own right, by counting them, manipulating them to discover relationships, and asking new questions of already collected data. Students often believe that new questions can be addressed only with new data; they rarely think of querying existing data sets to explore questions that were not initially conceived when the data were collected (Lehrer and Romberg, 1996).

Finally, data are represented in various ways in order to see or understand general trends. Different kinds of displays highlight certain aspects of the data and hide others. An important educational agenda for students, one that extends over several years, is to come to understand the conventions and properties of different kinds of data displays. We do not review here the extensive literature on students’ understanding of different kinds of representational displays (tables, graphs of various kinds, distributions), but, for purposes of science, students should not only understand the procedures for generating and reading displays, but they should also be able to critique them and to grasp the communicative advantages and disadvantages of alternative forms for a given purpose (diSessa, 2004; Greeno and Hall, 1997). The structure of the data will affect the interpretation. Data interpretation often entails seeking and confirming relationships in the data, which may be at varying levels of complexity. For example, simple linear relationships are easier to spot than inverse relationships or interactions (Schauble, 1990), and students often fail to entertain the possibility that more than one relationship may be operating.

The desire to interpret data may further inspire the creation of statistics, such as measures of center and spread. These measures are a further step of abstraction beyond the objects and events originally observed. Even primary grade students can learn to consider the overall shape of data displays to make interpretations based on the “clumps” and “holes” in the data. Students often employ multiple criteria when trying to identify a “typical value” for a set of data. Many young students tend to favor the mode and justify their choice on the basis of repetition—if more than one student obtained this value, perhaps it is to be trusted. However, students tend to be less satisfied with modes if they do not appear near the center of the data, and they also shy away from measures of center that do not have several other values clustered near them (“part of a clump”). Understanding the mean requires an understanding of ratio, and if students are merely taught to “average” data in a procedural way without having a well-developed sense of ratio, their performance notoriously tends to degrade into “average stew”—eccentric procedures for adding and dividing things that make no sense (Strauss and Bichler, 1988). With good instruction, middle and upper elementary students can simultaneously consider the center and the spread of the data. Students can also generate various forms of mathematical descriptions of error, especially in contexts of measurement, where they can readily grasp the relationships between their own participation in the act of measuring and the resulting variation in measures (Petrosino, Lehrer, and Schauble, 2003).
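The measures of center and spread discussed above can be compared directly on a small data set. The following Python sketch uses only the standard library; the ten values are hypothetical (imagined repeated measurements of a single length, in centimeters) and the `summarize` helper is an assumption introduced for illustration, not a procedure from the studies cited.

```python
from statistics import mean, median, multimode

# Hypothetical repeated measurements of one length, in centimeters.
measures = [48, 50, 50, 51, 52, 52, 52, 53, 55, 67]


def summarize(values):
    """Collect several candidate 'typical values' and a simple measure of spread."""
    return {
        "modes": multimode(values),           # most frequently repeated value(s)
        "median": median(values),             # middle of the ordered values
        "mean": mean(values),                 # sum of the values divided by their count
        "range": max(values) - min(values),   # distance between the extremes
    }


summary = summarize(measures)
for name, value in summary.items():
    print(name, value)
```

In this made-up data set the single high reading of 67 pulls the mean above the median, while the mode sits inside the main “clump” of values; comparisons of this kind are one way to see why the different candidates for a “typical value” need not agree.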

Scale Models, Diagrams, and Maps

Although data representations are central to science, they are not, of course, the only representations students need to use and understand. Perhaps the most easily interpretable form of representation widely used in science is scale models. Physical models of this kind are used in science education to make it possible for students to visualize objects or processes that are at a scale that makes their direct perception impossible or, alternatively, that permits them to directly manipulate something that otherwise they could not handle. The ease or difficulty with which students understand these models depends on the complexity of the relationships being communicated. Even preschoolers can understand scale models used to depict location in a room (DeLoache, 2004). Primary grade students can fairly readily overcome the influence of the appearance of the model to focus on and investigate the way it functions (Penner et al., 1997), but middle school students (and some adults) struggle to work out the positional relationships of the earth, the sun, and the moon, which involves not only reconciling different perspectives and frames of reference (what one sees standing on the earth, what one would see from a hypothetical point in space), but also visualizing how these perspectives would change over days and months (see, for example, the detailed curricular suggestions at the web site http://www.wcer.wisc.edu/ncisla/muse/).

Frequently, students are expected to read or produce diagrams, often integrating the information from the diagram with information from accompanying text (Hegarty and Just, 1993; Mayer, 1993). The comprehensibility of diagrams seems to be governed less by domain-general principles than by the specifics of the diagram and its viewer. Comprehensibility seems to vary with the complexity of what is portrayed, the particular diagrammatic details and features, and the prior knowledge of the user.

Diagrams can be difficult to understand for a host of reasons. Sometimes the desired information is missing in the first place; sometimes, features of the diagram unwittingly play into an incorrect preconception. For example, it has been suggested that the common student misconception that the earth is closer to the sun in the summer than in the winter may be due in part to the fact that two-dimensional representations of the three-dimensional orbit make it appear as if the foreshortened orbit is indeed closer to the sun at some points than at others.

Mayer (1993) proposes three common reasons why diagrams miscommunicate: some do not include explanatory information (they are illustrative or decorative rather than explanatory), some lack a causal chain, and some fail to map the explanation to a familiar or recognizable context. It is not clear that school students misperceive diagrams in ways that are fundamentally different from the perceptions of adults. There may be some diagrammatic conventions that are less familiar to children, and children may well have less knowledge about the phenomena being portrayed, but there is no reason to expect that adult novices would respond in fundamentally different ways. Although they have been studied for a much briefer period of time, the same is probably true of complex computer displays.

Finally, there is a growing developmental literature on students’ understanding of maps. Maps can be particularly confusing because they preserve some analog qualities of the space being represented (e.g., relative position and distance) but also omit or alter features of the landscape in ways that require understanding of mapping conventions. Young children often initially confuse maps of the landscape with pictures of objects in the landscape. It is much easier for youngsters to represent objects than to represent large-scale space (which is the absence of or frame for objects). Students also may struggle with orientation, perspective (the traditional bird’s eye view), and mathematical descriptions of space, such as polar coordinate representations (Lehrer and Pritchard, 2002; Liben and Downs, 1993).

CONCLUSIONS

There is a common thread throughout the observations of this chapter that has deep implications for what one expects from children in grades K-8 and how their science learning should be structured. In almost all cases, the studies converge to the position that the skills under study develop with age, but also that this development is significantly enhanced by prior knowledge, experience, and instruction.

One of the continuing themes evident from studies on the development of scientific thinking is that children are far more competent than first suspected, and likewise that adults are less so. Young children experiment, but their experimentation is generally not systematic, and their observations as well as their inferences may be flawed. The progression of ability is seen with age, but it is not uniform, either across individuals or for a given individual. There is variation across individuals at the same age, as well as variation within single individuals in the strategies they use. Any given individual uses a collection of strategies, some more valid than others. Discovering a valid strategy does not mean that an individual, whether a child or an adult, will use the strategy consistently across all contexts. As Schauble (1996, p. 118) noted:

The complex and multifaceted nature of the skills involved in solving these problems, and the variability in performance, even among the adults, suggest that the developmental trajectory of the strategies and processes associated with scientific reasoning is likely to be a very long one, perhaps even lifelong. Previous research has established the existence of both early precursors and competencies … and errors and biases that persist regardless of maturation, training, and expertise.

One aspect of cognition that appears to be particularly important for supporting scientific thinking is awareness of one’s own thinking. Children may be less aware of their own memory limitations and therefore may be unsystematic in recording plans, designs, and outcomes, and they may fail to consult such records. Self-awareness of the cognitive strategies available is also important in order to determine when and why to employ various strategies. Finally, awareness of the status of one’s own knowledge, such as recognizing the distinctions between theory and evidence, is important for reasoning in the context of scientific investigations. This last aspect of cognition is discussed in detail in the next chapter.

Prior knowledge, particularly beliefs about causality and plausibility, shapes the approach to investigations in multiple ways. These beliefs influence which hypotheses are tested, how experiments are designed, and how evidence is evaluated. Characteristics of prior knowledge, such as its type, strength, and relevance, are potential determinants of how new evidence is evaluated and whether anomalies are noticed. Knowledge change occurs as a result of the encounter between prior knowledge and new evidence.

Finally, we conclude that experience and instruction are crucial mediators of the development of a broad range of scientific skills and of the degree of sophistication that children exhibit in applying these skills in new contexts. This means that time spent doing science in appropriately structured instructional frames is a crucial part of science education. It affects not only the level of skills that children develop, but also their ability to think about the quality of evidence and to interpret evidence presented to them. Students need instructional support and practice in order to become better at coordinating their prior theories and the evidence generated in investigations. Instructional support is also critical for developing skills for experimental design, record keeping during investigations, dealing with anomalous data, and modeling.

Carey, S., Evans, R., Honda, M., Jay, E., and Unger, C. (1989). An experiment is when you try it and see if it works: A study of grade 7 students’ understanding of the construction of scientific knowledge. International Journal of Science Education, 11, 514-529.

Keys, C.W. (1994). The development of scientific reasoning skills in conjunction with collaborative writing assignments: An interpretive study of six ninth-grade students. Journal of Research in Science Teaching, 31, 1003-1022.

Lehrer, R. (2003). Developing understanding of measurement. In J. Kilpatrick, W.G. Martin, and D.E. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 179-192). Reston, VA: National Council of Teachers of Mathematics.

Linn, M.C. (1980). Teaching students to control variables: Some investigations using free choice experiences. In S. Modgil and C. Modgil (Eds.), Toward a theory of psychological development within the Piagetian framework. Windsor Berkshire, England: National Foundation for Educational Research.

Linn, M.C., Chen, B., and Thier, H.S. (1977). Teaching children to control variables: Investigations of a free choice environment. Journal of Research in Science Teaching, 14, 249-255.

McNay, M., and Melville, K.W. (1993). Children’s skill in making predictions and their understanding of what predicting means: A developmental study. Journal of Research in Science Teaching, 30, 561-577.

Metz, K.E. (2004). Children’s understanding of scientific inquiry: Their conceptualization of uncertainty in investigations of their own design. Cognition and Instruction, 22(2), 219-290.

Mokros, J., and Russell, S. (1995). Children’s concepts of average and representativeness. Journal for Research in Mathematics Education, 26(1), 20-39.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

Slowiaczek, L.M., Klayman, J., Sherman, S.J., and Skov, R.B. (1992). Information selection and use in hypothesis testing: What is a good question, and what is a good answer. Memory and Cognition, 20(4), 392-405.

Thagard, P. (1998a). Ulcers and bacteria I: Discovery and acceptance. Studies in History and Philosophy of Science. Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 29, 107-136.

Thagard, P. (1998b). Ulcers and bacteria II: Instruments, experiments, and social interactions. Studies in History and Philosophy of Science. Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 29(2), 317-342.

