Watching this exchange is a reminder of something any parent knows: Four-year-olds, no matter how smart and delightful they may be, have obvious limits as test takers. Many, especially boys, can’t sit still for the full duration of an exam; others can’t stay awake or concentrate for that long, choosing at some catastrophic point to crawl under their desks and give up. Nor is the context in which these tests are administered exactly relaxing for young children. Both IQ tests require that they sit alone in a room with a tester they probably haven’t seen before. In the case of the WPPSI, the tester often isn’t allowed to prompt the children to give more complete answers, even if it’s clear they’re capable of delivering them (and would score better if they did). In the case of the OLSAT, the testers can’t even repeat the questions.

“What is a pet?”

“An animal. I have pet goldfish.”

Her tester decides to play along this time. “Do they have names?”

“Zoe and Tangerine.”

Skylar plants her marker next to a rectangular-shaped sticker she’d gotten as a reward for a previous exercise and admires the shape she’s just made. “Look! A flag!”

Stephen J. Bagnato, a professor of pediatrics and psychology at the University of Pittsburgh, is fond of quoting Head Start co-founder Urie Bronfenbrenner, who in 1977 famously wrote, “Much of contemporary developmental psychology is the science of the strange behavior of children in strange situations with strange adults for the briefest possible periods of time.” It’s hard not to think about that observation in the context of intelligence-testing 4-year-olds. The script is so rigid, the tasks so narrow and precise. Skylar did extremely well on her evaluation. Yet to me, the loveliest and most intellectually revealing moment was when she blew off all the rules and made that whimsical little flag. Had it been a real exam, the tester wouldn’t even have written it down. “Well, right,” says Bagnato. “When the examiner can only say certain things to these kids, and the child can only say certain things back, of course it’s too confining. We know that the way kids display their skills best is through creative play and everyday interactions at home and at school.”

As it turns out, intelligence tests miss lots of things, not just creativity. And perhaps that explains why IQs alone are not especially good predictors of excellence. In the 1920s, for instance, Lewis Terman, a psychologist and deep believer in intelligence testing (it was he who revised Alfred Binet’s original test and came up with the Stanford-Binet model), started a now-famous longitudinal study of nearly 1,500 California children with extremely high IQs. He grandiosely called it “Genetic Studies of Genius,” and his hope was to show that these children, whom he called “exceptionally superior,” would one day form the backbone of the nation’s intellectual and creative elite, making crucial advances in sciences and public policy and the arts. But as David Shenk, author of the forthcoming The Genius in All of Us, points out, Terman’s subjects only grew less and less remarkable as time wore on. None won Nobel Prizes, though two who were specifically rejected for the study, William Shockley and Luis Alvarez, both did, in physics. None became world-renowned musicians, though two other rejects, Isaac Stern and Yehudi Menuhin, did, for their virtuosic violin-playing. In Outliers, Malcolm Gladwell makes a similar point, noting that one’s IQ needn’t be super-high to succeed; it simply needs to be high enough. “Once someone has reached an IQ of somewhere around 120,” he writes, “having additional IQ points doesn’t seem to translate into any measurable real-world advantage.” In Genius Revisited, Rena Subotnik, director of the American Psychological Association’s Center for Gifted Education Policy, undertook a similar study, with colleagues, looking at Hunter elementary-school alumni all grown up. Their mean IQ was 157. “They were lovely people,” she says, “and they were generally happy, productive, and satisfied with their lives. But there really wasn’t any wow factor in terms of stellar achievement.”

So what do psychologists and educators think makes the difference between good and exceptional? Opportunity, connections, mentors. Perseverance and monomaniacal devotion, or what the psychologist Ellen Winner calls “the rage to master.” Creativity, a willingness to fail. Nelson, the head of Calhoun, can go on at urgent, passionate length about this.

“I want a school full of kids who daydream,” he says. “I want kids who are occasionally impulsive. I want kids who are fun to be with. I want kids who don’t want to answer the questions on those tests in the way the adult wants them to be answered, because that kid is already seeing the world differently. In fact,” he adds, after thinking it over for a moment, “I want kids who are cynical enough at age 4 to know that there’s really something wrong with someone asking them these things and think, ‘I’m going to screw with them in the process!’ ”

Granted, Calhoun is an unusual school, a place where kids don’t even get test scores until they’re freshmen. But one needn’t be particularly subversive to appreciate Nelson’s philosophy of educating 4-year-olds, or his frustration with current practice. “You have to play with blocks,” he says. “You have to make up stories. You have to muck around. Arithmetic and decoding language aren’t life—they’re symbolic representations of other things. And education is being diverted into focusing on these symbolic representations of the very experiences kids are being denied.”

Nelson says he’s considering scrapping the WPPSI as an admission requirement for Calhoun’s lower school, possibly starting as early as next year. As it is, he barely takes a kid’s score into account. One of the most compelling reasons to get rid of it, he notes, isn’t that the test is intellectually pointless. It’s that it’s emotionally insidious. “When we resort to any kind of measure of kids that’s supposed to be qualitative at a young age,” he says, “no matter how cheerfully we do it, no matter how many lollipops we hand out to de-stress the process, young children are extraordinarily discerning. They absorb their parents’ anxiety about it, they absorb the kinds of judgments people are making about them. So there’s a process of organizing kids in a hierarchy of worth, and it’s beginning at an age that’s criminal.”

The irony is that doing well on these exams can be just as damaging as doing poorly on them. “Gifted” is an awfully uncomfortable label for some children to wear. It can cripple their thinking, make them terrified of risk. “It’s not entirely inaccurate to observe that more and more high-achieving students go off to university and don’t care about anything,” says Nelson. “They don’t ask questions, they don’t have original ideas. And it’s not because there’s anything wrong with them, but because they were conditioned to believe that learning is about giving back the right answer.” Nelson knows it’s heresy to say this, but he wonders if it’s true. “These tests, at 4, start that long process of conditioning,” he says. “Right then, children start to believe that learning means pleasing the powerful adult in whose presence you are.”

It’s unlikely that most city schools will follow Nelson’s lead and stop testing 4-year-olds. But it is possible that these tests could carry less and less weight in the selection process as they become tainted by excessive prepping and anxiety. That doesn’t mean, however, that the selection process will become more democratic. “I’m afraid schools will be judging the child in ways that aren’t any better,” says Emily Glickman, founder of Abacus Guide Educational Consulting. “There’ll just be more weight on the school report, and what the nursery-school director says about the child verbally. And often kids who come from expensive, high-cachet nursery schools have elaborate evaluations written about them, because the preschool directors themselves have a high stake in the class’s placement success.” And in the case of private schools, she notes, even more emphasis may be given to a family’s socioeconomic status: “The kindergarten-admission process has always been about openly judging a 4-year-old and secretly judging the parents’ wealth, connections, and likeliness to give.”

Giving less weight to these tests doesn’t guarantee that the selection process would become more sensible, either, or more sensitive to finding those children who’d profit from an enriched education. After all, what mechanism should schools use?

This is the hardest question. Most education researchers can tell you just what’s wrong with intelligence-testing 4-year-olds. But few can tell you what should emerge in its stead. “Before we adopted the OLSAT,” says the Department of Education’s Commitante, “we had 32 different school districts using a huge … a tremendous variety of assessments.” Some, she says, relied on expensive IQ tests; others required teacher evaluations. The result was a hodgepodge of arbitrary standards, ones that, the city believed, worked against children who spoke English as a second language (the OLSAT is given in eight languages) or came from lower-income families (the city gives the OLSAT for free).